inukshuk / bibtex-ruby

A BibTeX library, parser, and converter for Ruby.
http://inukshuk.github.com/bibtex-ruby
GNU General Public License v3.0

Extend BibTeX::Names.parse to handle semicolon-separated lists? #120

Closed mjy closed 8 years ago

mjy commented 8 years ago

Would be super awesome if the library could handle data like below!

When I have a list of authors like:

"Zwolfer, H.; Rinehart, J.A." or "Zikic, V.; Stankovic, S.A.; Hric, B.;" or "Zhu, J.Y.; Wu, G.X.; Ye, G.Y.; Hu, C."

and I do

b = BibTeX::Entry.new(author: "Zhu, J.Y.; Wu, G.X.; Ye, G.Y.; Hu, C.")

then when I do

b.parse_names

I get

E, [2016-09-05 08:54:56#38928] ERROR -- : Failed to parse BibTeX Name on value "," (COMMA) [["Zhu, J.Y.; Wu, G.X.; Ye"]]

Should I be using some other key (e.g. 'authors:')?

Is there a test suite for author lists like this?

inukshuk commented 8 years ago

You can configure the name parser's lexer to treat semicolons like commas. Just adjust the regular expression in BibTeX::NameParser.patterns[:comma].

mjy commented 8 years ago

Thanks much for the speedy reply. My first attempt, shown below, seems to be missing something. Is the comma token used for more than one purpose?

2.3.1 :015 > b = BibTeX::Entry.new(author: "Zhu, J.Y.; Wu, G.X.; Ye, G.Y.; Hu, C.")
=> #<BibTeX::Entry author = Zhu, J.Y.; Wu, G.X.; Ye, G.Y.; Hu, C.>

2.3.1 :016 > BibTeX::NameParser.patterns[:comma] = Regexp.new(/;/)
=> /;/

2.3.1 :017 > b.parse_names

W, [2016-09-05 12:21:56#40659] WARN -- : NameParser: invalid character at position 4; brace level 0.
W, [2016-09-05 12:21:56#40659] WARN -- : NameParser: invalid character at position 14; brace level 0.
W, [2016-09-05 12:21:56#40659] WARN -- : NameParser: invalid character at position 24; brace level 0.
W, [2016-09-05 12:21:56#40659] WARN -- : NameParser: invalid character at position 34; brace level 0.
E, [2016-09-05 12:21:56#40659] ERROR -- : Failed to parse BibTeX Name on value "," (ERROR) ["(^start)"]

=> #<BibTeX::Entry author = >

inukshuk commented 8 years ago

Since commas are also used to separate family and given names, the pattern still has to match them, so you need to use something like /[;,]/. There is only so much you can do (semicolons are not actually part of BibTeX), so if you need to support more complex cases you'll have to pre-process the names (e.g., replacing ";" with " and ").
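For illustration, the pre-processing step could look roughly like the sketch below. The helper name semicolons_to_and is hypothetical and not part of bibtex-ruby. It uses plain Ruby string methods only, and it assumes that semicolons never appear inside a name (for example inside braces).

```ruby
# Hypothetical helper (not part of bibtex-ruby): turn a semicolon-separated
# author list into the " and "-separated form that BibTeX expects.
# Assumption: a semicolon never occurs inside a name, e.g. within braces.
def semicolons_to_and(names)
  names
    .split(';')        # one piece per author
    .map(&:strip)      # drop whitespace around each piece
    .reject(&:empty?)  # skip empty pieces, e.g. from a doubled ";"
    .join(' and ')
end

puts semicolons_to_and('Zhu, J.Y.; Wu, G.X.; Ye, G.Y.; Hu, C.')
```

The resulting string can then be passed as the author value to BibTeX::Entry.new, as in the original report. Splitting and re-joining, rather than a single gsub, means a trailing semicolon does not leave a dangling " and " at the end.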

mjy commented 8 years ago

Will go with the pre-processor approach: the ' and ' replacement was all we really needed, and we'll do some handling in our wrapper layer. I definitely appreciate all the existing magic.

Thanks again for the quick response.