Open GoogleCodeExporter opened 8 years ago
I've checked a few possible parser candidates:
JabRef - GPL
javabib - GPL
j4bib - BSD license, no recent activity,
http://sourceforge.net/projects/j4bib/files/
bibparse - no stated license (even in source zip), author's home page hasn't
been updated since 2005 after fairly regular updates before that, so he may be
retired or deceased http://ftp.math.utah.edu/pub//bibparse/
I'll take a look at j4bib unless someone comes up with a better alternative.
It's not a very complex format, so writing from scratch is an option as well.
Original comment by tfmorris
on 11 Nov 2010 at 8:08
Writing from scratch wouldn't be too hard at all: it's just a format of
key:value pairs, especially as an importer probably wouldn't need to do
validation.
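
A minimal sketch of what "key:value pairs without validation" could look like,
in Python for brevity. The function names (parse_entry, read_value) and the
sample record are hypothetical, not taken from any of the parsers listed above.
It assumes a single entry with brace-delimited, quote-delimited, or bare values,
and ignores @string macros, # concatenation, comments, and any LaTeX inside the
values.

```python
import re

FIELD_NAME = re.compile(r"\s*([A-Za-z][\w-]*)\s*=\s*")
SEPARATOR = re.compile(r"\s*,")
BARE_TOKEN = re.compile(r"[^,}\s]+")


def read_value(text, pos):
    """Read one field value starting at pos; return (value, next_pos)."""
    if pos < len(text) and text[pos] == "{":
        # Brace-delimited value: count nesting so inner {...} groups survive.
        depth = 0
        for i in range(pos, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    return text[pos + 1:i], i + 1
        raise ValueError("unbalanced braces")
    if pos < len(text) and text[pos] == '"':
        # Quote-delimited value; quotes nested inside braces are not handled.
        end = text.index('"', pos + 1)  # raises ValueError if unterminated
        return text[pos + 1:end], end + 1
    # Bare token such as a number.
    token = BARE_TOKEN.match(text, pos)
    if token is None:
        return "", pos
    return token.group(0), token.end()


def parse_entry(text):
    """Parse one entry into (entry_type, cite_key, fields); no validation."""
    header = re.match(r"\s*@(\w+)\s*\{\s*([^,\s]*)\s*,", text)
    if header is None:
        raise ValueError("not a BibTeX entry")
    fields = {}
    pos = header.end()
    while True:
        name = FIELD_NAME.match(text, pos)
        if name is None:
            break  # no further "name =" found, so the entry is done
        value, pos = read_value(text, name.end())
        fields[name.group(1).lower()] = value
        comma = SEPARATOR.match(text, pos)
        if comma is not None:
            pos = comma.end()
    return header.group(1).lower(), header.group(2), fields


# Made-up record, for illustration only.
SAMPLE = """@article{example2010,
  author = {Doe, Jane and Roe, Richard},
  title = "An {Example} Title",
  year = 2010
}"""

entry_type, cite_key, fields = parse_entry(SAMPLE)
print(entry_type, cite_key, fields)
```

Values come back as raw strings, so inner braces such as {Example} are left
untouched for a later cleanup step.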
Original comment by mcnamara.tim@gmail.com
on 15 Nov 2010 at 2:10
I frequently use BibTeX so I give this +1!
Original comment by wfz%nimb...@gtempaccount.com
on 15 Nov 2010 at 9:17
Attached a single BibTeX record from a Google Books export
[[http://books.google.com/books?id=d1tIAAAAYAAJ&pg=PR3#v=onepage&q&f=false]]
for quality checking with diacritic characters when this feature is implemented.
Original comment by thadguidry
on 19 Nov 2010 at 11:08
I attached a more complicated record from Web of Science (first article for the
query "google"). Note especially the multiple values in some fields.
Google Refine would be great for address cleaning and such things... Does it
have an "address guesser"?
Original comment by jan.schu...@gmail.com
on 28 Sep 2011 at 11:26
Some additional possibilities for starting points:
bibtex2rdf - Apache 2.0 license, JavaCC grammar
http://sourceforge.net/projects/bibtex2rdf/
ANTLR grammar for BibTeX - no stated license
http://stackoverflow.com/questions/7583982/bibtex-grammar-for-antlr
MIT SIMILE bibtex-converter - MIT License, JavaCC grammar - doesn't attempt to
interpret LaTeX
http://code.google.com/p/simile-widgets/source/browse/babel/trunk/converters/bibtex-converter
https://simile.mit.edu/repository/babel/trunk/converters/bibtex-converter/
j4bib (mentioned above) - BSD license, uses JLex and CUP
https://downloads.sourceforge.net/project/j4bib/j4bib/j4bib-0.2/j4bib-src-0.2.tar.gz
I take back what I said last year about the format being simple. On the
surface it is, but because one can embed arbitrary LaTeX code, you'd need a
full parser/renderer to faithfully parse everything. Even for a basic level of
support, you'd need to handle things like LaTeX character composition, e.g.
{\'E}mile
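
A rough sketch of that kind of character composition handling, in Python for
brevity. It assumes only five accent commands (\' \` \^ \" \~) applied to a
single ASCII letter; the names COMBINING, ACCENT, and latex_to_unicode are
hypothetical. Cedillas, dotless i, ligatures, and every other LaTeX command are
left as they are.

```python
import re
import unicodedata

# LaTeX accent command -> Unicode combining character (small, incomplete subset).
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

# Matches \'E and \'{E}, optionally wrapped in one outer brace pair: {\'E}, {\'{E}}.
# The conditional (?(1)\}) only demands a closing brace if an opening one matched.
ACCENT = re.compile(r"""(\{)?\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))(?(1)\})""")


def _compose(match):
    letter = match.group(3) or match.group(4)
    # NFC turns "letter + combining mark" into the precomposed character.
    return unicodedata.normalize("NFC", letter + COMBINING[match.group(2)])


def latex_to_unicode(text):
    """Replace the simple accent commands above with Unicode characters."""
    return ACCENT.sub(_compose, text)


print(latex_to_unicode(r"{\'E}mile"))
```

Anything the pattern does not recognise passes through unchanged, so the
original LaTeX stays visible rather than being silently dropped.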
Original comment by tfmorris
on 15 Oct 2011 at 5:31
If the LaTeX thingy is a problem, maybe a RIS importer could be used instead,
since RIS does not allow LaTeX commands.
Almost all bib databases can export RIS or BibTeX, and there are some BibTeX to
RIS converters, which should help if you are stuck with BibTeX exports.
Original comment by jan.schu...@gmail.com
on 13 Dec 2011 at 6:54
Thanks for the suggestion. The entity substitution issue that I mentioned as
an example of LaTeX processing is actually pretty simple, so we'd probably do
that first and see if it covers the bulk of what people need.
RIS or EndNote XML would be other bibliographic data formats to consider
supporting for import, but I'm not sure they'd replace BibTeX, since many of the
BibTeX files are old hand-maintained bibliographies, not necessarily exports
from a bib. web site or program.
Original comment by tfmorris
on 13 Dec 2011 at 8:29
The interesting thing for bibliometricians is probably the name and address
cleaning part. Maybe even name disambiguation: is "Chen, C" of the first work
in the list the same "Chen, C" as in the 1245th work? Or is "Meyer-Lüdenscheid,
CW" the same as "Meyer Luedenscheid, C"? Unfortunately, in the end, this is
manual work, so I'm not sure how Refine can help here. A string comparer that
clusters names based on a string-distance function would be nice, and also a
clustering algorithm based on the keywords/words in title/words in addresses
(there are quite a few papers on author name disambiguation that use such
methods) or on the results of a Google query (if there are similar authors and
a Google query based on both titles returns some results, it is probably
because of the author's webpage, which lists both works).
The name disambiguation part is probably interesting for others as well:
merging two address databases, ...
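
A small sketch of the string-distance clustering idea, in Python. It is only an
illustration under stated assumptions: similarity comes from the standard
library's difflib.SequenceMatcher rather than a specific edit-distance metric,
the 0.9 threshold is arbitrary, and normalize, similarity, and cluster are
hypothetical names. It cannot tell two different people called "Chen, C" apart;
that still needs extra evidence or human judgment.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, expand German umlauts, drop other diacritics and punctuation."""
    name = name.lower()
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")):
        name = name.replace(src, dst)
    decomposed = unicodedata.normalize("NFKD", name)
    chars = [
        c if c.isalnum() else " "
        for c in decomposed
        if not unicodedata.combining(c)  # drop accents split off by NFKD
    ]
    return " ".join("".join(chars).split())


def similarity(a, b):
    """Similarity in [0, 1] between the normalized forms of two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster(names, threshold=0.9):
    """Greedy one-pass clustering against the first member of each cluster."""
    clusters = []
    for name in names:
        for group in clusters:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


NAMES = ["Meyer-Lüdenscheid, CW", "Chen, C", "Meyer Luedenscheid, C"]
for group in cluster(NAMES):
    print(group)
```

The greedy pass depends on input order and on the threshold, so the clusters
are best treated as suggestions for a person to confirm or split.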
Original comment by jan.schu...@gmail.com
on 13 Dec 2011 at 9:51
We're getting off-topic (at least for this issue), so we should probably move
the discussion to the mailing list/Google Groups, but Refine excels (so to
speak) at precisely the kind of thing you're talking about -- allowing for and
amplifying human judgments.
Facets based on author name clusters, edit distances, keywords, and a number of
other things are possible. Various types of name cleanup are among the
current major uses of Refine.
As I said, if you want to discuss bibliographic data use cases more, let's move
it to the list/group.
Original comment by tfmorris
on 13 Dec 2011 at 10:36
Original comment by tfmorris
on 14 Dec 2011 at 4:07
Original issue reported on code.google.com by
thadguidry
on 11 Nov 2010 at 7:38