Closed rpgoldman closed 6 years ago
In the log I see some mentions of dropped fields (which I'll look at later), but otherwise, this imports without issue. Did you not get a note among the imported references? BBT documents errors in a freestanding (newly imported) note rather than throwing an error.
The @string references should import without issue, but not from this freestanding sample of course, as it doesn't include the @string definitions; during import, references that can't be resolved are assumed to be literal strings, so I get a publisher by the name of KAUFMANN after import.
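A minimal sketch of the fallback described above, not BBT's actual code: a bare token is looked up among the known @string definitions, and if nothing is found the token itself is kept as a literal. The function name `resolve_value` and the expansion "Morgan Kaufmann" are hypothetical, chosen only for illustration.

```python
def resolve_value(token: str, strings: dict[str, str]) -> str:
    """Return the @string expansion for a bare token, or the token itself."""
    # Macro names are compared in lower case so KAUFMANN and kaufmann match.
    return strings.get(token.lower(), token)


# Hypothetical definition; the sample posted in this thread contained none.
strings = {"kaufmann": "Morgan Kaufmann"}

with_defs = resolve_value("KAUFMANN", strings)
without_defs = resolve_value("KAUFMANN", {})

print(with_defs)
print(without_defs)
```

With the definitions missing, the second call simply hands back the token, which is how a publisher literally named KAUFMANN can end up in the library.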
Ah! I didn't know to look for this note. Indeed there is one:
Import errors found:
line 496: found "(", expected ")"
line 1954: found "L", expected "{"
line 10365: found "", expected "}"
The material around 496 is as follows:
@book(Warren:78,
author = {Beatrice Warren},
year = {1978},
publisher = {Acta Universitatis Gothoburgen},
title = "Semantic Patterns of Noun-Noun Compound",
series = {Gothenburg Studies in English},
volume = 41
)
@techreport(Woods:78, <--- 496
author = {William A. Woods},
address = {Cambridge, Mass.},
year = {January 1978},
institution = {Bolt Beranek and Newman},
title = {Research in Natural Language Understanding:
Quarterly Technical Progress Report No. 1,1}
)
Any chance BBT doesn't like the "naked" 41 in the preceding entry?
At line 1954, BBT seems not to like a comment:
@Comment Len Schubert
That @Comment prefix is what Emacs puts in when you do comment-region, and it seems to be both correct bibtex and accepted by Emacs's parsebib.
The final error is at the end of the file, right after:
@Book{books/mk/Ginsberg93,
author = GINSBERG,
title = "Essentials of Artificial Intelligence",
publisher = KAUFMANN,
year = "1993",
address = KAUFMANN-ADDRESS,
topic = "AI-intro;AI-text;",
ISBN = "1-55860-221-6"
}
Hope that helps. If you would like, I could probably simply upload the whole file.
Yep, that's helpful. How many entries are we talking about in total?
@njbart, do you know what the expected behavior of braceless @comments is? The biblatex manual makes no mention of @comment, and Tame the BeaST says
The main use of such an entry type is to comment a large part of the bibliography easily, since anything outside an entry is already a comment, and commenting out one entry may be achieved by just removing its initial @.
In what sense could it be used to comment out "a large part"?
Or do you know, @rpgoldman?
ai.bib.txt Comments
I'm not sure about @comment, because bibtex is so vague about its syntax. My understanding is that you are allowed to put any old garbage into a bib file, and bibtex is supposed to just skip anything that isn't a recognized entry of the form
@keyword brace citekey field* matchingbrace
I put in the @comment prefixes because Emacs's parsebib (https://github.com/joostkremers/parsebib) choked when I expected it to treat something like
Papers by Judea Pearl
as a comment.
I have heard conflicting accounts about whether the hashmark is a comment character. TBH, I don't know how to comment out a big block of bibtex.
My ai.bib file
grep and wc indicate I have approximately 1000 entries in my bib file. I will attach it to this issue (renamed to ai.bib.txt because GitHub doesn't know about .bib files).
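For anyone who wants to reproduce the entry count without grep and wc, here is a rough sketch. It assumes an entry starts with @, a type name, and an opening brace or parenthesis at the start of a line, and it leaves out @comment, @string, and @preamble. The exact commands used above are not shown in this thread, so this is only an approximation of that count.

```python
import re

# An entry start: optional whitespace, "@", a type name, then "{" or "(".
ENTRY_START = re.compile(r"^\s*@(\w+)\s*[{(]", re.MULTILINE)

# Types that are not bibliography entries.
NON_ENTRIES = {"comment", "string", "preamble"}


def count_entries(bib_text: str) -> int:
    """Count entry starts, ignoring @comment, @string and @preamble."""
    return sum(
        1
        for match in ENTRY_START.finditer(bib_text)
        if match.group(1).lower() not in NON_ENTRIES
    )


sample = """@book(Warren:78,
  year = {1978}
)
@Comment{Len Schubert}
@techreport(Woods:78,
  year = {January 1978}
)
"""

print(count_entries(sample))
```

Comparing this number with the item count in Zotero after import is a quick way to see whether entries were silently dropped.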
If parsebib chokes on text outside references unless it's prefixed by @comment, it's clearly not parsing bibtex properly. Tame the BeaST clearly states:
anything outside an entry is already a comment, and commenting out one entry may be achieved by just removing its initial @.
On reading that, I think the "big block of text" probably refers to the possibility of doing
@comment{
@misc{...}
@letter{...}
}
but that still doesn't tell me what the expected behavior of braceless @comments is supposed to be.
Hash marks (#) are just more random text outside the reference, so they are ipso facto part of the comment, not the start of a comment.
Anyhow, BBT will merrily parse anything that's outside a reference. The only thing I don't know right now is what to make of the braceless @comment. Is it meant to be an until-end-of-line comment? Something else?
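To make the two readings concrete, here is a line-based sketch. It rests on an assumption that this thread never settles: that a braceless @comment runs to the end of its line. Free text outside an entry is dropped as an implicit comment. The delimiter counting is deliberately crude (it would be fooled by unbalanced braces or parentheses inside field values), and `strip_loose_text` is a hypothetical helper, not part of BBT or parsebib.

```python
import re

# "@comment" NOT followed by an opening delimiter: the braceless form.
BRACELESS_COMMENT = re.compile(r"^\s*@comment\b(?!\s*[{(])", re.IGNORECASE)
# Any "@type{" or "@type(" at the start of a line.
ENTRY_START = re.compile(r"^\s*@\w+\s*[{(]")


def strip_loose_text(lines):
    """Drop braceless @comment lines and free text outside any entry."""
    kept, depth = [], 0
    for line in lines:
        if depth == 0:
            # Outside an entry: only a real entry start is kept.
            if BRACELESS_COMMENT.match(line) or not ENTRY_START.match(line):
                continue
        kept.append(line)
        # Crude nesting count over both delimiter styles.
        depth += line.count("{") + line.count("(")
        depth -= line.count("}") + line.count(")")
    return kept


sample = [
    "Papers by Judea Pearl",
    "@Comment Len Schubert",
    "@book(Warren:78,",
    "  year = {1978}",
    ")",
    "# just more text",
]

kept = strip_loose_text(sample)
print("\n".join(kept))
```

If braceless @comment turns out to mean something else, only the `BRACELESS_COMMENT` branch would need to change.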
I have the 41 issue solved; it wasn't the 41 but the tab character behind it that I wasn't handling properly. The other problems are somewhat likely to be fallout from not handling the bare @comment, so as soon as I know what to do with that, I can move on to test those.
I assumed the braceless comment meant "comment till end of line", because that's what Emacs's bibtex mode put in for me when I selected a region and did comment-region. I have replaced my braceless comments with comments that do have braces.
I also removed the stray tab character. After that I still see two import failures:
Import errors found:
line 496: found "(", expected ")"
line 10365: found "", expected "}"
I believe that this means that the tab character was not to blame. Putting quotes around the 41 gets us past that error, leaving only the one on line 10365. I'm uploading a new copy of the fixed bibliography.
Interestingly, while I get only one error, it turns out that a ton of the bib entries are lost. I note the following:
Some citekeys in my file (for example books/mk/Ginsberg93) have / characters in them. After the import, I don't see any citekeys in Zotero containing a / character. I conjecture that these entries have all been lost.
Still, I have been losing all the entries in the file from line 4484 on, with no indication in the note. Looking at the file at that point, I see a form-feed (^L) character in the file (see attachment). Removing that character did not fix the problem. Removing the % in the comment @comment{%Jerry Hobbs} got me further, but substantial numbers of entries are still simply lost. I am not certain, but it seems like maybe BBT doesn't like % characters. Unfortunately, these should be acceptable in comments and are definitely acceptable in URLs.
I removed them all... and now BBT just complains that my file is ill-formed!
I am sure the tab character was what tripped up the parser. I wouldn't worry about that right now, I'm just feeding the original bib file through the parser to eliminate errors one by one. You can hold off until I have those handled.
There's one entry in that bib file that's going to be very hard to deal with:
@TECHREPORT(Thiebaux93,
AUTHOR = {Sylvie Thi\{'}ebaux and Joachim Hertzberg
and William Shoaff and Moti Schneider},
TITLE = {A Stochastic Model of Actions and Plans
for Anytime Planning Under Uncertainty},
INSTITUTION = {ICSI},
YEAR = {1993},
NUMBER = {TR--93--027},
MONTH = {May},
)
The author field has an error; the } after \{' closes the author field and it all goes south from there.
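A small sketch of why the field ends early, under the assumption stated in the comment above: the parser treats a backslash-escaped brace as a literal character that does not count towards nesting. The helper `field_end` is hypothetical, and the author value is shortened to two names.

```python
def field_end(text: str, start: int) -> int:
    """Return the index of the brace closing the field opened at text[start].

    Assumes a backslash escapes the next character, so an escaped
    brace does not change the nesting depth.
    """
    depth, i = 0, start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1  # the field was never closed


author = r"{Sylvie Thi\{'}ebaux and Joachim Hertzberg}"
end = field_end(author, 0)

# Everything the parser would take to be the author field.
print(author[: end + 1])
```

Under that assumption the field stops right after the apostrophe, and the rest of the names are left dangling where the parser expects a comma or the end of the entry.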
I am honestly a little surprised (and not in a good way) that JabRef would export this without warning. Doesn't JabRef do basic checking on field contents?
Same goes for
@Book{BCMNS2003,
title = "The Description Logic Handbook --- Theory,
Implementation and Applications",
URL = "http://titles.cambridge.org/catalogue.asp?ISBN=0521781760",
added-by = "msteiner",
added-at = "Thu Feb 5 17:23:38 2004",
editor = "Franz Baader and Diego Calvanese and Deborah
McGuinness and Daniele Nardi and Peter
Patel-Schneider",
offline = "ISBN: 0521781760",
abstract = "Description Logics are a family of knowledge
representation languages that have been studied
extensively in Artificial Intelligence over the last
two decades. They are embodied in several
knowledge-based systems and are used to develop various
real-life applications. The Description Logic Handbook
provides a thorough account of the subject, covering
all aspects of research in this field, namely: theory,
implementation, and applications. Its appeal will be
broad, ranging from more theoretically-oriented
readers, to those with more practically-oriented
interests who need a sound and modern understanding of
knowledge representation systems based on Description
Logics. The chapters are written by some of the most
prominent researchers in the field, introducing the
basic technical material before taking the reader to
the current state of the subject, and including
comprehensive guides to the literature. In sum, the
book will serve as a unique reference for the subject,
and can also be used for self-study or in conjunction
with Knowledge Representation and Artificial
Intelligence courses.",
publisher = "Cambridge University Press",
year = "2003",
annote = "Contents: 1. An introduction to description logics D.
Nardi and R. J. Brachman; Part I. Theory: 2. Basic
description logics F. Baader and W. Nutt; 3. Complexity
of reasoning F. M. Donini; 4. Relationships with other
formalisms U. Sattler, D. Calvanese and R. Molitor; 5.
Expressive description logics D. Calvanese and G. De
Giacomo; 6. Extensions to description logics F. Baader,
R. K{\"u}sters and F. Wolter; Part II. Implementation:
7. From description logic provers to knowledge
representation systems D. L. McGuinness and P. F.
Patel-Schneider; 8. Description logics systems R.
M{\"o}ller and V. Haarslev; 9. Implementation and
optimisation techniques I. Horrocks; Part III.
Applications: 10. Conceptual modeling with description
logics A. Borgida and R. J. Brachman; 11. Software
engineering C. Welty; 12. Configuration D. L.
McGuinness; 13. Medical informatics A. Rector; 14.
Digital libraries and web-based information systems I.
Horrocks, D. L. McGuinness and C. Welty; 15. Natural
language processing E. Franconi; 16. Description logics
for data bases A. Borgida, M. Lenzerini and R. Rosati;
Appendix. Description logic terminology F. Baader;
Bibliography. See also
\cite{ href="http://www.inf.unibz.it/%7efranconi/dl/course/">http://www.inf.unibz.it/~franconi/dl/course/}",
}
where the quote character in href="http://www closes the field and the parser gets confused from there. But these two really are just malformed bibtex, and that makes the file as such malformed.
All the rest I can parse now.
Thanks! I'll check all those accents to make sure they are correct. I commented out that \cite{} oddity (came from CSBibs entry) earlier. I'll confirm when it's all working.
Not yet -- I have fixed the parser, which is going through its tests now. When that passes, I'll build a new BBT that has the parser and it will be posted here.
This almost works for me. Now I'm finding repeated crashes where Zotero (?) or BBT (?) doesn't like ill-formed URLs in the url field.
Again -- hold off, a new version will be out tomorrow which deals with these issues. The first order of business was to get the input to parse.
Crashes, though? As in Zotero goes down in flames?
Sorry -- not "goes down in flames," but "fails to import anything at all" instead of just throwing away or annotating the bad URLs.
:robot: this is your friendly neighborhood build bot announcing test build 5325 ("test cases for #873").
Right, give 5325 a go.
Patashnik’s “BibTeXing” (http://mirrors.ctan.org/biblio/bibtex/base/btxdoc.pdf) says:
For Scribe compatibility, the database files allow an @COMMENT command; it’s not really needed because BibTeX allows in the database files any comment that’s not within an entry. If you want to comment out an entry, simply remove the ‘@’ character preceding the entry type.
(No idea what Scribe is/was …)
In http://artis.imag.fr/~Xavier.Decoret/resources/xdkbibtex/bibtex_summary.html, there’s an interesting section on “Comments” which claims that
@comment{
@misc{...}
@letter{...}
}
does not work (haven’t tested this myself). As to a braceless @comment, my feeling is that without any accompanying begin/end tags it can hardly apply to more than the line it appears in – but, honestly, I don’t really know.
Ultimately, the only valid test is studying the source code and/or the behaviour of bibtex and biber (the programs).
As https://github.com/aclements/biblib puts it:
There are a lot of BibTeX parsers out there. Most of them are complete nonsense based on some imaginary grammar made up by the module's author that is almost, but not quite, entirely unlike BibTeX's actual grammar. BibTeX has a grammar. It's even pretty simple, though it's probably not what you think it is. The hardest part of BibTeX's grammar is that it's only written down in one place: the BibTeX source code.
So I guess it would be best if someone ran a few tests through bibtex and biber.
:robot: this is your friendly neighborhood build bot announcing test build 5326 ("large tests on nightly").
Scribe is most likely https://en.wikipedia.org/wiki/Scribe_(markup_language)
If
@comment{
@misc{...}
@letter{...}
}
doesn't work then I have no idea what TTB means by "commenting out large blocks of text".
In any case, BBT's job is to be lenient, so currently it parses most of the original ai.bib, save for the two really broken references (which I'm pretty miffed JabRef didn't flag).
I didn't want to get into it, but I have definitely had problems trying to use @comment{ ... } to comment out blocks. Sufficient problems that I gave up using it. Unfortunately, I didn't write down what failed me...
This still fails for me. I get an error on the attached version of ai.bib, and nothing is imported. Looking in the debug log, it looks like something might be trying to parse URLs and errors out on a bad one:
(1)(+0001298): { "type": "unknown_uri" "entry": "bai-fri-mci-icaps07" "field_name": "url" "value": "bai-fri-mci-icaps07.pdf" "line": 9736 }
(2)(+0000000): Translate: Translation using Better BibTeX failed: type => unknown_uri entry => bai-fri-mci-icaps07 field_name => url value => bai-fri-mci-icaps07.pdf line => 9736 string => [object Object] url => /Users/rpg/refs/ai.bib downloadAssociatedFiles => true automaticSnapshots => true
(5)(+0000000): Translate: Running handler 0 for error
(1)(+0000002): { "type": "unknown_uri" "entry": "bai-fri-mci-icaps07" "field_name": "url" "value": "bai-fri-mci-icaps07.pdf" "line": 9736 }
(3)(+0000000): Alert: An error occurred while trying to import the selected file. Please ensure that the file is valid and try again.
When I remove that ill-formed URL, everything seems to be well. AFAICT all the entries seem to be successfully parsed.
It seems actually reasonable that Zotero might be more stringent about URLs than Bibtex, which really just has to move them from input to output. I suspect this one crept in because I was (mis)using the URL field as if it was a "file" field. Thanks for all of your help!
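As a pre-import check, something like the following could flag url fields that are really bare file names. This is only a sketch using Python's standard `urllib.parse`; it is not how BBT or Zotero validate URLs (that code is not shown in this thread), and the rule "no scheme means suspicious" is an assumption made for illustration.

```python
from urllib.parse import urlparse


def split_urls(entries: dict[str, str]):
    """Separate url values that carry a scheme from those that do not."""
    ok, flagged = {}, {}
    for key, url in entries.items():
        # A bare file name such as "paper.pdf" parses with an empty scheme.
        target = ok if urlparse(url).scheme else flagged
        target[key] = url
    return ok, flagged


# Citekeys and url values taken from entries quoted in this thread.
entries = {
    "bai-fri-mci-icaps07": "bai-fri-mci-icaps07.pdf",
    "BCMNS2003": "http://titles.cambridge.org/catalogue.asp?ISBN=0521781760",
}

ok, flagged = split_urls(entries)
print("flagged:", sorted(flagged))
```

Flagged values can then be moved to a file field or fixed by hand, rather than letting one bad URL decide the fate of the whole import.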
I'm not sure why this issue was auto-reopened.
blip-bloop reopens any issue that wasn't closed by me; I like to keep issues open so I have a reminder for wrap-up work.
It won't be Zotero that's complaining about the URL; that would be BBT, which it shouldn't, and even then it should import all other references. The weird thing is that I've added the original ai.bib with the two fixes to the test set, and that imports them all (985, I think). So there must be some difference between your situation and mine that I'm not handling properly. I'll take a look at the later ai.bib you posted.
You should not get these uri errors with the new parser, I've disabled them. What version did you import this with?
I got those results with 5.0.73
You have to try with 5325 or 5326 posted in this thread. 5.0.73 doesn't have these latest fixes yet.
Tested this with the 5326 build (I didn't fully understand how the build bot worked), and it seems fine, thanks.
:robot: this is your friendly neighborhood build bot announcing test build 5351 ("cleanup").
5.0.74 has the changes.
Bug classification
Problem with import. Tried to import large bib file. At least one entry, here:
failed to appear in Zotero after import (searching for "Essentials" finds nothing), although the import seems to complete. I suppose that the use of @string-defined abbreviations (GINSBERG, KAUFMANN, KAUFMANN-ADDRESS) might be at fault, but if that's the problem, shouldn't the import raise an error?
Non-export problems with BBT
If your issue is a bug report, but not for exports, restart Zotero with debugging enabled (Help -> Debug Output Logging -> Restart with logging enabled), reproduce your problem, and select "Report Better BibTeX error" from the help menu, and post the resulting report ID (shown in red after you submit) here.
Report ID: H68P37J8