Closed malb closed 7 years ago
Note that this changes the behaviour of parsebib-read-string
and parsebib-read-entry
somewhat:
parsebib-read-string
it is now more liberal in what it accepts and Both behaviours might be undesirable (?)
This looks great, thanks!
parsebib-read-string
it is now more liberal in what it accepts and
How, exactly? I'm looking at the code but I can't figure it out. :smile: I noticed you changed some of the regexes, but that would seem to make parsebib-read-string
less liberal.
I don't see that as a problem.
Consider
@string{acisp15month = jun # "~/~" # jul}
currently, parsebib would return ("acisp15month" . "\"~/~\"")
with this PR it will return ("acisp15month" . "jun~/~jul")
(possibly with replacements applied if they are provided as a dict) I believe this to be correct, but this is what I meant by "more liberal"
I'll look at that conflict, too
I'll look at that conflict, too
No need, it's just a name clash (I renamed a variable). It's already fixed in my local repo, but I have a few more changes that I'd like to test first before pushing them to github.
Consider
@string{acisp15month = jun # "~/~" # jul}
currently, parsebib would return
("acisp15month" . "\"~/~\"")
with this PR it will return("acisp15month" . > "jun~/~jul")
(possibly with replacements applied if they are provided as a dict) I believe this to be correct, > but this is what I meant by "more liberal"
Thanks. I think that would actually qualify as "more correct" rather than just "more liberal". I never realised that string definitions could contain other abbrevs. (Yeah, I know, your example bib file has it, so I could have known. :wink: )
Thanks for the code. It's already in my local repo's master and I'll push it to Github in the next few days. Then @tmalsburg can update helm-bibtex.
Thanks!
Sorry, I should have realised right away that the double quotes are important. That is, in your example file above, the line:
@string{cryptopub = springer}
should be parsed as ("cryptopub" . "springer")
, but the line:
@string{cryptoname = "CRYPTO"}
should be parsed as ("cryptoname" . "\"CRYPTO\"")
.
Furthermore, your code doesn't seem to handle braces {}
correctly. Note that @strings
can also be defined as:
@string{cryptoname = {CRYPTO}}
which is equivalent to using "CRYPTO"
. Parsing this with your code yields ("cryptoname" . "CRYPTO}")
. What it should yield is ("cryptoname" . "{CRYPTO}")
.
What's also not handled correctly is a field value like this:
publisher = ucsb-ece # {, Germany},
When expanded, this should yield the string "Springer, Heidelberg, Germany"
, but instead it yields "ucsb-ece{, Germany}"
.
The reason parsebib needs to be this precise is that Ebib is not just a front-end for displaying the contents of bib files (as helm-bibtex is), but also for editing bib files. So it needs to know whether a field value is simply a string value or whether it contains abbreviations.
Another thing: it occurred to me that it's not safe to split a string/field value on #
. On the one hand, the white space is optional, so BibTeX (well, Biber, since that is what I tested with, but I suspect BibTeX behaves the same way) accepts the following just fine:
note = {Note: }#SLL
SLL
is expanded just fine here. Furthermore, if we'd change the code to split on #
without the spaces, we run the risk of wrongly splitting field values that contain a #
as part of normal text. (And note that in something like \verb+#+
the # sign can appear in e.g., a title without having to be written as \#
).
I'll take a closer look at the code in the next few days and see what I can do to improve it.
Thanks, could you push your merged branch somewhere so I could work off that in my attempts to fix these?
I think the right thing™ to do is to separate out the two use cases. In ebib you do not want any automatic replacements to be applied, since it's also an editor. In helm-bibtex I'd argue those are desired. Thus, the default should be to return the raw string with no replacement/modification whatsoever. When a hash table is provided, though, a fixed version of my replacements should be applied. Does that sound right?
I've just pushed some chances. Running
(progn
(message "W/O %s" (parsebib-read-string 203))
(message "W/O %s" (parsebib-read-string 246))
(message "W/O %s" (parsebib-read-string 288))
(message "W/O %s" (parsebib-read-string 330))
(message "W/O %s" (parsebib-read-string 379))
(message "W/O %s" (parsebib-read-string 424))
(setq test-hash (make-hash-table :test #'equal))
(puthash "ucsb-ece" "Springer, Heidelberg" test-hash)
(message "W %s" (parsebib-read-string 203 test-hash))
(message "W %s" (parsebib-read-string 246 test-hash))
(message "W %s" (parsebib-read-string 288 test-hash))
(message "W %s" (parsebib-read-string 337 test-hash))
(message "W %s" (parsebib-read-string 379 test-hash))
(message "W %s" (parsebib-read-string 424 test-hash)))
on
@string{ucsb-ece = "Springer, Heidelberg"}
@string{cryptopub = ucsb-ece}
@string{cryptoname = "CRYPTO" # "81"}
@string{cryptoname = {CRYPTO}}
@string{publisher = ucsb-ece#{, Germany}}
@string{note = {Note: }#SLL}
outputs
W/O (ucsb-ece . "Springer, Heidelberg")
W/O (cryptopub . ucsb-ece)
W/O (cryptoname . "CRYPTO" # "81")
W/O (cryptoname . {CRYPTO})
W/O (publisher . ucsb-ece#{, Germany})
W/O (note . {Note: }#SLL)
W (ucsb-ece . "Springer, Heidelberg")
W (cryptopub . "ucsb-ece")
W (cryptoname . "CRYPTO81")
W (cryptoname . "CRYPTO")
W (publisher . "ucsb-ece, Germany")
W (note . "Note: SLL")
This seems correct (?) In related news, as you can tell from my test code, my elisp sucks. Also, perhaps overloading the strings
parameter with meaning like this will lead to confusion down the line?
the default should be to return the raw string with no replacement/modification whatsoever. When a hash table is provided, though, a fixed version of my replacements should be applied. Does that sound right?
Yup, I was thinking the same thing.
I pushed my changes to a new branch malb-master
(now ineptly named, because there is much more than just your changes, but well). I have a few hours to kill, so I'll see what I can come up with.
Note, I meant these:
@string{publisher = ucsb-ece#{, Germany}}
@string{note = {Note: }#SLL}
not as @string
s but as field values in some entry. (Though it won't make that much of a difference when it's just for testing).
Also, perhaps overloading the strings parameter with meaning like this will lead to confusion down the line?
You mean using it to pass the string replacements and to signal the values can be expanded? I don't think so, because it doesn't make sense to pass a strings hash without wanting to do the expansions.
TBH, I don't think simply removing all braces is the way to go. It's possible to have e.g. \{ \}
in a field value, and those should be retained, even in the expansion, I think. Your current code would remove them, but leave the backslashes. I admit that such cases will be rare, but I'd rather get it right.
Ebib has code to detect whether a given field value contains abbrevs, which I think is fairly reliable (it's in ebib-db.el
, specifically ebib-db-unbraced-p
), but doesn't split the different parts so that the abbrevs can be expanded.
I just pushed an update to malb-master that hopefully does the right thing. Seems to work fine when testing on your test file. Could you check it out as well and see if the results you get are what you expect? If yes, I can move the code to master and helm-bibtex can start making use of it.
This is great. I've pushed one small bug fix to my clone which falls back to returning the string as is if all else fails. Otherwise, I was having trouble reading some fields such as authors
in:
@InProceedings{ACISP:LMLLGY16,
author = "Min Cherng Lee and
Robin Mitra and
Emmanuel Lazaridis and
An Chow Lai and
Yong Kheng Goh and
Wun-She Yap",
title = "Statistical Disclosure Control for Data Privacy Using Sequence of Generalised Linear Models",
pages = "77--93",
doi = "10.1007/978-3-319-40253-6_5",
crossref = acisp16-1,
}
Note the line breaks in the authors field.
I've pushed one small bug fix to my clone which falls back to returning the string as is if all else fails.
Hmm. I missed that.
Note the line breaks in the authors field.
Yes, thanks for pointing that out. I changed it so that when string abbrevs are expanded, newlines in field values are removed and multiple spaces are reduced to one. That seems to make sense, because the field values are meant for display.
I merged the branch into master, so in a few hours it should be available for helm-bibtex to make use of.
If an optional
strings
parameter is supplied, this is now used to perform string replacements, cf #5.Tested with this file:
and this function