Closed ariroffe closed 2 years ago
As per your PR, you chose the first option of returning an error, which might be a sensible pick.
However, that very much interferes with how you try to use biblatex
in your project. I'm not a fan of the second option of arbitrarily messing with the user input. The alternative I'd like to propose is this:
parse
and from_raw
do not return errors for duplicate entries. All well-formed entries in the file will be pushed into the Bibliography::entries
vector, but the record in the Bibliography::keys
map will always point to the first occurrence. This way, methods like Bibliography::get
will always return the first entry but the parsing process will not fail with duplicates. Library consumers can iterate over Bibliography::entries
(including duplicates) by calling Bibliography::iter
to provide their own duplicate handling.
This way, biblatex
makes no assumptions about the consumer's requirements for dupe handling and the like.
To facilitate this usage pattern, I can add into_iter
and into_vec
methods on Bibliography
.
Would these changes solve your issue?
I'm actually happy with the changes. I got a bit carried away when I opened the issue, but I don't think biblatex
should evolve thinking about how I, in particular, intend to use it.
Cite keys should be unique (according to the bibtex specification). A bib file with a duplicate key is malformed. If we are returning an error and failing to parse when a file is malformed due to an incorrect field, then we should treat other flaws in the same way, keeping the library uniform. Hence, it should error out on duplicate keys. Would you agree? (If so, feel free to close this issue)
P.S. please remember to bump up the version and update in crates.io
so that I can use the library with these changes :)
I was running some tests and found the following.
I manually counted that
tests/gral.bib
has 85 entries. But after runningparse
the resultingBibliography
has 83 entries. I tracked down 2 entries that have repeated keys in the file, these are:and:
The
parse
method just keeps the second one. I guess this is because of how theinsert
method works in aHashMap
/BTreeMap
.Is this intended behavior? I don't think
parse
should assume that the latter one is the correct entry, perhaps the user wishes to keep the first one if asked, or to change one key and keep both. I see two possible solutions:1-
parse
errors out in this case2- keep both entries, but automatically put a (1) after the key of the second one (or (2) if there are already 2 of the same, etc.). This solution would also play nicely with my library, since handling this type of repetitions is precisely what it is designed to do. If we go this route, I have a similar function for doing that (see fn get_new_citation_key in line 173). I could adapt it and make a pull request here.