Closed csware closed 8 months ago
Thanks!
Maybe something like this (Works For Me™)?
from typing import Collection, Union

import bibtexparser
from bibtexparser.library import Library
from bibtexparser.model import Block, Entry


class NormalizeFieldNames(bibtexparser.middlewares.middleware.BlockMiddleware):
    """Lowercase all field keys of each entry."""

    def __init__(self, allow_inplace_modification: bool = True):
        super().__init__(allow_inplace_modification=allow_inplace_modification,
                         allow_parallel_execution=True)

    def transform_entry(self, entry: Entry, library: "Library") -> Union[Block, Collection[Block], None]:
        # Rename each field in place, e.g. "Author" -> "author".
        for field in entry.fields:
            field.key = field.key.lower()
        return entry
Usage example:
library = bibtexparser.parse_file(
    filename,
    append_middleware=[
        NormalizeFieldNames(),
        bibtexparser.middlewares.SeparateCoAuthors(),
        bibtexparser.middlewares.SplitNameParts(),
    ],
)
That's probably alright. Would you be willing to convert it to a PR (adding a test)? I think this is a quite common use-case that we should support.
Fully agree with @tdegeus, and would appreciate a PR by @Technologicat
Just one remark: We'd have to be able to handle "new" duplicates somehow (i.e., if two field keys exist in the original block which only differ in their capitalization). That's particularly important now that we're pushing the use of entries as dicts. In principle, we have an entry type DuplicateFieldKeyBlock that should be used here, but I am also happy to support additional suggestions. These would probably have to be enabled with a corresponding constructor parameter (e.g. raising an exception). Does this make sense?
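To make the "new duplicates" concern concrete, here is a minimal sketch in plain Python that works on bare key strings rather than on bibtexparser's model classes. The names find_case_collisions, check_keys, raise_on_conflict and DuplicateKeyAfterNormalization are hypothetical and only illustrate the idea of an opt-in exception; they are not part of bibtexparser.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class DuplicateKeyAfterNormalization(ValueError):
    """Hypothetical exception for this sketch; not a bibtexparser class."""


def find_case_collisions(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Return lowercased keys that more than one original key maps to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        groups[key.lower()].append(key)
    # Keep only the groups where lowercasing merges several original keys.
    return {low: originals for low, originals in groups.items() if len(originals) > 1}


def check_keys(keys: Iterable[str], raise_on_conflict: bool = False) -> Dict[str, List[str]]:
    """Report collisions, optionally raising (the assumed constructor-parameter behaviour)."""
    collisions = find_case_collisions(keys)
    if collisions and raise_on_conflict:
        raise DuplicateKeyAfterNormalization(
            f"Field keys differ only in capitalization: {collisions}"
        )
    return collisions
```

With keys "Author", "author" and "Title", only "author" would be reported, since it is the only lowercased key that two original keys collapse into.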
@tdegeus: Sure.
@MiWeiss: Good point about conflicting keys. But I'll need a bit more information about the desired way to tackle it.
The way this approximately went is, yesterday I got a sudden need to extract some data from BibTeX in Python.
Within an hour, I had installed bibtexparser, upgraded it to 2.x, run into this issue (since my datafiles happened to use capitalized keys), written the simplest possible field key normalizer, and posted a copy here. So it's fair to say I'm kind of new to this project :)
A solution would be to issue a warning (similar to library.failed_blocks) and use the last key value.
@csware: Thanks. Yes, that's one possible solution, and probably the simplest one that works.
~Considering alternatives, what about the DuplicateFieldKeyBlock mentioned by @MiWeiss?~ EDIT: Never mind, I think I understood what you all meant now.
Implemented, using @csware's suggestion of emitting a warning and letting the last value win. Please review.
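For readers skimming the thread, the "warn and let the last value win" behaviour can be sketched roughly as follows. This is a standalone illustration on (key, value) pairs using the standard warnings module, not the code from the actual PR; the function name normalize_last_wins is made up for this example.

```python
import warnings
from typing import Dict, Iterable, Tuple


def normalize_last_wins(fields: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Lowercase field keys; on a clash, warn and keep the later value."""
    normalized: Dict[str, str] = {}
    for key, value in fields:
        low = key.lower()
        if low in normalized:
            # The key was already seen with a different capitalization (or twice).
            warnings.warn(
                f"Duplicate field key {low!r} after normalization; keeping the last value."
            )
        normalized[low] = value
    return normalized
```

The trade-off of this policy is that the earlier value is silently dropped from the result, so the warning is the only trace that the original block contained both spellings.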
Describe the bug
I have several .bib files that contain (mixed) field keys that are either in lowercase or start with a capital letter, such as "Author" and "Title". No other tooling complains about this.
SeparateCoAuthors does not work, and I cannot uniformly access the fields using e.g. entry['title'].
In v1, field keys were normalized to lowercase.
Maybe this can be fixed using a middleware? I would be really grateful!
Reproducing
Version: e3757c13abf2784bda612464843ab30256317e6c
Code:
Bibtex: