Open stanleyrhodes opened 2 years ago
Please open a PR with your changes nevertheless, and we can worry about the tests later on. We don't want diverging forks, since eventually you might not be able to easily upgrade the package as we make new releases.
Describe the bug The bibtex format supports the field
year
. However, withinpublication['bib']
,pub_year
is used as the field, and further, thescholarly.bibtex()
function outputs a.bib
file withpub_year
that then won't import properly (e.g., Zotero import). This behavior can be seen in the quickstart example: https://scholarly.readthedocs.io/en/latest/quickstart.html#bibtexThere is evidence that someone in the past meant to change
pub_year
to year in thebibtex()
function. It's in_BIB_MAPPING
but NOT in_BIB_REVERSE_MAPPING
for some reason.bibtex()
takes_BIB_REVERSE_MAPPING
as an argument.To Reproduce Run
bibtex()
on any publication where the GS entry has a pub date. The bug can be seen and replicated using the example in the quickstart: https://scholarly.readthedocs.io/en/latest/quickstart.html#bibtexExpected behavior The
bibtex()
function should output a.bib
file with the properyear
field.However, after some digging around, I would make a stronger claim: I see no reason
pub_year
should be used in thepublication
object at all, not inpublication[bib]
or anywhere else. Consider where it occurs: https://github.com/scholarly-python-package/scholarly/search?q=pub_yearget_publication()
>_scholar_pub()
> 3 blocks of code that createpublication['bib']['pub_year']
fill() also has one place where ['pub_year'] is created. Together that's 4 places in publication_parser.py TheBibEntry
object has it.This leaves me with the questions, all of which I currently think are answered with "no": Is there any reason not to just change it at the start, in
_scholar_pub()
andfill()
? Is there any reason why it's in_BIB_MAPPING
but NOT in_BIB_REVERSE_MAPPING
? Is there any reason not to change it in data_types.py for theBibEntry
object?It would also need to be changed in
test_module.py
.Do you plan on contributing? Not immediately, as I am not yet wise to the ways of tests and contributing established for this project. However, I will be revising this code in Scholarly fork in my repo, including the test_module.py. Also, since the test itself needs to be changed, I'm not sure if that necessitates any additional consideration or overhead.
Additional context There's a larger, more general discussion to be had about
bibtex()
and.bib
output, and citation manager outputs in general. I'm no expert on the bibtex format. But from what I can tell, Scholarly outputs a.bib
file that is quite non-standard*: utf-8 rather than latin-1 (iso-8859-1). In my case, this is very much a GOOD thing! Since so many fields (e.g., author) can have characters outside those 256, it's better that it does utf-8. Zotero, for example, allows the import of utf-8 .bib files it even though it's non-standard, to my great relief. My view is that Scholarly should be able to pass all the information possible from the GS entry to Zotero and other citation managers. My needs, for example, include the publication's url as well, and that's not bibtex-standard. However, we could easily have a strict .bib setting and a default max-info-lax-standards setting. More complicated formats (e.g., Zotero RDF) could wait until someone is very motivated.(*) We also have
venue
in the .bib output which doesn't seem to add much value at all. Around half the time they have a truncated journal title and the ellipses character in them.journal
is the useful field, notvenue
.