perrette / papers

Command-line tool to manage bibliography (pdfs + bibtex)
MIT License

Parsing braces when generating citation key #21

Closed: wwywong closed this issue 3 years ago

wwywong commented 3 years ago

If the BibTeX returned by the DOI query includes braces in the author list, the code that generates the citation key from the last name of the first author seems to fail.

Example: for the article http://dx.doi.org/10.4169/amer.math.monthly.118.05.450, papers extract returns the following entry:

@article{Alan D. Sokal_2011,
 author = {{Alan D. Sokal}, },
 doi = {10.4169/amer.math.monthly.118.05.450},
 journal = {The American Mathematical Monthly},
 number = {5},
 pages = {450},
 publisher = {Informa UK Limited},
 title = {A Really Simple Elementary Proof of the Uniform Boundedness Theorem},
 url = {http://dx.doi.org/10.4169/amer.math.monthly.118.05.450},
 volume = {118},
 year = {2011}
}

Note that the citation key contains spaces and is invalid BibTeX.
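To make "invalid BibTeX" concrete, here is a minimal Python sketch of a key validity check. It is not code from papers: the helper name is_valid_bibtex_key is hypothetical, and the set of forbidden characters is an assumption (a conservative choice of whitespace, commas, braces, double quotes, and a few characters that are special to LaTeX).

```python
import re

# Assumed (conservative) set of characters that should not appear in a key:
# whitespace, commas, braces, double quotes, and the LaTeX specials # % ~ \
_FORBIDDEN = re.compile(r'[\s,{}"#%~\\]')

def is_valid_bibtex_key(key: str) -> bool:
    """Return True if key is non-empty and contains none of the assumed forbidden characters."""
    return bool(key) and _FORBIDDEN.search(key) is None

if __name__ == "__main__":
    for key in ["Alan D. Sokal_2011", "Alan_D_Sokal_2011", "sokal2011"]:
        print(key, is_valid_bibtex_key(key))
```

Under these rules the key from the report above is rejected because of its spaces, while Alan_D_Sokal_2011 and sokal2011 pass.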

perrette commented 3 years ago

Hi @wwywong, thanks for reporting the issue. Interestingly, I get something different:

@article{Alan_D_Sokal_2011,
    doi = {10.4169/amer.math.monthly.118.05.450},
    url = {https://doi.org/10.4169%2Famer.math.monthly.118.05.450},
    year = 2011,
    publisher = {Informa {UK} Limited},
    volume = {118},
    number = {5},
    pages = {450},
    author = {Alan D. Sokal},
    title = {A Really Simple Elementary Proof of the Uniform Boundedness Theorem},
    journal = {The American Mathematical Monthly}
}

so my guess is that it has to do with the dependencies. Here is my dependency list (pip freeze | grep -e unidecode -e crossrefapi -e bibtexparser -e scholarly -e rapidfuzz -e six):

bibtexparser==1.2.0
crossrefapi==1.5.0
rapidfuzz==1.2.1
scholarly==1.1.0
six==1.14.0
text-unidecode==1.3

Can you tell me about yours? My best guess is bibtexparser.
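As a portable alternative to the pip freeze | grep pipeline (for example on Windows, where grep may be missing), the installed versions can also be read from Python itself. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); the package list mirrors the one above, and the helper name dependency_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names matching the pip freeze output above.
PACKAGES = ["bibtexparser", "crossrefapi", "rapidfuzz", "scholarly", "six", "text-unidecode"]

def dependency_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or "not installed"."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result

if __name__ == "__main__":
    for name, ver in dependency_versions().items():
        print(f"{name}=={ver}")
```

The printed name==version lines use the same format as pip freeze, so they can be pasted straight into a reply.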

perrette commented 3 years ago

There is also poppler (pdftotext -v):

pdftotext version 0.86.1
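The poppler check can be scripted as well. This is a minimal sketch that assumes pdftotext is on PATH and prints a banner containing "pdftotext version X" when called with -v, as in the outputs quoted in this thread. The helper name pdftotext_version is hypothetical.

```python
import re
import shutil
import subprocess

def pdftotext_version():
    """Return poppler's pdftotext version string, or None if it cannot be determined."""
    exe = shutil.which("pdftotext")
    if exe is None:
        return None  # not installed or not on PATH
    # The version banner may go to stdout or stderr depending on the build, so search both.
    proc = subprocess.run([exe, "-v"], capture_output=True, text=True)
    match = re.search(r"pdftotext version (\S+)", proc.stdout + proc.stderr)
    return match.group(1) if match else None

if __name__ == "__main__":
    print(pdftotext_version())
```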

perrette commented 3 years ago

And to make sure we are on the same page, please install the latest version, which is now also on PyPI: pip install -U papers-cli

wwywong commented 3 years ago

To be honest, I had uninstalled papers after I failed to get it to work. So at this point I cannot give you more info. (Sorry!)

However, bibtexparser was already on my system at the time, and I don't remember a different version being installed, so the version I had should have been 1.2.0.

My version of pdftotext is

pdftotext version 21.06.1
Copyright 2005-2021 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

but I don't think that is the issue, since it was able to retrieve the bibliographic info. If anything, I would guess the issue is with crossrefapi.

Before you close this as unreproducible: the URL in the output you posted a few comments above is malformed for use with BibTeX:

url = {https://doi.org/10.4169%2Famer.math.monthly.118.05.450},
                              ^^^

Somehow the / got escaped unnecessarily as %2F; while the URL is valid, processing it with BibTeX will likely give you an error.
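If the escaped slash does cause trouble, it can be undone before the entry is written to the .bib file. The minimal sketch below decodes only %2F, the case reported here. It deliberately avoids running urllib.parse.unquote over the whole URL, because a blanket decode could turn %25 into a bare %, which LaTeX treats as a comment character. The helper names are hypothetical, and this is not part of papers.

```python
import re

# Matches "%2F" or "%2f", the percent-encoded form of "/".
_ESCAPED_SLASH = re.compile(r"%2[Ff]")

def unescape_slashes(url: str) -> str:
    """Turn every %2F back into "/" and leave all other characters untouched."""
    return _ESCAPED_SLASH.sub("/", url)

def has_percent(url: str) -> bool:
    """True if a "%" is still present; LaTeX reads "%" as the start of a comment."""
    return "%" in url

if __name__ == "__main__":
    fixed = unescape_slashes("https://doi.org/10.4169%2Famer.math.monthly.118.05.450")
    print(fixed, has_percent(fixed))
```

Any URL for which has_percent still returns True after this step would need separate handling, for example wrapping it in \url{} on the LaTeX side.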

perrette commented 3 years ago

I see it gets escaped, but in my own use case (pandoc) it is not an issue. If LaTeX complains, workarounds seem to exist.

By the way, I took another look at your original issue. I can now see that the BibTeX key I posted has little to do with papers; it comes directly from the underlying crossrefapi query: http://api.crossref.org/works/10.4169/amer.math.monthly.118.05.450/transform/application/x-bibtex.
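That query can be reproduced without papers or crossrefapi. This is a minimal sketch using only the standard library, assuming the transform endpoint quoted above keeps returning BibTeX for a plain GET request. The helper names are hypothetical, and the sketch makes no claim to mirror how crossrefapi issues the request.

```python
import urllib.request

# Endpoint quoted in the comment above; the DOI is inserted verbatim (its "/" included).
CROSSREF_BIBTEX = "http://api.crossref.org/works/{doi}/transform/application/x-bibtex"

def crossref_bibtex_url(doi: str) -> str:
    """Build the Crossref transform URL for a DOI."""
    return CROSSREF_BIBTEX.format(doi=doi)

def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Download the raw BibTeX entry that Crossref returns for a DOI (needs network access)."""
    with urllib.request.urlopen(crossref_bibtex_url(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_bibtex("10.4169/amer.math.monthly.118.05.450") should show whatever key Crossref currently generates, independently of any local tool.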

In the code, when you run papers add 10.4169@amer.math.monthly.118.05.450.pdf --bib biblio.bib, that key is not even used. Instead, a new key is generated from the author name and becomes sokal2011. The machinery that does this is tightly tied to bibtexparser.
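For illustration, here is a minimal sketch of that kind of key generation. It is written so that the braced author field from the original report still yields sokal2011. It is not the papers implementation: first_author_lastname and make_key are hypothetical names, and the entry is a plain dict rather than a bibtexparser record. Accents are folded with the standard library's unicodedata instead of text-unidecode.

```python
import re
import unicodedata

def first_author_lastname(author_field: str) -> str:
    """Return the last name of the first author in a BibTeX author field.

    Handles "Last, First", "First Last", and protective braces plus a
    trailing separator, e.g. "{{Alan D. Sokal}, }" from the report above.
    """
    # Authors are separated by " and "; keep only the first one.
    first = re.split(r"\s+and\s+", author_field.strip())[0]
    # Drop braces, then any leftover whitespace and trailing commas.
    first = first.replace("{", "").replace("}", "").strip().strip(",").strip()
    if "," in first:  # "Last, First" form
        return first.split(",")[0].strip()
    parts = first.split()  # "First Last" form: take the final word
    return parts[-1] if parts else ""

def make_key(entry: dict) -> str:
    """Build a key such as "sokal2011": ASCII-folded, lowercased last name + year."""
    name = first_author_lastname(entry.get("author", ""))
    # Fold accents (e.g. "ö" -> "o") and drop anything that is not a-z.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    ascii_name = re.sub(r"[^a-z]", "", ascii_name.lower())
    return f"{ascii_name}{entry.get('year', '')}"

if __name__ == "__main__":
    entry = {"author": "{{Alan D. Sokal}, }", "year": "2011"}
    print(make_key(entry))  # sokal2011
```

One known gap: this sketch would still split a literal " and " inside braces, such as a corporate author like {Barnes and Noble}. Handling that case would need a brace-aware splitter.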

Anyway, papers is experimental. Since its early development (3 weeks of work...), I have not had the opportunity to use it extensively. Until that happens, there will always be some issues in corner cases. If you ever come back to it, feel free to get in touch and help diagnose them.