Closed WolfgangFahl closed 1 year ago
Hi @WolfgangFahl
You could use bibtexparser customizers to achieve much of what you're trying to do, but I guess you're much better off just running a LaTeX parser on the strings returned by bibtexparser (e.g. https://github.com/phfaist/pylatexenc/).
Bibtexparser v2 will actually leverage such a parser internally. However, there's still a long way to go before that's going to be released ;-)
I'm closing this issue as it does not require a code change, but feel free to add follow-up remarks...
Indeed, your hint is correct. See also my question https://stackoverflow.com/questions/75426142/pure-text-non-latex-results-for-python-bibtex-parser
from pylatexenc.latex2text import LatexNodes2Text

ln2t = LatexNodes2Text()
for key in btex:
    latex = btex[key]
    no_latex = ln2t.latex_to_text(latex)
    btex[key] = no_latex
will convert the LaTeX dict entries back to text. If others need this, it might be added as a convenience function.
Example bibtex
@inproceedings{Dijkstra_1967,
doi = {10.1145/800001.811672},
url = {https://doi.org/10.1145%2F800001.811672},
year = 1967,
publisher = {{ACM} Press},
author = {Edsger W. Dijkstra},
title = {The structure of the {\textquotedblleft}the{\textquotedblright}-multiprogramming system},
booktitle = {Proceedings of the {ACM} symposium on Operating System Principles - {SOSP} {\textquotesingle}67}
}
dict with latex
{
"booktitle": "Proceedings of the {ACM} symposium on Operating System Principles - {SOSP} {\\textquotesingle}67",
"title": "The structure of the {\\textquotedblleft}the{\\textquotedblright}-multiprogramming system",
"author": "Edsger W. Dijkstra",
"publisher": "{ACM} Press",
"year": "1967",
"url": "https://doi.org/10.1145%2F800001.811672",
"doi": "10.1145/800001.811672",
"ENTRYTYPE": "inproceedings",
"ID": "Dijkstra_1967"
}
dict with plaintext (utf-8)
{
"booktitle": "Proceedings of the ACM symposium on Operating System Principles - SOSP '67",
"title": "The structure of the \u201cthe\u201d-multiprogramming system",
"author": "Edsger W. Dijkstra",
"publisher": "ACM Press",
"year": "1967",
"url": "https://doi.org/10.1145",
"doi": "10.1145/800001.811672",
"ENTRYTYPE": "inproceedings",
"ID": "Dijkstra_1967"
}
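Note that the "url" value in the plaintext dict was cut off at "%2F". LaTeX treats "%" as the start of a comment, so the converter presumably dropped the rest of the line. A minimal sketch of a convenience function that leaves such identifier-like fields untouched could look like the following. The names entry_to_plaintext, pylatexenc_converter and VERBATIM_FIELDS are hypothetical and not part of bibtexparser or pylatexenc; the choice of which fields to skip is an assumption.

```python
from typing import Callable, Dict, FrozenSet

# Assumption: these fields hold identifiers, not LaTeX prose, so they are copied as-is.
VERBATIM_FIELDS: FrozenSet[str] = frozenset({"url", "doi", "ID", "ENTRYTYPE"})


def pylatexenc_converter() -> Callable[[str], str]:
    """Return pylatexenc's LaTeX-to-text method (imported lazily)."""
    from pylatexenc.latex2text import LatexNodes2Text

    return LatexNodes2Text().latex_to_text


def entry_to_plaintext(
    entry: Dict[str, str],
    convert: Callable[[str], str],
    verbatim: FrozenSet[str] = VERBATIM_FIELDS,
) -> Dict[str, str]:
    """Return a copy of entry with LaTeX markup converted to plain text.

    Fields listed in verbatim (and any non-string values) are copied unchanged,
    which keeps percent-encoded URLs such as ...10.1145%2F800001... intact.
    """
    return {
        key: value if key in verbatim or not isinstance(value, str) else convert(value)
        for key, value in entry.items()
    }
```

With pylatexenc installed, this would be called as entry_to_plaintext(btex, pylatexenc_converter()). It returns a new dict instead of modifying btex in place, so the LaTeX version stays available.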
Currently I am doing:
using the DOI helper class below. I was hoping to simplify my life, since the citeproc result looks quite complicated and I'd love to have some cleanup in e.g. authors and titles.
bibtexparser does a great job, but I don't want a LaTeX result, just plain text.
E.g. for 10.1145/800001.811672 I get
while the plain text
would be better for my use case. Is this already possible with the current bibtexparser or a feature request?
doi.py