texworld / betterbib

:green_book: Command-line tools for bibliographies.

Dashes #245

Closed RMeli closed 2 years ago

RMeli commented 2 years ago

I encountered several references where instead of using - in the title the following Unicode characters were used:

The presence of such characters makes compilation fail.

Should the .bib file be sanitized by transforming these two characters into - or --? (I think the former is more appropriate in titles.)

RMeli commented 2 years ago

In #137 it seems that there was something along those lines

value = value.replace("\u2010", "-")

but it's been removed?

RMeli commented 2 years ago

I switched to a different document where I have been using betterbib for a while, added one reference, and ran betterbib. Several instances of the problem described above started to appear (in references other than the one that was added).

Is it possible that something added/removed in the latest version is causing this problem? Or am I missing something obvious here?

nschloe commented 2 years ago

As usual, an MWE is needed.

RMeli commented 2 years ago

Sorry about that @nschloe . This is an entry for which it happens:

@article{Ragoza2017,
    journal = {J. Chem. Inf. Model.},
    number = {4},
    doi = {10.1021/acs.jcim.6b00740},
    year = {2017},
}

Using

betterbib update --doi-url-type short -t test.bib

I get the following entry:

@article{Ragoza2017,
    author = {Ragoza, Matthew and Hochuli, Joshua and Idrobo, Elisa and Sunseri, Jocelyn and Koes, David Ryan},
    journal = {J. Chem. Inf. Model.},
    number = {4},
    doi = {10.1021/acs.jcim.6b00740},
    year = {2017},
    source = {Crossref},
    url = {https://doi.org/10/f9zwhj},
    volume = {57},
    publisher = {American Chemical Society (ACS)},
    title = {{Protein–Ligand} Scoring with Convolutional Neural Networks},
    issn = {1549-9596, 1549-960X},
    pages = {942--957},
    month = apr,
}

The dash in the title is U+2013 (en dash), which makes LaTeX compilation fail. I did not encounter this problem with previous versions (but I don't know in which version it started appearing, sorry).
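For anyone trying to locate such characters in their own file, here is a minimal sketch that lists every non-ASCII character in a piece of BibTeX text together with its Unicode name. The function name `find_non_ascii` is hypothetical and this is not part of betterbib; it only uses Python's standard `unicodedata` module.

```python
import unicodedata


def find_non_ascii(text):
    """Yield (line number, column, character, Unicode name) for each non-ASCII character.

    Hypothetical helper for illustration; not a betterbib function.
    """
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # The second argument is the fallback for unnamed code points.
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


entry = "    title = {{Protein\u2013Ligand} Scoring with Convolutional Neural Networks},"
for line_no, col, ch, name in find_non_ascii(entry):
    print(f"line {line_no}, column {col}: U+{ord(ch):04X} {name}")
```

Note that this also reports legitimate non-ASCII characters (accented letters in author names, for example), so the output is a list of candidates to inspect, not a list of errors.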


betterbib 4.2.2 [Python 3.9.10]
Copyright (c) 2013-2021 Nico Schlömer

nschloe commented 2 years ago

Can you try the latest version?

RMeli commented 2 years ago

I get the same with the latest version from pip

betterbib 4.3.5 [Python 3.9.10]
Copyright (c) 2013-2022 Nico Schlömer

%comment{This file was created with betterbib v4.3.5.}

@article{Ragoza2017,
    author = {Ragoza, Matthew and Hochuli, Joshua and Idrobo, Elisa and Sunseri, Jocelyn and Koes, David Ryan},
    journal = {J. Chem. Inf. Model.},
    number = {4},
    doi = {10.1021/acs.jcim.6b00740},
    year = {2017},
    source = {Crossref},
    url = {https://doi.org/10/f9zwhj},
    volume = {57},
    publisher = {American Chemical Society (ACS)},
    title = {{Protein–Ligand} Scoring with Convolutional Neural Networks},
    issn = {1549-9596, 1549-960X},
    pages = {942--957},
    month = apr,
}

nschloe commented 2 years ago

Alright. This is about the dash in the title, right?

RMeli commented 2 years ago

Yes, that's what is causing the problem now. I have sanitised this bibliography with betterbib before without issue. In #137 it seems that the following line of code has been removed:

value = value.replace("\u2010", "-")

In this example it is U+2013, but I encountered the same issue with U+2010 and U+2212 as well.
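As a stopgap, the replacement could be sketched along the following lines, extending the single `replace` call above to the three code points mentioned here. This is only an illustration: `sanitize_dashes` and `DASH_TO_ASCII` are hypothetical names, not betterbib's actual code, and mapping all three characters to a single `-` is an assumption that suits titles but not necessarily page ranges.

```python
# Hypothetical helper, not betterbib's implementation.
# Maps the Unicode dash-like characters from this thread to an ASCII hyphen.
DASH_TO_ASCII = str.maketrans(
    {
        "\u2010": "-",  # HYPHEN
        "\u2013": "-",  # EN DASH
        "\u2212": "-",  # MINUS SIGN
    }
)


def sanitize_dashes(value: str) -> str:
    """Return value with the dash-like characters above replaced by '-'."""
    return value.translate(DASH_TO_ASCII)


title = "{Protein\u2013Ligand} Scoring with Convolutional Neural Networks"
print(sanitize_dashes(title))
```

Using `str.translate` with one table keeps the mapping in a single place, so further characters can be added without chaining more `replace` calls.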

RMeli commented 2 years ago

I just saw #239. Maybe the same can be applied to other fields as well?

nschloe commented 2 years ago

This should now be fixed (4.3.6).

RMeli commented 2 years ago

Thank you @nschloe !