Open jrjhealey opened 6 years ago
Hi Joe! Thanks for the feedback.
I'd say we should regex against some common HTML code in titles (italics, subscript, and superscript, mainly). Do you have any examples at hand?
For the duplicate entries, let's create a separate issue.
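As a rough illustration of the regex idea, here is a minimal sketch in Python. It is not taken from pybtex or from the script linked in this thread. The function name `html_to_tex` and the tag-to-command mapping are hypothetical, and it assumes the exported titles contain simple, attribute-free tags such as `<i>`, `<em>`, `<sub>`, and `<sup>`.

```python
import re

# Hypothetical mapping from HTML tag names to TeX command names.
# This is an assumption for illustration, not pybtex behaviour.
TAG_TO_TEX = {
    "i": "textit",
    "em": "textit",
    "sub": "textsubscript",
    "sup": "textsuperscript",
}

# Matches a simple, attribute-free tag pair such as <i>...</i>.
_TAG_RE = re.compile(r"<(i|em|sub|sup)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def html_to_tex(title: str) -> str:
    """Replace simple HTML formatting tags in a title with TeX commands."""

    def _swap(match: re.Match) -> str:
        command = TAG_TO_TEX[match.group(1).lower()]
        return "\\" + command + "{" + match.group(2) + "}"

    # Repeat until nothing changes so that nested tags are also converted.
    previous = None
    while previous != title:
        previous = title
        title = _TAG_RE.sub(_swap, title)
    return title


print(html_to_tex("Genome of <i>Escherichia coli</i> K-12"))
```

Titles without any of these tags pass through unchanged, so this would not help where the exporter has already flattened sub/superscripts to plain characters.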
Yep ok good idea! I'll open another issue for duplicates.
I'll commit a folder of different examples that I come up with to my fork of the repo, and then make a PR so you can test against them too perhaps?
Currently what I've thought of are an example of:
In my experience it's quite good at converting special characters in names etc so that's probably enough to cover 90% of the troublesome refs.
Edit:
It looks like sub/superscripts might be difficult, as Mendeley (which I export my bib files from) just coerces them to normal letters/numbers (they have no HTML around them).
Hi Jaime,
Possible enhancement for you!
If pybtex doesn't already correct this, it would be good if this could also incorporate the fix for correctly italicising Latin binomials (a fairly simple search-and-replace to switch HTML italics tags to TeX format tags). There's an old script online (below) which does essentially this, but it isn't the best Python in the world...
Inspired by: https://twitter.com/MendeleySupport/status/776001527664156672 and https://itskathylam.wordpress.com/2016/01/12/dealing-with-italics-in-bibtex-files-exported-from-mendeley/
If there was some logic to catch and handle duplicate entries that would be really useful too (a problem I end up with quite often).
Cheers!
Joe