guillaumesalagnac / latex-compile

Yet another script to compile your latex documents
MIT License

Encoding issue #2

Closed dcoeurjo closed 4 years ago

dcoeurjo commented 4 years ago

When trying to use latex-compile on my Mac, I get an encoding error that seems to come from the bibtex step, and I don't understand it:

latex-compile main
compiling main.tex...
Traceback (most recent call last):
  File "/Users/davidcoeurjolly/local/bin/latex-compile", line 464, in <module>
    latex_full_build()
  File "/Users/davidcoeurjolly/local/bin/latex-compile", line 388, in latex_full_build
    if bibtex_needed(before[jobname+".aux"],after[jobname+".aux"]):
  File "/Users/davidcoeurjolly/local/bin/latex-compile", line 158, in bibtex_needed
    old_aux = old_aux.decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2876: ordinal not in range(128)

dcoeurjo commented 4 years ago

any idea?

guillaumesalagnac commented 4 years ago

I was not able to reproduce the error, but I suspect some sort of mismatch between the encodings used in the .tex file and in the .bib file.

Could you post a minimal example for me to play with?

dcoeurjo commented 4 years ago

Yeah, my issue report was a bit short, but I don't see any encoding issue in my bibtex file, nor in the aux file. Just in case, here is the bibtex: https://gist.github.com/dcoeurjo/b057161d2007a6fab0f236b8ce017825

dcoeurjo commented 4 years ago

If I use pdflatex and bibtex manually, there's no problem.

dcoeurjo commented 4 years ago

(thx for the nice tool BTW;))

guillaumesalagnac commented 4 years ago

Your bibtex file is pure ASCII, so I'd say the issue is probably in the latex. But without a reproducible example I'm shooting in the dark :-) Does the crash still happen if you replace lines 159 and 160 with the following?

    old_aux = old_aux.decode('ascii',errors='ignore')
    new_aux = new_aux.decode('ascii',errors='ignore')

Since I switched to Python 3 a few months back, encoding problems have been popping up everywhere and I have no proper solution yet. Still, glad to hear the script is useful.
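
For readers hitting the same traceback, here is a minimal sketch of what the two `errors` modes do with a byte like 0xc3. The sample bytes and the `\bibcite` entry are made up for illustration and are not taken from the reporter's aux file; decoding as UTF-8 is shown only as a possible alternative, not as what the script does.

```python
# Hypothetical aux-file fragment: "é" in UTF-8 is the two bytes 0xc3 0xa9.
aux_bytes = "\\bibcite{demo}{Ré}".encode("utf-8")

# Strict ASCII decoding fails on the first non-ASCII byte (0xc3),
# which is the kind of error shown in the traceback above.
try:
    aux_bytes.decode("ascii")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print(exc)

# errors='ignore' silently drops every byte that is not ASCII.
ignored = aux_bytes.decode("ascii", errors="ignore")
print(ignored)

# Alternative (an assumption, not the script's code): decode as UTF-8 and
# replace any invalid byte with U+FFFD instead of dropping it.
kept = aux_bytes.decode("utf-8", errors="replace")
print(kept)
```

Dropping bytes is harmless if the decoded text is only compared against another copy decoded the same way, but the accented characters are lost from the resulting string.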

dcoeurjo commented 4 years ago

All good with the edit. There might be some silly UTF-8 characters in my bibtex (entries cut and pasted from Google Scholar). Thanks a lot.

guillaumesalagnac commented 4 years ago

cheers.

kmccurley commented 1 year ago

Just FYI, this can be caused by the fact that the LaTeX log may be encoded in more than one character set. Input files to LaTeX are now assumed to be encoded as UTF-8, but some macros produce T1-encoded output when they are written to the log file. An example is a line containing Ð and \DJ\ and ü and \"u, which pdflatex converts to a log line with both UTF-8 and T1 encoding. The best solution I have found in Python is to try reading the LaTeX log as UTF-8, and if this fails, to read it with errors='replace' to handle the non-UTF-8 characters.
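
The fallback described above could be sketched roughly as follows. The function names `decode_log_bytes` and `read_latex_log` are hypothetical and are not part of latex-compile; the sample bytes in the usage lines are invented, with 0xfc assumed to stand for a T1-encoded ü.

```python
from pathlib import Path


def decode_log_bytes(raw: bytes) -> str:
    """Decode log bytes as strict UTF-8, falling back to errors='replace'."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Mixed encodings: keep the valid UTF-8 text and turn each
        # undecodable byte into U+FFFD instead of raising.
        return raw.decode("utf-8", errors="replace")


def read_latex_log(path) -> str:
    """Read a LaTeX log file that may mix UTF-8 with another encoding."""
    return decode_log_bytes(Path(path).read_bytes())


# Invented sample: valid UTF-8 "é" followed by a lone 0xfc byte.
clean = decode_log_bytes(b"caf\xc3\xa9")
mixed = decode_log_bytes(b"caf\xc3\xa9 \xfc")
print(clean)
print(mixed)
```

Reading the file as bytes first means the file is opened only once, and the strict attempt still tells you whether the log was clean UTF-8.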