plk / biber

Backend processor for BibLaTeX
Artistic License 2.0

« Wide character in die at -e line 624 » with some Unicode characters in outdir’s path #474

Open BenjaminGalliot opened 3 months ago

BenjaminGalliot commented 3 months ago

Hello,

It seems that the latest version of biber has problems with some Unicode characters in the path (outdir of latexmk).

Strangely, not all Unicode characters have this problem, and John Collins was unable to reproduce this behavior on his system.

I'm on Manjaro Linux, with the latest version of TeX Live 2024 (updated yesterday). Neither the 2023 version nor the 2024 version as of the very beginning of the year had this problem; it appeared when I updated everything yesterday.

Rc files read:
  NONE
Latexmk: This is Latexmk, John Collins, 31 Jan. 2024. Version 4.83.
Latexmk: making output directory 'resultats'
Latexmk: Doing main (small) clean up for 'test.tex'
No existing .aux file, so I'll make a simple one, and require run of *latex.
Force everything to be remade.
Latexmk: applying rule 'lualatex'...
Rule 'lualatex':  Reasons for rerun
Category 'other':
  Rerun of 'lualatex' forced or previously required:
    Reason or flag: 'go_mode'

------------
Run number 1 of rule 'lualatex'
------------
------------
Running 'lualatex  -synctex=1 -interaction=batchmode -recorder -output-directory="resultats"  "test.tex"'
------------
This is LuaHBTeX, Version 1.18.0 (TeX Live 2024) 
 restricted system commands enabled.
SyncTeX written on test.synctex.gz.
Latexmk: Getting log file 'resultats/test.log'
Latexmk: Examining 'resultats/test.fls'
Latexmk: Examining 'resultats/test.log'
Latexmk: Missing bbl file 'resultats/test.bbl' in following:
 No file test.bbl.
Latexmk: References changed.
Latexmk: Log file says output to 'test.pdf'
Latexmk: Bibliography file(s) from .bcf file:
  bibliographie.bib
Latexmk: applying rule 'biber resultats/test'...
Rule 'biber resultats/test':  Reasons for rerun
Category 'other':
  Rerun of 'biber resultats/test' forced or previously required:
    Reason or flag: 'Initial set up of rule'

------------
Run number 1 of rule 'biber resultats/test'
------------
------------
Running 'biber  "resultats/test.bcf"'
------------
INFO - This is Biber 2.20
INFO - Logfile is 'resultats/test.blg'
INFO - Reading 'resultats/test.bcf'
INFO - Found 1 citekeys in bib section 0
INFO - Processing section 0
INFO - Looking for bibtex file 'bibliographie.bib' for section 0
INFO - LaTeX decoding ...
INFO - Found BibTeX data source 'bibliographie.bib'
INFO - Overriding locale 'en-US' defaults 'normalization = NFD' with 'normalization = prenormalized'
INFO - Overriding locale 'en-US' defaults 'variable = shifted' with 'variable = non-ignorable'
INFO - Sorting list 'nty/global//global/global/global' of type 'entry' with template 'nty' and locale 'en-US'
INFO - No sort tailoring available for locale 'en-US'
INFO - Writing 'resultats/test.bbl' with encoding 'UTF-8'
INFO - Output to resultats/test.bbl
Latexmk: Found biber source file(s) [bibliographie.bib, resultats/test.bcf]
Latexmk: applying rule 'lualatex'...
Rule 'lualatex':  Reasons for rerun
Changed files or newly in use/created:
  resultats/test.aux
  resultats/test.bbl
  resultats/test.out

------------
Run number 2 of rule 'lualatex'
------------
------------
Running 'lualatex  -synctex=1 -interaction=batchmode -recorder -output-directory="resultats"  "test.tex"'
------------
This is LuaHBTeX, Version 1.18.0 (TeX Live 2024) 
 restricted system commands enabled.
SyncTeX written on test.synctex.gz.
Latexmk: Getting log file 'resultats/test.log'
Latexmk: Examining 'resultats/test.fls'
Latexmk: Examining 'resultats/test.log'
Latexmk: Found input bbl file 'resultats/test.bbl'
Latexmk: Log file says output to 'test.pdf'
Latexmk: Bibliography file(s) from .bcf file:
  bibliographie.bib
Latexmk: applying rule 'lualatex'...
Rule 'lualatex':  Reasons for rerun
Changed files or newly in use/created:
  resultats/test.aux
  resultats/test.run.xml

------------
Run number 3 of rule 'lualatex'
------------
------------
Running 'lualatex  -synctex=1 -interaction=batchmode -recorder -output-directory="resultats"  "test.tex"'
------------
This is LuaHBTeX, Version 1.18.0 (TeX Live 2024) 
 restricted system commands enabled.
SyncTeX written on test.synctex.gz.
Latexmk: Getting log file 'resultats/test.log'
Latexmk: Examining 'resultats/test.fls'
Latexmk: Examining 'resultats/test.log'
Latexmk: Found input bbl file 'resultats/test.bbl'
Latexmk: Log file says output to 'test.pdf'
Latexmk: Bibliography file(s) from .bcf file:
  bibliographie.bib
Latexmk: applying rule 'lualatex'...
Rule 'lualatex':  Reasons for rerun
Changed files or newly in use/created:
  resultats/test.run.xml

------------
Run number 4 of rule 'lualatex'
------------
------------
Running 'lualatex  -synctex=1 -interaction=batchmode -recorder -output-directory="resultats"  "test.tex"'
------------
This is LuaHBTeX, Version 1.18.0 (TeX Live 2024) 
 restricted system commands enabled.
SyncTeX written on test.synctex.gz.
Latexmk: Getting log file 'resultats/test.log'
Latexmk: Examining 'resultats/test.fls'
Latexmk: Examining 'resultats/test.log'
Latexmk: Found input bbl file 'resultats/test.bbl'
Latexmk: Log file says output to 'test.pdf'
Latexmk: Bibliography file(s) from .bcf file:
  bibliographie.bib
Latexmk: All targets (resultats/test.pdf) are up-to-date

$ latexmk -lualatex -outdir=résultats -synctex=1 -interaction=batchmode 'test.tex' -gg
Rc files read:
  NONE
Latexmk: This is Latexmk, John Collins, 31 Jan. 2024. Version 4.83.
Latexmk: making output directory 'résultats'
Latexmk: Doing main (small) clean up for 'test.tex'
No existing .aux file, so I'll make a simple one, and require run of *latex.
Force everything to be remade.
Latexmk: applying rule 'lualatex'...
Rule 'lualatex':  Reasons for rerun
Category 'other':
  Rerun of 'lualatex' forced or previously required:
    Reason or flag: 'go_mode'

------------
Run number 1 of rule 'lualatex'
------------
------------
Running 'lualatex  -synctex=1 -interaction=batchmode -recorder -output-directory="résultats"  "test.tex"'
------------
This is LuaHBTeX, Version 1.18.0 (TeX Live 2024) 
 restricted system commands enabled.
SyncTeX written on test.synctex.gz.
Latexmk: Getting log file 'résultats/test.log'
Latexmk: Examining 'résultats/test.fls'
Latexmk: Examining 'résultats/test.log'
Latexmk: Missing bbl file 'résultats/test.bbl' in following:
 No file test.bbl.
Latexmk: References changed.
Latexmk: Log file says output to 'test.pdf'
Latexmk: Bibliography file(s) from .bcf file:
  bibliographie.bib
Latexmk: applying rule 'biber résultats/test'...
Rule 'biber résultats/test':  Reasons for rerun
Category 'other':
  Rerun of 'biber résultats/test' forced or previously required:
    Reason or flag: 'Initial set up of rule'

------------
Run number 1 of rule 'biber résultats/test'
------------
------------
Running 'biber  "résultats/test.bcf"'
------------
Wide character in die at -e line 624.
Can't open résultats/test.blg (No such file or directory) at /tmp/par-62656e6a616d696e/cache-8e80c9c14f39e44498a1091586b807a0d52ef04a/inc/lib/Log/Log4perl/Appender/File.pm line 151.
Latexmk: Error return from 'biber résultats/test'
I will add to its source list, anything cached from analysis of bcf file.
Latexmk: Summary of warnings from last run of *latex:
  Latex failed to resolve 1 citation(s)
Latexmk: ====Undefined refs and citations with line #s in .tex file:
  Citation 'jacques21grammar' on page 1 undefined on input line 13
Latexmk: Errors, so I did not complete making targets
Collected error summary (may duplicate other messages):
  biber résultats/test: Could not open biber log file for 'résultats/test'

Latexmk: Sometimes, the -f option can be used to get latexmk
  to try to force complete processing.
  But normally, you will need to correct the file(s) that caused the
  error, and then rerun latexmk.
  In some cases, it is best to clean out generated files before rerunning
  latexmk after you've corrected the files.

$ latexmk -lualatex -outdir=啊啊啊 -synctex=1 -interaction=batchmode 'test.tex' -gg
Rc files read:
  NONE
Latexmk: This is Latexmk, John Collins, 31 Jan. 2024. Version 4.83.
Latexmk: making output directory '啊啊啊'
Latexmk: Doing main (small) clean up for 'test.tex'
No existing .aux file, so I'll make a simple one, and require run of *latex.
Force everything to be remade.
Latexmk: applying rule 'lualatex'...
Rule 'lualatex':  Reasons for rerun
Category 'other':
  Rerun of 'lualatex' forced or previously required:
    Reason or flag: 'go_mode'

------------
Run number 1 of rule 'lualatex'
------------
------------
Running 'lualatex  -synctex=1 -interaction=batchmode -recorder -output-directory="啊啊啊"  "test.tex"'
------------
This is LuaHBTeX, Version 1.18.0 (TeX Live 2024) 
 restricted system commands enabled.
SyncTeX written on test.synctex.gz.
Latexmk: Getting log file '啊啊啊/test.log'
Latexmk: Examining '啊啊啊/test.fls'
Latexmk: Examining '啊啊啊/test.log'
Latexmk: Missing bbl file '啊啊啊/test.bbl' in following:
 No file test.bbl.
Latexmk: References changed.
Latexmk: Log file says output to 'test.pdf'
Latexmk: Bibliography file(s) from .bcf file:
  bibliographie.bib
Latexmk: applying rule 'biber 啊啊啊/test'...
Rule 'biber 啊啊啊/test':  Reasons for rerun
Category 'other':
  Rerun of 'biber 啊啊啊/test' forced or previously required:
    Reason or flag: 'Initial set up of rule'

------------
Run number 1 of rule 'biber 啊啊啊/test'
------------
------------
Running 'biber  "啊啊啊/test.bcf"'
------------
INFO - This is Biber 2.20
INFO - Logfile is '啊啊啊/test.blg'
Wide character in print at /tmp/par-62656e6a616d696e/cache-8e80c9c14f39e44498a1091586b807a0d52ef04a/inc/lib/Log/Log4perl/Appender/Screen.pm line 57.
INFO - Reading './啊啊啊/test.bcf'
INFO - Found 1 citekeys in bib section 0
INFO - Processing section 0
INFO - Looking for bibtex file 'bibliographie.bib' for section 0
INFO - LaTeX decoding ...
INFO - Found BibTeX data source 'bibliographie.bib'
INFO - Overriding locale 'en-US' defaults 'normalization = NFD' with 'normalization = prenormalized'
INFO - Overriding locale 'en-US' defaults 'variable = shifted' with 'variable = non-ignorable'
INFO - Sorting list 'nty/global//global/global/global' of type 'entry' with template 'nty' and locale 'en-US'
INFO - No sort tailoring available for locale 'en-US'
INFO - Writing '啊啊啊/test.bbl' with encoding 'UTF-8'
INFO - Output to 啊啊啊/test.bbl
Latexmk: Found biber source file(s) [./啊啊啊/test.bcf, bibliographie.bib]
Latexmk: applying rule 'lualatex'...
Rule 'lualatex':  Reasons for rerun
Changed files or newly in use/created:
  啊啊啊/test.aux
  啊啊啊/test.bbl
  啊啊啊/test.out

------------
Run number 2 of rule 'lualatex'
------------
------------
Running 'lualatex  -synctex=1 -interaction=batchmode -recorder -output-directory="啊啊啊"  "test.tex"'
------------
This is LuaHBTeX, Version 1.18.0 (TeX Live 2024) 
 restricted system commands enabled.
SyncTeX written on test.synctex.gz.
Latexmk: Getting log file '啊啊啊/test.log'
Latexmk: Examining '啊啊啊/test.fls'
Latexmk: Examining '啊啊啊/test.log'
Latexmk: Found input bbl file '啊啊啊/test.bbl'
Latexmk: Log file says output to 'test.pdf'
Latexmk: Bibliography file(s) from .bcf file:
  bibliographie.bib
Latexmk: applying rule 'lualatex'...
Rule 'lualatex':  Reasons for rerun
Changed files or newly in use/created:
  啊啊啊/test.aux
  啊啊啊/test.run.xml

------------
Run number 3 of rule 'lualatex'
------------
------------
Running 'lualatex  -synctex=1 -interaction=batchmode -recorder -output-directory="啊啊啊"  "test.tex"'
------------
This is LuaHBTeX, Version 1.18.0 (TeX Live 2024) 
 restricted system commands enabled.
SyncTeX written on test.synctex.gz.
Latexmk: Getting log file '啊啊啊/test.log'
Latexmk: Examining '啊啊啊/test.fls'
Latexmk: Examining '啊啊啊/test.log'
Latexmk: Found input bbl file '啊啊啊/test.bbl'
Latexmk: Log file says output to 'test.pdf'
Latexmk: Bibliography file(s) from .bcf file:
  bibliographie.bib
Latexmk: applying rule 'lualatex'...
Rule 'lualatex':  Reasons for rerun
Changed files or newly in use/created:
  啊啊啊/test.run.xml

------------
Run number 4 of rule 'lualatex'
------------
------------
Running 'lualatex  -synctex=1 -interaction=batchmode -recorder -output-directory="啊啊啊"  "test.tex"'
------------
This is LuaHBTeX, Version 1.18.0 (TeX Live 2024) 
 restricted system commands enabled.
SyncTeX written on test.synctex.gz.
Latexmk: Getting log file '啊啊啊/test.log'
Latexmk: Examining '啊啊啊/test.fls'
Latexmk: Examining '啊啊啊/test.log'
Latexmk: Found input bbl file '啊啊啊/test.bbl'
Latexmk: Log file says output to 'test.pdf'
Latexmk: Bibliography file(s) from .bcf file:
  bibliographie.bib
Latexmk: All targets (啊啊啊/test.pdf) are up-to-date

$ biber résultats/test.bcf
Wide character in die at -e line 624.
Can't open résultats/test.blg (No such file or directory) at /tmp/par-62656e6a616d696e/cache-8e80c9c14f39e44498a1091586b807a0d52ef04a/inc/lib/Log/Log4perl/Appender/File.pm line 151.

$ biber 啊啊啊/test.bcf
INFO - This is Biber 2.20
INFO - Logfile is '啊啊啊/test.blg'
Wide character in print at /tmp/par-62656e6a616d696e/cache-8e80c9c14f39e44498a1091586b807a0d52ef04a/inc/lib/Log/Log4perl/Appender/Screen.pm line 57.
INFO - Reading './啊啊啊/test.bcf'
INFO - Found 1 citekeys in bib section 0
INFO - Processing section 0
INFO - Looking for bibtex file 'bibliographie.bib' for section 0
INFO - LaTeX decoding ...
INFO - Found BibTeX data source 'bibliographie.bib'
INFO - Overriding locale 'en-US' defaults 'normalization = NFD' with 'normalization = prenormalized'
INFO - Overriding locale 'en-US' defaults 'variable = shifted' with 'variable = non-ignorable'
INFO - Sorting list 'nty/global//global/global/global' of type 'entry' with template 'nty' and locale 'en-US'
INFO - No sort tailoring available for locale 'en-US'
INFO - Writing '啊啊啊/test.bbl' with encoding 'UTF-8'
INFO - Output to 啊啊啊/test.bbl

jccollins commented 3 months ago

I've been able to reproduce this. I needed to be on linux in a directory that is on an ext4 file system. It appears that when biber tries to open the .blg file, the name it uses is in NFD instead of NFC, even when the directory name is specified in the NFC form. This causes exactly the error message shown when the directory name contains an accented character.
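To make the NFC/NFD distinction concrete, here is a minimal sketch in Python (used here only for brevity; biber itself is written in Perl). It shows that the two normalization forms of the directory name render identically but are different byte sequences, which is why a normalization-sensitive file system such as ext4 treats them as two different names. The variable names are illustrative only.

```python
import unicodedata

# The directory name from the report, written with an escape so that the
# normalization of this source file does not matter.
name = "r\u00e9sultats"

nfc = unicodedata.normalize("NFC", name)   # é as one code point, U+00E9
nfd = unicodedata.normalize("NFD", name)   # e followed by combining U+0301

print(len(nfc), len(nfd))        # 9 10
print(nfc.encode("utf-8"))       # b'r\xc3\xa9sultats'
print(nfd.encode("utf-8"))       # b're\xcc\x81sultats'
print(nfc == nfd)                # False
```

If a directory was created under the first byte sequence, an attempt to open a file through the second one fails with "No such file or directory" on such a file system.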

The actual listing in the bug report of the error message ("Can't open résultats/test.blg") is in NFC, presumably because of something done by a pasting operation in a web browser.

On macOS and APFS (which is normalization insensitive, but normalization preserving), when the directory name does not contain an accented character, but the base name of the .tex file does contain an accented character, then the name of the .blg file is in NFD. In contrast, the .bbl filename is in NFC. This is given that the name of the .tex is in NFC.

The version of biber is 2.20 (in TeXLive 2024).

On combinations of OS and file systems (e.g., macOS and APFS) that are insensitive to Unicode normalization of filenames, latexmk invoked as in the bug report does not raise an error.

plk commented 3 months ago

Looks like I forgot to NFC the filename. biber is all NFD internally and it should NFC everything on output but it looks like this was missed. Can you try biber 2.21 DEV version from SF?

BenjaminGalliot commented 3 months ago

I tried, but I'm not familiar enough with Perl to build the executable from the sources in order to test it! Sorry! :sweat_smile:

jccollins commented 3 months ago

I tried the 2.21.beta version, and it worked, provided that the file and directory names on linux were all NFC.

But on a Unicode-normalization-sensitive system, it now fails if the names aren't NFC. That's unlikely to be the case for most users in Western Europe, since when typing in characters, typical keyboard layouts give pre-composed characters, i.e., NFC. So they will create files with NFC names. At least if the files are created within the linux

However, on macOS, suppose I have a file or directory whose name is NFC. Then I rename the file in the Finder, without even touching the non-ASCII characters. After the rename, the name is NFD! I've seen complaints about that on the web. (Korean users seem to be particularly bothered.) Command line commands (mv, etc) don't have this problem.

Luckily, at least by default, the macOS and its file systems are Unicode-normalization insensitive, so this issue doesn't seem to be too big a deal for our purposes. But transferring the files to linux could cause all kinds of interesting anomalies! It might be useful to have a little script to rename all files and directories to have a particular normalization. Perhaps one already exists.
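A minimal sketch of what such a renaming script might look like, written in Python rather than Perl. The function name `normalize_tree` and its parameters are hypothetical, and the sketch assumes a normalization-sensitive file system such as ext4 on linux; it has not been checked against the Finder behavior described above.

```python
import os
import sys
import unicodedata


def normalize_tree(root, form="NFC", dry_run=True):
    """Rename entries under root so their names use one normalization form.

    Returns the list of (old_path, new_path) renames; with dry_run=True
    nothing on disk is changed.
    """
    changes = []
    # Bottom-up walk: entries inside a directory are handled before the
    # directory itself is renamed, so the paths used here stay valid.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            target = unicodedata.normalize(form, name)
            if target == name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, target)
            # On a normalization-sensitive file system both spellings can
            # exist as distinct entries; never overwrite in that case.
            if os.path.lexists(new) and not os.path.samefile(old, new):
                print(f"skipped {old!r}: {new!r} exists", file=sys.stderr)
                continue
            changes.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return changes
```

Calling `normalize_tree(".", dry_run=True)` first lists what would be renamed without touching anything; the root directory itself is never renamed.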

Pdflatex, at least on TeXLive 2024, preserves Unicode normalization from the .tex filename to the names of generated files, and the same applies to latexmk. I haven't tried this with xelatex and lualatex, but I would conjecture they have the same behavior.

Would it not be better for biber to preserve the normalization of what's on the command line, since that matches better the behavior of the other programs involved? (With latexmk I went through an initial phase of thinking the internal use of NFD would be a good idea; there are recommendations that that is the "correct" thing to do. But that led to a minefield of other complications, so I abandoned that.) What problems would that changed behavior lead to?

John

On 3/29/24 11:43 AM, plk wrote:

> Looks like I forgot to NFC the filename. biber is all NFD internally and it should NFC everything on output but it looks like this was missed. Can you try biber 2.21 DEV version from SF?


plk commented 2 months ago

Well, you have to use NFD internally because there are lots of tricky things that have to be done with independent combining chars etc. I can, however, have a look at preserving filenames from the form of the .bcf file.

plk commented 2 months ago

Please try 2.21 from SF again.

jccollins commented 2 months ago

On 4/13/24 2:19 PM, plk wrote:

> Well, you have to use NFD internally because there lots of tricky things that have to be done with independent combining chars etc. I can however, have a look at preserving filenames from the form of the .tex file.

For the textual content of things like the author fields in .bib files, I agree that the internal use of NFD is suitable. That's because for ordinary text, characters that differ by normalization are intended to be equivalent.

But for filenames, things are entirely different. The combinations of Windows with NTFS and FAT32, and linux with ext4 (and IIRC FAT32) are all normalization sensitive. E.g., in these cases it's perfectly possible to have two different files whose names are identical except for Unicode normalization.

So preservation of Unicode normalization of filenames is compulsory, as far as I can see. There are effectively two different worlds of strings: Those for ordinary text and those for filenames.
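The "two worlds" distinction can be sketched as two different notions of equality. This is a Python illustration under stated assumptions, not a description of how biber or latexmk are implemented; all three function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def same_text(a, b):
    """Equality for ordinary text: canonically equivalent strings match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def same_filename(a, b):
    """Equality for filenames on a normalization-sensitive file system:
    compared exactly as given, with no normalization applied."""
    return a == b


def normalization_twins(names):
    """Group names that differ only by Unicode normalization."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


nfc_name = "caf\u00e9.bib"     # precomposed é
nfd_name = "cafe\u0301.bib"    # e followed by a combining acute accent
print(same_text(nfc_name, nfd_name))        # True
print(same_filename(nfc_name, nfd_name))    # False
print(normalization_twins([nfc_name, nfd_name, "plain.bib"]))
```

A helper like `normalization_twins` could be used to check a directory listing for pairs of files that would collide if any tool in the chain normalized their names.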

Of course, if you are typing filenames in Windows and linux, you are probably going to get only NFC filenames, at least with standard keyboard layouts for many Western European languages.

But it's easily possible to get NFD filenames if you generate the files on macOS and transfer by a normalization-preserving method (e.g., in a zip file from macOS to unix). That's because GUIs in macOS coerce filenames to NFD. That doesn't matter much on macOS, since by default it is insensitive to the Unicode normalization of filenames. But once the files are on linux or Windows, there are complications.

John

jccollins commented 2 months ago

On 4/14/24 10:27 AM, plk wrote:

> Please try 2.21 from SF again.

Sorry, but it doesn't work. I see at least the following problems:

  1. When I run this version of biber with a bcf file named NFC-café.bcf (with NFC coding), the name of the bbl file comes out garbled: it displays as NFC-cafÃ©.bbl. I conjecture that what has happened is that Perl's encode subroutine was applied to a string that was already UTF-8 encoded.

I can reproduce this kind of situation in a Perl script if I do

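      use utf8;                               # string literals in this file are UTF-8
      use Encode qw(encode);                  # provides encode()
      my $orig = 'NFCé';
      my $enc1 = encode( 'UTF-8', $orig );    # correct: characters to UTF-8 bytes
      my $enc2 = encode( 'UTF-8', $enc1 );    # wrong: the bytes are encoded a second time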

The string $enc2 has content that is the UTF-8 encoding of 'NFCÃ©', i.e., the original string encoded twice. The string $enc1 has the correct UTF-8 encoding of the original string.

  2. The same error occurs inside the .blg file, in the strings that give the name of the .bbl file. I've attached a zip file containing an example.

  3. If the OS is linux, and the bcf file is in a directory named résultats, this version of biber, just like 2.20, still tries to write a .blg file whose name is the NFD version of what it should be writing. That gives a fatal error, since there is no directory whose name is the NFD version of 'résultats'.
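The double-encoding conjecture in the first item can be reproduced outside Perl as well. The following Python sketch mimics what Perl's encode does when handed a string that is already UTF-8 bytes, under the assumption that each byte is then treated as a Latin-1 character; it is an illustration of the effect, not a trace of biber's code.

```python
orig = "NFC\u00e9"                       # 'NFCé' with a precomposed é

# First encoding: correct, characters become UTF-8 bytes.
enc1 = orig.encode("utf-8")              # b'NFC\xc3\xa9'

# Second encoding: each byte of enc1 is read as a Latin-1 character
# and encoded again, which turns the two bytes of é into four bytes.
enc2 = enc1.decode("latin-1").encode("utf-8")

print(enc1)                              # b'NFC\xc3\xa9'
print(enc2)                              # b'NFC\xc3\x83\xc2\xa9'
print(enc2.decode("utf-8"))              # NFCÃ©

# The damage is reversible as long as nothing else touched the bytes.
repaired = enc2.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == orig)                  # True
```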

John

[Attachment: biber-issue.zip (application/zip)]

plk commented 2 months ago

I will have a look - I suspect that the log4perl module is doing some normalisation which isn't obvious as it's that which creates the .blg.

plk commented 2 months ago

Can you please try 2.21 dev again from SF?