Open catharsis71 opened 2 years ago
Ouch! Sorry about your data loss, and thanks for such a detailed report. A first look at the code leaves me baffled, as the only call to chmod() is made without caring about whether it succeeds (it's done on a "best effort" basis), so I can't work out how that error message is being emitted. I will investigate further.
Please could you confirm what version of recode you're using?
I'm using the Ubuntu package, which seems to be 3.6-24... am I in the wrong place? I didn't look too closely at the package info yesterday, so I might be.
Package: recode
Version: 3.6-24
Priority: optional
Section: text
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Santiago Vila <sanvila@debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 209 kB
Depends: libc6 (>= 2.4), librecode0 (>= 3.6)
Download-Size: 111 kB
The --version output on mine suggests it hasn't been updated in quite a while:
$ recode --version
Free recode 3.6
Written by Franc,ois Pinard <pinard@iro.umontreal.ca>.
Copyright (C) 1990, 92, 93, 94, 96, 97, 99 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I would certainly appreciate a report against the latest 3.7.x. I am working with Debian to get 3.7 packaged, but it's taking a while!
Okay, on 3.7.12 it seems to work properly on NTFS but still fails in the same way on exFAT, albeit with a different error message. The end result is the same: original file gone, .tmp file retained, and no easy way to figure out which .tmp file came from which file if you ran it on a lot of files.
exFAT filesystem:
:/mnt/d$ recode --version
recode 3.7.12
Written by François Pinard <pinard@iro.umontreal.ca>.
Copyright (C) 1990-2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
:/mnt/d$ echo HELLO > testing.txt
:/mnt/d$ file testing.txt
testing.txt: ASCII text
:/mnt/d$ uchardet testing.txt
ASCII
:/mnt/d$ recode utf8..utf16 testing.txt
/home/username/bin/.libs/recode: rename (/mnt/d/recode-ALN8Z9.tmp, /mnt/d/testing.txt): No such file or directory
:/mnt/d$ ls -l testing.txt
ls: cannot access 'testing.txt': No such file or directory
:/mnt/d$ ls -l *.tmp
-rwxrwxrwx 1 root root 60 Mar 28 09:02 rec10721.tmp
-rwxrwxrwx 1 root root 47307 Mar 28 01:27 rec5097.tmp
-rwxrwxrwx 1 root root 14 Mar 28 09:52 recode-ALN8Z9.tmp
:/mnt/d$ file recode-ALN8Z9.tmp
recode-ALN8Z9.tmp: Big-endian UTF-16 Unicode text
:/mnt/d$ uchardet recode-ALN8Z9.tmp
UTF-16
:/mnt/d$
NTFS filesystem:
:/mnt/g$
:/mnt/g$ echo HELLO > testing.txt
:/mnt/g$ file testing.txt
testing.txt: ASCII text
:/mnt/g$ uchardet testing.txt
ASCII
:/mnt/g$ recode utf8..utf16 testing.txt
:/mnt/g$ file testing.txt
testing.txt: Big-endian UTF-16 Unicode text
:/mnt/g$ uchardet testing.txt
UTF-16
:/mnt/g$
I saw there was a --verbose option so I gave it a try on exFAT, but it didn't provide a whole lot of info:
Request: UTF-8..:iconv:..UTF-16
Shrunk to: UTF-8..UTF-16
Request: UTF-8..ISO-10646-UCS-4..UTF-16
Recoding /mnt/d/test2.txt... done
/home/username/bin/.libs/recode: rename (/mnt/d/recode-Li08fA.tmp, /mnt/d/test2.txt): No such file or directory
This could possibly be a WSL bug or incompatibility. I should probably file a bug with WSL, but it'd be useful to know whether this happens on a pure Linux system too or only with WSL. Unfortunately I'm not able to test on pure Linux currently.
Just throwing out ideas, maybe the temp file name could just be the real filename with .tmp appended to it? That way if something goes wrong and you end up with a bunch of .tmp files you at least know what the names are supposed to be.
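To make the naming idea concrete, here is a minimal sketch of a wrapper that does this from outside recode, assuming recode is run as a filter (no file argument, stdin to stdout). The function name recode_named_tmp is hypothetical and is not part of recode. It does not fix a failing rename; it only keeps the original in place until the move succeeds and makes any leftover .tmp file identifiable.

```shell
# Hypothetical helper, not part of recode: convert FILE via a temp file
# whose name is the original name plus ".tmp".
recode_named_tmp() {
    request=$1
    file=$2
    tmp="$file.tmp"

    # Never clobber a leftover from an earlier failed run.
    if [ -e "$tmp" ]; then
        echo "refusing to overwrite existing $tmp" >&2
        return 1
    fi

    # With no file argument, recode reads stdin and writes stdout,
    # so the original file is not touched at this point.
    if ! recode "$request" < "$file" > "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi

    # Only now replace the original. If this move fails, both FILE and
    # FILE.tmp are still there and the pairing is obvious from the name.
    mv -f -- "$tmp" "$file"
}
```

For example, recode_named_tmp utf8..utf16 testing.txt would leave testing.txt.tmp next to testing.txt if the final move failed. Note that this sketch does not preserve the original file's permissions or timestamps.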
Doing further testing... it looks like 3.7.12 is broken on FAT32 filesystems as well, even though the older version worked properly on FAT32...
:/mnt/e$ echo HELLO > temp.txt
:/mnt/e$ recode --verbose utf8..utf16 temp.txt
Request: UTF-8..:iconv:..UTF-16
Shrunk to: UTF-8..UTF-16
Request: UTF-8..ISO-10646-UCS-4..UTF-16
Recoding /mnt/e/temp.txt... done
/home/username/bin/.libs/recode: rename (/mnt/e/recode-hAtOC2.tmp, /mnt/e/temp.txt): No such file or directory
:/mnt/e$ ls -l temp.txt
ls: cannot access 'temp.txt': No such file or directory
:/mnt/e$ ls -l *.tmp
-rwxrwxrwx 1 cmcphers cmcphers 14 Mar 28 10:30 recode-TIBDLW.tmp
-rwxrwxrwx 1 cmcphers cmcphers 14 Mar 28 10:32 recode-hAtOC2.tmp
:/mnt/e$ file (/mnt/e/recode-hAtOC2.tmp,
-bash: syntax error near unexpected token `/mnt/e/recode-hAtOC2.tmp,'
:/mnt/e$ file /mnt/e/recode-hAtOC2.tmp
/mnt/e/recode-hAtOC2.tmp: Big-endian UTF-16 Unicode text
:/mnt/e$ uchardet /mnt/e/recode-hAtOC2.tmp
UTF-16
:/mnt/e$
So in summary:
3.6-24 -- works on FAT32 but broken on exFAT and NTFS
3.7.12 -- works on NTFS but broken on exFAT and FAT32
No issues on my native WSL filesystem which I think is ext4, but my space there is limited so I basically have to use my mounted Windows drives for a lot of stuff.
Thanks for your further investigation.
If I try your example on a FAT32 filing system attached to my Ubuntu machine, it works fine, so the filing system doesn't appear to matter.
As you've observed, it's the rename() system call that is failing, so the data is not (fortunately!) being lost in this case. However, I agree it would be no fun trying to recover it from the temporary files. The "No such file or directory" error suggests that the rename() routine, at least, is having trouble with the filename. I assume that /mnt/e is really /home?
At first blush it looks as though for some reason rename() doesn't understand the filename while the other routines that open the file etc. understand it just fine. I'm afraid I don't know much about WSL, so I have no idea why this would be.
gnulib has a rename() wrapper that recode is not currently using, but it doesn't have any code in it that references WSL. Some of the rename tests do mention WSL, but none of the cases tested seem to relate to this one.
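One way to narrow this down without recode in the picture is a small probe that renames a scratch file inside the affected directory. The sketch below is an assumption-laden test, not a description of what recode does internally: the name probe_rename and the scratch file names are made up, and it relies on mv within a single directory normally boiling down to a rename-family system call. It tries the rename both over an existing target and after the target has been removed.

```shell
# Hypothetical probe: does a plain rename work in DIR (default: current dir)?
probe_rename() {
    dir=${1:-.}
    orig="$dir/rename-probe.txt"
    tmp="$dir/rename-probe.tmp"
    status=0

    # Case 1: rename a temp file over an existing target.
    printf 'old\n' > "$orig" && printf 'new\n' > "$tmp" || return 2
    if mv -f -- "$tmp" "$orig"; then
        echo "rename over existing file: ok"
    else
        echo "rename over existing file: FAILED"
        status=1
    fi

    # Case 2: rename a temp file to a name that does not exist (any more).
    rm -f -- "$orig" "$tmp"
    printf 'new\n' > "$tmp" || return 2
    if mv -- "$tmp" "$orig"; then
        echo "rename to missing target: ok"
    else
        echo "rename to missing target: FAILED"
        status=1
    fi

    # Clean up the scratch files and report the combined result.
    rm -f -- "$orig" "$tmp"
    return $status
}
```

Running probe_rename /mnt/d and probe_rename /mnt/g on the WSL machine, and the same on a plain Linux box with a FAT or exFAT volume mounted, would show whether the failure depends on recode at all.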
I filed a bug on WSL: https://github.com/microsoft/WSL/issues/8201
The error message on 3.7 is definitely more useful than what I was initially encountering on 3.6, because it does actually show the original & temporary filename together. So on 3.7 at least, if the program output hasn't been lost, renaming the files back manually isn't a huge deal. The files I converted yesterday on 3.6, though, are probably a lost cause.
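If the 3.7 error output was captured to a file, the renaming can be scripted. This is a rough sketch that relies only on the message format shown in the transcripts above ("rename (TMP, ORIGINAL): No such file or directory"). The function name restore_from_log and the idea of a saved log file are assumptions, and it will mis-split paths that themselves contain a comma followed by a space.

```shell
# Hypothetical helper: move each temp file named in a saved recode 3.7.x
# error log back to the original name given in the same message.
restore_from_log() {
    while IFS= read -r line; do
        # Only look at the rename failure messages.
        case $line in
            *': rename ('*', '*'): No such file or directory') ;;
            *) continue ;;
        esac

        # Cut the line down to "TMP, ORIGINAL", then split at the first ", ".
        pair=${line#*": rename ("}
        pair=${pair%"): No such file or directory"}
        tmp=${pair%%", "*}
        orig=${pair#*", "}

        # Restore only when it is clearly safe: the temp file exists and
        # nothing is sitting at the original name.
        if [ -e "$tmp" ] && [ ! -e "$orig" ]; then
            mv -- "$tmp" "$orig" && printf 'restored %s\n' "$orig"
        else
            printf 'skipped %s\n' "$tmp" >&2
        fi
    done < "$1"
}
```

Usage would be along the lines of saving stderr with 2> recode-errors.log during the run and then calling restore_from_log recode-errors.log. Note that the restored files hold the converted text, not the original encoding.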
Sorry I can't help more, and I hope someone at MS or with better knowledge of WSL can work out what's going wrong here. There may well be a fix or workaround even if it's not a recode bug.
I am running Ubuntu WSL
I was running the following to find files with Windows-1252 encoding and convert them to UTF-8:
Unfortunately it seems that recode does not work properly with NTFS filesystems at all
I ended up with hundreds of these messages:
All of the original files (and all filename information) are gone.
The .tmp files are in fact UTF-8, but there's no way to know what the original filenames were, so the files are effectively gone/useless.
Even if I had the original filenames, there's no way to know which .tmp file correlates with which original filename.
It's not uncommon for Linux programs to not work perfectly on NTFS, but I've never encountered anything this bad before.
I "lost" nearly 400 files, and it would have been more if I hadn't noticed the errors and aborted the job.
Here's an example using a single file:
With a single file it's not a big deal to rename the .tmp file back to the original filename (as long as you have the original filename) but when many files are affected it seems impossible to recover from, especially if you don't have the original filenames.
I verified the same thing happens even without the -t
I verified that this happens on both NTFS and exFAT but does NOT happen on FAT32.
This issue might or might not be specific to WSL systems; a pure Linux system with an NTFS or exFAT filesystem mounted might or might not behave differently. I'm unable to test this.
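For batch runs on a filesystem where this might happen, one defensive pattern is to copy each file aside before letting recode replace it, and to stop at the first problem. The following is only a sketch under assumptions: recode_with_backup and the .orig suffix are made-up names, and it checks that the file still exists afterwards rather than trusting recode's exit status alone.

```shell
# Hypothetical wrapper: recode_with_backup REQUEST FILE...
# Keeps FILE.orig until the in-place conversion has visibly succeeded.
recode_with_backup() {
    request=$1
    shift
    for f in "$@"; do
        # Copy the original aside first; give up if even that fails.
        cp -p -- "$f" "$f.orig" || return 1

        # Treat the run as successful only if recode reports success
        # and the file is still present under its own name.
        if recode "$request" "$f" && [ -f "$f" ]; then
            rm -f -- "$f.orig"
        else
            echo "recode failed for $f; original kept as $f.orig" >&2
            return 1    # stop the batch at the first failure
        fi
    done
}
```

Called as, say, recode_with_backup utf8..utf16 *.txt, this costs extra disk space per file while it runs, but a failed rename then leaves a named copy of the original behind instead of only an anonymous .tmp file.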