sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

Wrong detection of a json character encoding? #463

Closed ghost closed 3 years ago

ghost commented 3 years ago

I'm trying to deduplicate files remotely. I ran rmlint on machine A and obtained a JSON file.

I uploaded it to machine B and tried to run:

#LC_ALL=en_US.utf8 /root/rmlint-2.10.1/rmlint --replay /mnt/SEAGATE/ // 2021-01-04_pi-rmlint-utf8.json

I am getting an error:

INFO: Loading json-results `/root/2020-12-18_sorting_hdd-running-rmlint-seagate-vs-wd/pi/2021-01-04_pi-rmlint-utf8.json'
WARNING: Error: JSON data must be UTF-8 encoded
WARNING: Loading /root/2020-12-18_sorting_hdd-running-rmlint-seagate-vs-wd/pi/2021-01-04_pi-rmlint-utf8.json failed.
ERROR: No valid .json files given, aborting.

The JSON file was produced by rmlint itself, under a UTF-8 locale. chardetect from python-chardet agrees that the file is UTF-8.

The difference between the machines is that machine A is a Raspberry Pi 4B and machine B is an x86_64 machine. Both run rmlint 2.10.4.
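A character-set detector such as chardetect makes a statistical guess, so a file it labels as UTF-8 can still contain a few invalid byte sequences. One way to cross-check is a strict decode that reports the first offending offset. The following is a minimal sketch; the helper name `first_invalid_utf8` is hypothetical and not part of rmlint, and the parser rmlint uses may apply additional rules that Python's decoder does not.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for the first invalid UTF-8
    sequence in data, or None if the whole buffer decodes cleanly."""
    try:
        # bytes.decode() is strict by default and raises on the first bad sequence.
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.end]
    return None
```

Calling it on the raw contents of the results file, for example `first_invalid_utf8(open(path, "rb").read())`, returns `None` when the file is strictly valid UTF-8; otherwise the offset shows which entry in the file to inspect.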

SeeSpotRun commented 3 years ago

Looks like the same issue as #464.

SeeSpotRun commented 3 years ago

@lockywolf are you able to confirm if https://github.com/SeeSpotRun/rmlint/tree/glib-json fixes this?

ghost commented 3 years ago

I'll try to do that, although I have already crunched through the stockpile of files, and I don't have the exact test conditions.

ghost commented 3 years ago

Sorry, I cannot reproduce it on my machine any more. I must have deleted the files in question manually, or something like that.

Shall I close? If the issue reappears, somebody (maybe me) can reopen it.

SeeSpotRun commented 3 years ago

Yes, let's assume the PR fixes it until someone finds otherwise.