sylikc / pyexiftool

PyExifTool (active PyPI project) - A Python library to communicate with an instance of Phil Harvey's ExifTool command-line application. Runs one process with the special -stay_open flag and pipes data to and from it. Much more efficient than running a subprocess for each command!

Little problem with files with non-ASCII characters #29

Closed: manuel1957 closed this issue 2 years ago

manuel1957 commented 3 years ago

Hello, when I try to read a file in a directory whose name contains spaces and/or non-ASCII characters, it doesn't work.

Manuel

manuel1957 commented 3 years ago

Correction: the problem is only with non-ASCII characters, for example: anafi/données.

sylikc commented 3 years ago

I'm looking into this now. I can reproduce the error with that filename you specified.

What locale is your system set to?

sylikc commented 3 years ago

@jangop Jan, I have a question for you... I'm in encoding hell, and it just never seems to go away.

The code was originally written for Python 2. I can see that since Python 3.6 or so, Popen() has had a new encoding argument. Is there any drawback to moving away from this bytes handling in ExifTool (or in any class in general)?

I'm looking at this bug and I'm tempted to change the project to use str instead of encoding to and from bytes... but I don't understand encodings well enough to know what the impact would be.
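For illustration, here is a minimal sketch of the two styles being weighed: encoding to bytes by hand versus passing `encoding=` to `Popen()` (available since Python 3.6) so the pipes are in text mode. A tiny Python process that echoes one line of raw bytes stands in for exiftool, an assumption made so the example runs without ExifTool installed; `roundtrip_bytes` and `roundtrip_text` are hypothetical names, not pyexiftool code.

```python
import subprocess
import sys

# Stand-in child process (assumption: used instead of exiftool so this runs anywhere).
# It echoes one line of *raw bytes*, so its own locale settings do not matter.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(sys.stdin.buffer.readline())"]


def roundtrip_bytes(text: str, encoding: str) -> str:
    """Bytes style: the caller encodes the request and decodes the reply itself."""
    proc = subprocess.Popen(CHILD, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(text.encode(encoding) + b"\n")
    return out.decode(encoding).rstrip("\r\n")


def roundtrip_text(text: str, encoding: str) -> str:
    """Text style (Python 3.6+): Popen(encoding=...) encodes/decodes the pipes for us."""
    proc = subprocess.Popen(CHILD, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            encoding=encoding)
    out, _ = proc.communicate(text + "\n")
    return out.rstrip("\n")


print(roundtrip_text("données . . . .CR3", "utf-8"))
```

Either way an encoding still has to be chosen; text mode only moves the encode/decode step into `Popen`, so the question of which encoding the child process expects remains open.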

I wrote test code to reproduce this bug, and I'm seeing:

2021-10-03 12:20:44,123 Method 'run': Exiftool version '12.30' (pid 9472) launched with args '['D:\\bin\\console\\exiftool.exe', '-stay_open', 'True', '-@', '-', '-common_args', '-G', '-n']'
2021-10-03 12:20:44,123 Method 'execute': Command sent = [b'-j', b'donn\xc3\xa9es . . . .CR3', b'-echo4', b'=${status}=post795713', b'-execute795713']
2021-10-03 12:20:44,142 Method 'execute': Reply stdout = b''
2021-10-03 12:20:44,142 Method 'execute': Reply stderr = b'Error: File not found - donn\xc3\xa9es . . . .CR3\r\n'
2021-10-03 12:20:44,143 Method 'execute': Reply status = 1
2021-10-03 12:20:44,156 Method 'terminate': Exiftool terminated successfully.

It almost looks like the ExifTool class is passing \xc3 ... directly to the underlying process, which confuses me.

All I did in the test code was pass in:

a.execute_json("données . . . .CR3")

I don't see any code that would turn the above into that hex-escaped string...
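The `\xc3\xa9` in the log is Python's `repr()` of a bytes object, not literal backslashes: it is the UTF-8 encoding of "é", whereas cp1252 encodes "é" as the single byte `\xe9`. A small sketch decoding the logged bytes both ways; whether ExifTool on this Windows machine actually reads its arguments as cp1252 is an assumption based on the reported locale.

```python
# The bytes from the log line: Command sent = [..., b'donn\xc3\xa9es . . . .CR3', ...]
sent = b"donn\xc3\xa9es . . . .CR3"

# "é" (U+00E9) is two bytes in UTF-8 but one byte in cp1252.
assert "é".encode("utf-8") == b"\xc3\xa9"
assert "é".encode("cp1252") == b"\xe9"

# Decoding as UTF-8 recovers the original filename...
print(sent.decode("utf-8"))   # données . . . .CR3
# ...while a cp1252 reader sees two unrelated characters in place of "é".
print(sent.decode("cp1252"))  # donnÃ©es . . . .CR3
```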

manuel1957 commented 3 years ago

Hi

My locale is ('fr_FR', 'cp1252').

Thanks for your help.

On 03/10/2021 at 21:26, Kevin M wrote:

I'm looking into this now. I can reproduce the error with that filename you specified.

What locale is your system set at?


jangop commented 3 years ago

I don't see any code that would turn the above into that hex escaped string . . .

execute_json applies os.fsencode to all parameters. Assuming that works similarly to str.encode (using the operating system's default encoding), that would do exactly what you describe.
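One detail worth checking on that assumption: `os.fsencode` encodes with `sys.getfilesystemencoding()` and `sys.getfilesystemencodeerrors()`, which is not necessarily the locale encoding. On Windows with Python 3.6+ the filesystem encoding is UTF-8 by default (PEP 529), while `locale.getpreferredencoding(False)` on a French system is typically `cp1252`; that mismatch would be consistent with the UTF-8 bytes in the log. A small sketch to compare the two on a given machine:

```python
import locale
import os
import sys

name = "données . . . .CR3"

# os.fsencode() uses the *filesystem* encoding, not the locale's preferred encoding.
fs_bytes = os.fsencode(name)
assert fs_bytes == name.encode(sys.getfilesystemencoding(),
                               sys.getfilesystemencodeerrors())

print("filesystem encoding:      ", sys.getfilesystemencoding())
print("locale preferred encoding:", locale.getpreferredencoding(False))
print("os.fsencode result:       ", fs_bytes)
```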

jangop commented 3 years ago

Is there any drawback in moving away from this bytes thing in ExifTool (or any class in general)?

I do not see any immediate drawbacks; actually, none at all. That said, I do think that pathlib.Path would be clearer and, as such, superior.
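A minimal sketch of what accepting `pathlib.Path` could look like, assuming parameters are kept as `str` and encoding is left to whatever writes to the pipe (for example `Popen(encoding=...)`). `to_arg` is a hypothetical helper, not part of pyexiftool.

```python
import os
from pathlib import Path
from typing import Union

PathOrStr = Union[str, os.PathLike]


def to_arg(value: PathOrStr) -> str:
    """Hypothetical helper: normalise a str or pathlib.Path into a str argument.

    os.fspath() returns a str for str and Path inputs, so nothing is encoded
    here; bytes paths are rejected so that one code path decides the encoding.
    """
    arg = os.fspath(value)
    if not isinstance(arg, str):
        raise TypeError(f"expected a str path, got {type(arg).__name__}")
    return arg


params = ["-j", to_arg(Path("anafi") / "données . . . .CR3")]
print(params)
```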

sylikc commented 3 years ago

Referring to another commit I commented on a while back that also had to do with encodings (coincidentally, also a French encoding). But I need to look into this a bit more... https://github.com/mchaptel/pyexiftool/commit/31784fc29ea10e13bc28de4808e89250512e6329

jangop commented 3 years ago

Alright, let us come up with a few reproducible tests that use fancy characters in filenames and metadata.
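A possible starting point for such tests, sketched with the standard library's `unittest`. Apart from the filename used earlier in this thread, the fixture names are hypothetical; the files contain placeholder bytes rather than real images, and only filenames are covered (not metadata). The exiftool check calls the `exiftool` command line directly with `-j` (as in the log above) rather than going through pyexiftool, and is skipped when `exiftool` is not on `PATH`.

```python
import os
import shutil
import subprocess
import tempfile
import unittest

# Hypothetical fixture names; the first is the filename used earlier in this thread.
FANCY_NAMES = [
    "données . . . .CR3",
    "with spaces.jpg",
    "日本語.jpg",
    "Ærøskøbing.png",
]


class FancyFilenameTests(unittest.TestCase):
    def setUp(self):
        self._tmp = tempfile.TemporaryDirectory()
        self.dir = self._tmp.name
        for name in FANCY_NAMES:
            with open(os.path.join(self.dir, name), "wb") as fh:
                fh.write(b"placeholder")  # not a real image; only the name matters

    def tearDown(self):
        self._tmp.cleanup()

    def test_names_round_trip_through_filesystem(self):
        self.assertEqual(sorted(os.listdir(self.dir)), sorted(FANCY_NAMES))

    @unittest.skipUnless(shutil.which("exiftool"), "exiftool not on PATH")
    def test_exiftool_finds_each_file(self):
        exe = shutil.which("exiftool")
        for name in FANCY_NAMES:
            path = os.path.join(self.dir, name)
            result = subprocess.run([exe, "-j", path], capture_output=True,
                                    encoding="utf-8", errors="replace")
            # The symptom reported in this issue was "Error: File not found" on stderr.
            self.assertNotIn("File not found", result.stderr, msg=name)


if __name__ == "__main__":
    unittest.main()
```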

sylikc commented 3 years ago

I'm going to use the v0.5.x branch to see whether changing the encoding on Popen() fixes this problem.

sylikc commented 3 years ago

OK, I think I've made progress... Assuming UTF-8 encoding is a fallacy, so I've added a constructor with an "encoding" property, which defaults to locale.getpreferredencoding(False). I'm going to clean it up and push the commit.

I don't think I'll fix this in the v0.4.x branch... the functionality is all Python 3-specific.
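A minimal sketch of the approach described above: an object whose `encoding` property defaults to `locale.getpreferredencoding(False)` and is used to turn `str` parameters into bytes. `CommandEncoder` and `encode_params` are hypothetical names; this is not the actual code in the v0.5.x branch.

```python
import locale
from typing import List, Optional


class CommandEncoder:
    """Hypothetical sketch: encode str parameters for the exiftool stdin pipe.

    The encoding is configurable and defaults to the locale's preferred
    encoding instead of assuming UTF-8.
    """

    def __init__(self, encoding: Optional[str] = None) -> None:
        self._encoding = encoding or locale.getpreferredencoding(False)

    @property
    def encoding(self) -> str:
        return self._encoding

    def encode_params(self, params: List[str]) -> List[bytes]:
        # errors="strict" (the default) raises instead of silently mangling a filename
        return [p.encode(self._encoding) for p in params]


print(CommandEncoder("cp1252").encode_params(["-j", "données . . . .CR3"]))
# [b'-j', b'donn\xe9es . . . .CR3']
print(CommandEncoder("utf-8").encode_params(["-j", "données . . . .CR3"]))
# [b'-j', b'donn\xc3\xa9es . . . .CR3']
```

Raising on unencodable characters is a deliberate choice in this sketch: a filename that cannot be represented in the chosen encoding would otherwise turn into a silent "File not found".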

sylikc commented 2 years ago

@manuel1957 I should be releasing the Python 3 refactor as the new version soon (sometime this year); the issue is fixed in that branch. Would you like to test the latest version of that branch to make sure it works for your use cases?

https://github.com/sylikc/pyexiftool/tree/v0.5.x-py3-refactor

sylikc commented 2 years ago

Pushed. This issue should now be fixed.