maxicarlos08 / gphoto2-rs

Rust wrapper for gphoto2
GNU Lesser General Public License v2.1
29 stars 6 forks source link

Investigate encodings and localisation #39

Open RReverser opened 1 year ago

RReverser commented 1 year ago

I've been looking a bit into how libgphoto2 deals with encodings, and found that it actually respects system localisation + some undocumented APIs for overriding localisation/encodings. What worries me there is that I'm not certain that non-English strings are guaranteed to be encoded in UTF-8 by libgphoto2, in which case perhaps we don't have a right to return them as String, or perhaps we at least have to customize encoding upon startup to enforce UTF-8 or something... As I said, I only started looking at this and there's a bit of uncertainty here. Now that I added test framework, I think it would be helpful to try and verify those assumptions and check what happens with non-ASCII characters.

Originally posted by @RReverser in https://github.com/maxicarlos08/gphoto2-rs/issues/27#issuecomment-1262418828

maxicarlos08 commented 1 year ago

I don't have much C experience (and less with the locales, only heard bad things about it), but isn't there a function like setlocale we can use to just set the locale to C to get consistent results regardless of the operating's system locale?

Sorry if what I'm saying makes no sense at all :/

maxicarlos08 commented 1 year ago

Wait nevermind, libgphoto2 already does that: https://github.com/gphoto/libgphoto2/blob/master/ChangeLog#L2249

RReverser commented 1 year ago

Yeah it does, as well as it has an internal function for overriding the encoding specifically. All I'm saying is that it's worth testing and adding whatever's necessary to make sure that Unicode strings from camera are correctly parsed and converted to UTF-8 that Rust expects regardless of LC_ALL=... environment variable or any other settings (I suspect right now if user sets LC_ALL to a non-UTF8 encoding, and there are non-ASCII strings, we'll get either corruption or runtime panics).

RReverser commented 1 year ago

If you have camera at hand and it allows settings some strings (e.g. Canon allows to set copyright author name via Digital Photo Professional), it would be good to set that to e.g. Chinese characters, overriding LC_ALL to some non-UTF8 encoding and checking how those characters are parsed by gphoto2-rs.

maxicarlos08 commented 1 year ago

I added an example to test this, unfortunately my camera (a Nikon D3400) does not seem to have any text that I can freely set.

If you have a camera where you can set a text, just adjust the CONFIG_KEY in the example to the widget name.

RReverser commented 1 year ago

I added an example to test this, unfortunately my camera (a Nikon D3400) does not seem to have any text that I can freely set.

It should have the same copyright field (https://www.dummies.com/article/home-auto-hobbies/photography/add-copyright-notice-images-nikon-d5600-245525/), but seems like it can be set only from the camera, which likely means no Unicode chars.

RReverser commented 1 year ago

Actually, one other thing you could do is just create a file with unicode chars via file manager when camera is connected, and then checking how it's reported via CameraFS?

maxicarlos08 commented 1 year ago

Actually, one other thing you could do is just create a file with unicode chars via file manager when camera is connected, and then checking how it's reported via CameraFS?

My camera sadly only exposes images to libgphoto2 it has taken and match the naming format :(

I suspect right now if user sets LC_ALL to a non-UTF8 encoding, and there are non-ASCII strings, we'll get either corruption or runtime panics

Will we? AFAICT we are using String::from_utf8_lossy wherever we get a string from libgphoto2, so invalid UTF-8 should not cause a pash

RReverser commented 1 year ago

so invalid UTF-8 should not cause a pash

Those I count as corruption :P

My camera sadly only exposes images to libgphoto2 it has taken and match the naming format :(

Hm, are you saying that manually added files via file manager are not visible via CameraFS? Fascinating.

RReverser commented 1 year ago

Anyway, I'll add some unicode chars to the virtual camera to make this easier to verify.

maxicarlos08 commented 1 year ago

Anyway, I'll add some unicode chars to the virtual camera to make this easier to verify.

WDYM? I haven't found any writable TextWidget on the Virtual camera. You don't seem to be able to create or upload files/directories on the virtual camera (the storage access type is RoDelete)

RReverser commented 1 year ago

I mean in libgphoto2 itself :)