hvdwolf / jExifToolGUI

jExifToolGUI is a multi-platform Java/Swing graphical frontend for the excellent command-line ExifTool application by Phil Harvey.
https://hvdwolf.github.io/jExifToolGUI/
GNU General Public License v3.0

Displaying Russian letters (encoding problems) #152

KovalevArtem closed this issue 3 years ago

KovalevArtem commented 3 years ago

EXIF: Russian letters look like this: ����

XMP: Russian letters look like this: ????

KovalevArtem commented 3 years ago

If you select Russian in the application settings, then the names and original values are displayed correctly.

hvdwolf commented 3 years ago

This looks like some UTF-8 encoding issue. I will investigate it.

hvdwolf commented 3 years ago

I'm a bit puzzled here. This was an issue before the first release, and I fixed it at that time. Everything works in UTF-8. Can you please share some of your images? I assume they have Russian text strings in them as well. I want to check whether the strings in the image data are stored as UTF-8, as CP 866 (Cyrillic), or as something else.
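To tell how a string is stored, one can compare a hex dump of the tag value against the byte sequences the same word produces in each candidate encoding. A small Java sketch (the class name is hypothetical; IBM866 and windows-1251 are the names the JDK uses for CP 866 and the Cyrillic Windows code page):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Prints the bytes that the same Russian word produces in three encodings,
// so a hex dump of a tag value can be matched against them.
class EncodingSignatures {

    // Hex string of the bytes that `text` encodes to in `cs`, e.g. "D0 A2".
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "\u0422\u0435\u0441\u0442"; // "Тест" (Russian for "test")
        System.out.println("UTF-8        : " + hex(word, StandardCharsets.UTF_8));
        System.out.println("CP866 (DOS)  : " + hex(word, Charset.forName("IBM866")));
        System.out.println("windows-1251 : " + hex(word, Charset.forName("windows-1251")));
    }
}
```

Running it prints D0 A2 D0 B5 D1 81 D1 82 for UTF-8, 92 A5 E1 E2 for CP 866, and D2 E5 F1 F2 for windows-1251, so each encoding leaves a clearly different fingerprint in the file.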

hvdwolf commented 3 years ago

When I run exiftool 131907609-b612f89f-6b8c-4cff-80bb-65a86ac10d63.jpg, I see, among others:

Software                        : paint.net 4.2.16
Artist                          : ���� 1
Host Computer                   : iPhone X
Copyright                       : ���� 2

and

Creator                         : ???? 1
Description                     : Test Тест 6
Rights                          : ???? 3
Title                           : Test Тест 6

and

Credit                          : ???? 2
Label                           : ???? 5

I think paint.net is not writing UTF-8-encoded data to your images. If you take a "clean" image and write the same tags with JTG, what do you see?
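One plausible explanation of the two different symptoms, stated as an assumption rather than something verified against paint.net: ���� are U+FFFD replacement characters, which appear when bytes that are not UTF-8 (for example windows-1251) are decoded as UTF-8, while ???? are literal question marks, which appear when text was converted to a charset without Cyrillic before being written. This Java sketch reproduces both (class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Reproduces the two symptoms seen in the exiftool output:
//   "????" -> text was converted to a charset without Cyrillic before writing
//   "����" -> non-UTF-8 bytes were decoded as if they were UTF-8
class SymptomDemo {

    static final String WORD = "\u0422\u0435\u0441\u0442"; // "Тест"

    // Writer side: Cyrillic pushed through a Latin-only charset.
    // String.getBytes replaces unmappable characters with '?', so the letters are lost.
    static String questionMarks() {
        byte[] stored = WORD.getBytes(StandardCharsets.ISO_8859_1);
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    // Reader side: windows-1251 bytes decoded as UTF-8.
    // Malformed sequences become U+FFFD, but the original bytes are still in the file.
    static String replacementChars() {
        byte[] stored = WORD.getBytes(Charset.forName("windows-1251"));
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(questionMarks());    // ????
        System.out.println(replacementChars()); // U+FFFD replacement characters
    }
}
```

The distinction matters: in the first case the original bytes are still in the file and can be recovered by decoding with the right charset, while in the second case the letters are gone.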

mrtngrsbch commented 3 years ago

https://www.getpaint.net/? I'm sure the problem comes from that software, written in .NET. Please use open-source editors.

KovalevArtem commented 3 years ago

https://user-images.githubusercontent.com/36500228/131996207-c097cbb9-de49-4565-82c1-cccabdd60a90.mp4

Moscow_Original.jpg ![Moscow_Original](https://user-images.githubusercontent.com/36500228/131993117-f4910ebc-0059-454d-ba0a-0eee483dd1b2.jpg)
Moscow_Edited.jpg (EXIF data modified with jExifToolGUI) ![Moscow_Edited](https://user-images.githubusercontent.com/36500228/131993218-d929eead-6215-4737-8619-e4b0e6f91306.jpg)

hvdwolf commented 3 years ago

So it is JTG. I will look into it further.

hvdwolf commented 3 years ago

I uploaded a Windows 20210903 build. Please try it.

https://mega.nz/folder/UFlRhYCZ#LITpkOKT79CNWtmdwCG0bw

KovalevArtem commented 3 years ago

Everything is still the same.

hvdwolf commented 3 years ago

Please type the command chcp in a command prompt (DOS box) and let me know what it returns.

I am wondering whether it is one of these, or maybe something completely different:

| Code page | Description |
|---|---|
| 855 | Cyrillic (Russian) |
| 866 | Russian |
| 65001 | UTF-8 |

KovalevArtem commented 3 years ago

866

hvdwolf commented 3 years ago

\<frustration start>Stupid Windows. The only system in the world that does not use UTF-8.\<frustration end>

This requires code changes and possibly a new setting. At startup on the Windows platform, I should check which code page is in use and use that one for reading and writing. ExifTool does support this, as Windows requires it.

For users who only look at their own data it makes no difference, but even uploading to photo sites (Flickr, Piwigo, Google Photos, etc.) might already break this, as these all run on Unix platforms. Of course this severely limits exchangeability worldwide. I know that Greg and Martin want to use ISAD(G) and VRA Core worldwide, but with Windows users all using their own code page, this interoperability will be seriously hindered.

Because Microsoft understands that this limits international use, since Windows 10 version 1903 they allow applications to use (force) the UTF-8 code page. So the application should automatically use the system default unless the user specifies another code page (and in that case UTF-8 should be the preferred default, to enhance worldwide interoperability).
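A sketch of the startup check described here, under stated assumptions: the preference value passed to choose() is hypothetical (not an existing jExifToolGUI setting), and the platform charset is read from the native.encoding system property (JDK 17 and later) with Charset.defaultCharset() as a fallback. On Windows this yields the ANSI code page (typically windows-1251 on a Russian system), not the console (OEM) code page that chcp reports (866 here).

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical startup logic: use the user's configured charset if there is one,
// otherwise the platform's native charset (UTF-8 on typical Linux/macOS setups).
class CharsetChoice {

    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // "native.encoding" exists from JDK 17 on; older JDKs return null, so fall back
    // to Charset.defaultCharset(), which before JDK 18 also follows the platform.
    static Charset platformCharset() {
        String nativeEncoding = System.getProperty("native.encoding");
        if (nativeEncoding != null) {
            try {
                return Charset.forName(nativeEncoding);
            } catch (IllegalArgumentException e) {
                // unknown or unsupported name: fall through to the JVM default
            }
        }
        return Charset.defaultCharset();
    }

    // userSetting: hypothetical preference value, e.g. "UTF-8" or "windows-1251";
    // null or blank means "not set". An invalid name throws IllegalArgumentException.
    static Charset choose(String userSetting) {
        if (userSetting != null && !userSetting.isBlank()) {
            return Charset.forName(userSetting.trim());
        }
        return platformCharset();
    }

    public static void main(String[] args) {
        System.out.println("Windows: " + isWindows());
        System.out.println("Platform charset: " + platformCharset());
        System.out.println("With UTF-8 override: " + choose("UTF-8"));
    }
}
```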

This will take some time to implement I'm afraid.

KovalevArtem commented 3 years ago

In case it matters, I have this version: Windows 10 Pro 21H1 (19043.1165).

Thank you for your work on solving this problem and on developing this program in general!

mrtngrsbch commented 3 years ago

I really didn't expect such an answer. WTF, Windows doesn't use UTF-8?

hvdwolf commented 3 years ago

For more information on how ExifTool deals with this, please read answers 10 and 18 of the FAQ: https://exiftool.org/faq.html#Q10 and https://exiftool.org/faq.html#Q18

In this case I also have to deal with Java as the UI app, and with ExifTool, the console app that I call from Java.

hvdwolf commented 3 years ago

And here is another Windows 20210904 build, now using the "-use mwg" feature. My hopes are low, but this is a quick fix; otherwise I have to do the heavy lifting to get it fixed. https://mega.nz/folder/UFlRhYCZ#LITpkOKT79CNWtmdwCG0bw

Edit: This will not work on existing images (I think), as the strings are already saved in the Russian code page, but it should work on new images.

hvdwolf commented 3 years ago

This is for newly written tags?

KovalevArtem commented 3 years ago

Unfortunately, the miracle did not happen 😔 I used the original image, with no values containing Russian letters...

hvdwolf commented 3 years ago

A totally different approach: reading and writing with the default system code page instead of trying to read and write in UTF-8. On my Linux machines that is UTF-8 anyway, and the same goes for macOS.

build jExifToolGUI-1.9.0.0_beta-20210904_2-win-x86_64_with-jre.zip https://mega.nz/folder/UFlRhYCZ#LITpkOKT79CNWtmdwCG0bw

KovalevArtem commented 3 years ago

build jExifToolGUI-1.9.0.0_beta-20210904_2-win-x86_64_with-jre.zip

Hooray, it works! 🎉

Everything works correctly EVEN with files edited by previous builds...

hvdwolf commented 3 years ago

Good. Thanks for the feedback. I will still try another approach, where I "translate" back and forth between UTF-8 for storage (to enhance worldwide interoperability) and the default code page for display. But that will take a few days.

KovalevArtem commented 3 years ago

build jExifToolGUI-1.9.0.0_beta-20210904_2-win-x86_64_with-jre.zip

Found another issue in this build.

https://user-images.githubusercontent.com/36500228/132302789-e427a2dd-5516-46f8-8fb0-8f6da1873e78.mp4

The image is attached (it has not been processed) ...

Image ![IMG_20210907_102041](https://user-images.githubusercontent.com/36500228/132302950-488632dc-6513-426d-8d71-e03e3adf4263.jpg)

There is no such problem in the 20210904 build...

hvdwolf commented 3 years ago

I am not surprised. This also reminds me why I always read UTF-8. All of ExifTool's internal strings are UTF-8 (of course), so when I read tag strings and values as UTF-8, the translated strings from ExifTool itself were displayed correctly. When using the operating system's default code page, those values might be read correctly, but ExifTool's internal strings are not. I really don't know how to solve this. Windows started to support Unicode under Win 7 but is still not Unicode by default. I will need to look further.

I can of course create a setting to use UTF-8 (default) or the Windows default code page, but then Windows users with other code pages would run into these issues. I did read/write in UTF-8. I just found some code that translates every string to UTF-8 after reading and before writing. That might become an extra "if Windows then..." method. I will try it this weekend. After all, I can test it myself using German, French or Spanish words (genießen, Vergnügen, goûter, façade, ¿Abrir) or copy some from the Russian translation.
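The code mentioned here isn't shown in the thread; a common form of such a "translate to UTF-8" step is the re-decoding idiom sketched below (an assumption, with hypothetical names). It undoes text whose UTF-8 bytes were decoded with the wrong charset by re-encoding with that charset and decoding again as UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the usual "convert every string to UTF-8" idiom:
// text whose UTF-8 bytes were wrongly decoded with `wrong` is re-encoded with `wrong`
// to get the original bytes back, then decoded again as UTF-8.
class Redecode {

    static String repair(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        String original = "\u0422\u0435\u0441\u0442"; // "Тест"

        // What a reader sees when UTF-8 output is decoded as windows-1251 (mojibake).
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), cp1251);
        System.out.println(garbled);
        System.out.println(repair(garbled, cp1251)); // Тест
    }
}
```

It only works when the wrong decoding was lossless. Byte 0x98 is undefined in windows-1251, and the UTF-8 encoding of И (U+0418) is D0 98, so text containing И may not survive the round trip. Applied to a whole output that mixes UTF-8 tag names with code-page values, it would also garble whichever part was already correct.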

mrtngrsbch commented 3 years ago

For Spanish, use words with all the accents [á, é, í, ó, ú, ñ]:

pirámide teléfono legítimo brócoli Cancún España

hvdwolf commented 3 years ago

And it's so nice to see that Windows uses UTF-8 by default in its browsers and IIS web servers. They must, of course; otherwise they would have a worldwide issue and nobody would use Microsoft anymore. Why not be consistent and use it in your entire OS? \<I will stop my frustration here :wink:>

hvdwolf commented 3 years ago

I already explained earlier that ExifTool is prepared for Windows, but just to show you, I did the following. Using your Moscow_Edited.jpg, I get this on my UTF-8 Linux box with plain exiftool:

exiftool -exif:all ~/mnt/PUBLIC/kovalevArtem/Moscow_modified.jpg 
Software                        : Picasa
Artist                          : ����
Copyright                       : ����
Exif Version                    : 0220

When using the correct character set (note that this character set, cp1251, the Windows "ANSI" code page for Cyrillic, is different from the console code page 866/851. Let's not make it too easy :wink:):

exiftool -exif:all -charset exif=cp1251 ~/mnt/PUBLIC/kovalevArtem/Moscow_modified.jpg 
Software                        : Picasa
Artist                          : Тест
Copyright                       : Тест
Exif Version                    : 0220

And when using the correct character set and language:

exiftool -lang ru -exif:all -charset exif=cp1251 ~/mnt/PUBLIC/kovalevArtem/Moscow_modified.jpg
Имя и версия ПО                 : Picasa
Исполнитель                     : Тест
Владелец копирайта              : Тест
Exif версия                     : 0220

Of course, that works differently in Java code, which hands the data over to a command-line tool and gets it back, but I hope to get that working for both reading and writing, as I found some code that "should" do that (of course I am not the first to run into this multi-platform issue).
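A minimal sketch of the Java side of the command above, assuming exiftool is on the PATH: the arguments are passed as a list to ProcessBuilder (no shell quoting), and stdout is read as raw bytes and decoded explicitly as UTF-8 (ExifTool's documented default output charset) rather than with the platform default. Here -charset exif=cp1251 only tells ExifTool how to interpret the stored EXIF strings. The class and method names are hypothetical, and on Windows non-ASCII arguments can still be affected by the code page, as discussed further down in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical runner for: exiftool -lang <lang> -exif:all -charset exif=<cs> <file>
class ExifToolRunner {

    static List<String> command(String exiftool, String lang, String exifCharset, String file) {
        List<String> cmd = new ArrayList<>();
        cmd.add(exiftool);
        cmd.add("-lang");
        cmd.add(lang);
        cmd.add("-exif:all");
        cmd.add("-charset");
        cmd.add("exif=" + exifCharset); // how ExifTool should decode stored EXIF strings
        cmd.add(file);
        return cmd;
    }

    // Runs the command and decodes its combined stdout/stderr as UTF-8.
    static String run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        byte[] out;
        try (InputStream in = p.getInputStream()) {
            out = in.readAllBytes();
        }
        p.waitFor();
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file name; requires exiftool on the PATH.
        List<String> cmd = command("exiftool", "ru", "cp1251", "Moscow_modified.jpg");
        System.out.print(run(cmd));
    }
}
```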

Your 3rd image is just a black "empty" jpg.

hvdwolf commented 3 years ago

It is getting worse. The camera has stored the values as UTF-8, while the program has stored the values in the default code page.

So in post 7 you state that it works for the EXIF values in the OS code page, but please also check the File tags/values in "jExifToolGUI-1.9.0.0_beta-20210904_2-win-x86_64_with-jre.zip". For example, I see these File tags (written by the camera):

| Group | Tag | Value |
|---|---|---|
| File | Exif – Порядок байтов | Порядок от младшего к старшему (Intel, II) |
| File | Процесс кодирования | Базовое DCT, кодирование Хаффмана |

(In English: Exif Byte Order = Little-endian (Intel, II); Encoding Process = Baseline DCT, Huffman coding.)

The same goes for the camera tags. They are all in UTF-8. When using the system code page, as done in "jExifToolGUI-1.9.0.0_beta-20210904_2-win-x86_64_with-jre.zip", those File tag values display incorrectly. Your added EXIF values are displayed as ���� in UTF-8 and as Тест in the Windows code page.

The point is that the "western" code pages lack some of the most exotic characters but all cover the common accented characters of Western European (Latin-script) languages. Cyrillic and Asian languages are of course completely different. I guess that's why hardly anyone has noticed so far.

Edit: So I have Unicode (UTF-8) tag strings from ExifTool, Unicode value strings from the camera, and Windows code page strings from the "editor".

Even if I now start to "force write" everything in UTF-8 (get the string in the OS code page, convert it to a byte array, convert that to a UTF-8 string, write it to the file), you will still get issues with older edited files, or with files from programs that did write in the OS code page.
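For values of unknown origin, one heuristic (an assumption, not what jExifToolGUI does) is to accept the raw bytes as UTF-8 only if they form well-formed UTF-8, and otherwise fall back to a legacy code page. A Java sketch, with windows-1251 as the assumed fallback and hypothetical names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Heuristic decoder for tag values of unknown origin: accept the bytes as UTF-8 only
// if they are well-formed UTF-8, otherwise assume a legacy code page.
class FallbackDecoder {

    static String decode(byte[] raw, Charset legacy) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of inserting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return new String(raw, legacy); // not valid UTF-8: treat as the old code page
        }
    }

    public static void main(String[] args) {
        Charset cp1251 = Charset.forName("windows-1251");
        String word = "\u0422\u0435\u0441\u0442"; // "Тест"
        System.out.println(decode(word.getBytes(StandardCharsets.UTF_8), cp1251)); // UTF-8 data
        System.out.println(decode(word.getBytes(cp1251), cp1251));                // code-page data
    }
}
```

This needs the raw bytes of a single value, and it remains a guess: a short string in a legacy code page can occasionally happen to be valid UTF-8.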

hvdwolf commented 3 years ago

I am afraid that I will never get this to work. When calling exiftool, a new shell (cmd/DOS box on Windows) is automatically created, which always uses the default code page. I tried using a cmd file that first sets the code page to Unicode (65001) and then calls exiftool with the commands. That still doesn't help, because the console is first opened in the standard code page, and the UTF-8 strings are already malformed before they are handed over to the cmd file. I could also ask users to set code page 65001 machine-wide in the registry, but I don't think they will like that, as all other code-page-based console apps will no longer display correctly. And it turns out that even on Windows 10 there is still an issue with "outencoding" and "inencoding" for Unicode, as it occasionally still falls back to 7-bit US-ASCII.

Setting jExifToolGUI to use the default code page makes reading/writing of tags work, but restricts it to that code page, making worldwide exchangeability difficult unless everyone decides to stick to pure ANSI standards (again: stupid Windows). Most users will not care at all, as they only deal with their own images. But using the default code page means that ExifTool's internal strings for displaying tag names in your own language (Russian, Korean, Turkish, Chinese, etc.) don't display correctly, as those strings are in UTF-8.
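One alternative to the cmd-file approach, offered only as a hedged sketch: ExifTool's -@ option reads command-line arguments from a file, one per line, so non-ASCII tag values can be written to that file from Java as UTF-8 instead of appearing on the command line itself. How ExifTool interprets the file's encoding on Windows should be checked against FAQ 18 and the -charset documentation; the helper names and temp-file naming are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: writes ExifTool arguments (one per line) to a UTF-8 temp file
// and returns a command line that refers to that file via -@.
class ArgFile {

    static Path write(List<String> exifToolArgs) throws IOException {
        Path file = Files.createTempFile("exiftool-args", ".txt");
        Files.write(file, exifToolArgs, StandardCharsets.UTF_8); // one argument per line
        return file;
    }

    static List<String> command(String exiftool, Path argFile) {
        return List.of(exiftool, "-@", argFile.toString());
    }

    public static void main(String[] args) throws IOException {
        Path argFile = write(List.of("-Artist=\u0422\u0435\u0441\u0442", "-overwrite_original", "photo.jpg"));
        System.out.println(command("exiftool", argFile));
        // e.g. new ProcessBuilder(command("exiftool", argFile)).start();
    }
}
```

The temp file path itself may contain non-ASCII characters (for example a Cyrillic user name in the temp directory), so this does not remove every encoding concern.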

I really don't know how to solve this with ExifTool. :disappointed: In Java there is the ImageIO library. Next to that, there is the TwelveMonkeys image library, which is pure Java and extends the number of image types and functionality, and there is also the pure-Java Apache Commons Imaging library. I already used TwelveMonkeys minimally before, as it adds some functionality that was completely missing in the Java 8 JDK, which I have now abandoned. I guess they all avoid the Windows code page issue, as you can start your Windows Java program in Unicode (that is possible and I am already doing it), and as long as everything stays in Java it remains Unicode.

But it would mean a complete rewrite of the program, and all these libraries have less functionality than ExifTool 😒, although that currently makes no difference for the EXIF/XMP/GPS/etc. data I am writing. It would also be a mixed bag, as renaming and geotagging are not supported and must therefore stay with ExifTool. Perhaps the same goes for the "ExifTool Commands" tab, as that might be where users want to write the "exotic" tags. And I would not have a clue how to embed the ISAD(G) and VRA Core functionality in those libraries. That is the real strength of ExifTool. Displaying tag names in your own language, as is currently possible, is not supported either (as far as I can see). And they are all a lot less simple to use, which is another strength of ExifTool. This is really demotivating. :disappointed:

I think I might send Phil a PM and ask him whether I am overlooking something (hopefully I am).

KovalevArtem commented 3 years ago

> File | Exif – Порядок байтов | Порядок от младшего к старшему (Intel, II)

File | Exif – Порядок байтов | Порядок от старшего к младшему (Motorola, MM)

> File | Процесс кодирования | Базовое DCT, кодирование Хаффмана

File | Процесс кодирования | Базовое DCT, кодирование Хаффмана

KovalevArtem commented 3 years ago

Is it possible to do separate processing for tags and values?

Screenshot 2021-09-12 203029

hvdwolf commented 3 years ago

Unfortunately that is not possible. You simply give exiftool a command and it delivers a big chunk of textual data (in this case in tab-separated format), which I split on the tab. So I did try to treat the first part of each line as UTF-8 data and the second part as non-UTF-8 (in the case of Windows). That simply doesn't work, as the data is delivered either completely in the code page, giving malformed tag strings, or completely as UTF-8, giving malformed value strings.

KovalevArtem commented 3 years ago

The other day I thought a lot and searched for other applications that use EXIF information and are written in Java. The RouteConverter application meets these criteria.

But it has the same problem:

Program interface in Russian ![image](https://user-images.githubusercontent.com/36500228/133127249-57f6b929-bb7f-416d-ba91-38619c6d303b.png)
Program interface in English ![image](https://user-images.githubusercontent.com/36500228/133127279-a6b09b62-11e8-4b7e-a72e-024ab3627316.png)

One could look at how JOSM handles EXIF information, because it is also a project written mostly in Java...

hvdwolf commented 3 years ago

I also checked FastFotoTagger, but that has the same issues (actually even more)

hvdwolf commented 3 years ago

I think I finally made it work. Please check the jExifToolGUI-1.9.0.0-beta-20210923-win-x86_64_with-jre.zip on megaNZ.

See also https://exiftool.org/forum/index.php?topic=12864.0

This will work for future images, but still not for older images written in the original Windows code page.

KovalevArtem commented 3 years ago

jExifToolGUI-1.9.0.0-beta-20210923

https://user-images.githubusercontent.com/36500228/134772701-ac6246c8-cfdf-4b68-a1ec-b4fbcff9363d.mp4

hvdwolf commented 3 years ago

I can't reproduce this. Please open JTG, go to Preferences > System (3rd tab) and set the log level to trace. Restart JTG and try again. Then close JTG and share the log with me (there is a "logs" folder next to the exe). Afterwards, better set the log level back to info or so.

KovalevArtem commented 3 years ago

In build jExifToolGUI-1.9.0.0-beta-20210923-win-x86_64_with-jre.zip the problem is really solved.

https://user-images.githubusercontent.com/36500228/136101811-73471a12-ae9d-4c6f-883c-63c77cea302d.mp4

(The problem I wrote about a little above resolved itself...)

hvdwolf commented 3 years ago

Great! I will now work towards a new release. I had more things on my to-do list, but currently (since June) my daily work takes so much (over)time that I don't have the energy or motivation to implement more new stuff.