BOM and encodings in translation files

moltenform / scite-files

Translations and extras for the SciTE code editor --- see the links below for more information!

BOM and encodings in translation files #39

Closed: scootergrisen closed this issue 6 years ago

scootergrisen commented 6 years ago

I noticed that some of the translation files, like locale.da.properties, have a BOM at the beginning of the file. I don't know if this is a problem. Maybe a comment could be added to the translation files saying whether to use a BOM or not.

It seems the translation files use many different encodings: ANSI, UTF-8 without BOM, UTF-8 with BOM, Windows-1251, and others. Would it be possible to have all translations use UTF-8 without BOM, as I see in other projects?
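For anyone who wants to check their own file before submitting, here is a rough sketch of how a BOM can be spotted from the first bytes of a file. The function name `detect_bom` is made up for illustration and is not part of SciTE or this repository.

```python
import codecs

# BOM byte sequences from the standard library. The UTF-32 entries come
# first because the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data):
    """Return the name of the BOM that `data` starts with, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

To use it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in. A result of `None` only means no BOM was found; it says nothing about which encoding the file actually uses.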

moltenform commented 6 years ago

My changes have added the BOM, but as it has no effect on SciTE's display of the translation, I haven't bothered to remove it. Do you see a potential issue with the BOM present?

To me, it's more important that the translation.encoding line is correct and that no file mixes encodings. Those are the problems I went through every translation to fix.
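A check along these lines could be scripted. The sketch below is only an illustration: it assumes the line has the form `translation.encoding=NAME` on its own line, and that Python happens to know the encoding by that name, which may not hold for every name used in the translation files. Both function names are hypothetical.

```python
import codecs
import re


def declared_encoding(data):
    """Return the name on the translation.encoding line, or None.

    Assumes a line of the form `translation.encoding=NAME`.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = re.search(rb"^translation\.encoding=([^\r\n]+)", data, re.MULTILINE)
    if match is None:
        return None
    return match.group(1).decode("ascii", "replace").strip()


def decodes_as_declared(data):
    """True if the whole file decodes cleanly with its declared encoding."""
    name = declared_encoding(data)
    if name is None:
        return False
    try:
        data.decode(name)
    except (LookupError, UnicodeDecodeError):
        # LookupError: Python does not know the encoding name.
        # UnicodeDecodeError: some bytes are invalid in that encoding.
        return False
    return True
```

This catches a file that claims UTF-8 but contains stray single-byte characters. It is much weaker for single-byte encodings such as Windows-1251, where almost any byte sequence decodes without error, so a mixed-encoding file can still slip through.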

Yes, I'm planning on converting everything to UTF-8. It's just a legacy of having a project that is almost 20 years old. Before Unicode was comfortably supported, Neil encouraged single-byte encodings. I'm also aware that some of the translations are messy/have out-of-order items, but I don't know if the value added would be worth the time to correct.

scootergrisen commented 6 years ago

I just think it's nice to know which encoding to use, and whether to use a BOM or not, so people don't have to guess. I once had a BOM in PHP files; PHP did not support it, and since the BOM was not visible in the text editor I used, that was confusing. I would also prefer that the translation file I submit is not modified, so that the backup I have stays identical.

moltenform commented 6 years ago

Addressed the suggestions in https://github.com/downpoured/scite-files/commit/670d76b105c648c47bfbdc340609d4bdf9dc9e21

converted all encodings to UTF-8 and no BOM (except China/GBK)
changed filenames in the first line of the file to match pre-installed filename
recommend UTF-8 for new translations
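For translators who want to do the same conversion locally, a minimal sketch follows. It is not the script used for the commit above; `to_utf8_no_bom` is a hypothetical helper, and the caller has to supply the correct source encoding.

```python
def to_utf8_no_bom(data, source_encoding):
    """Re-encode `data` from `source_encoding` to UTF-8 without a BOM."""
    text = data.decode(source_encoding)
    # A UTF-8 BOM decodes to U+FEFF at the start of the text; drop it.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("utf-8")
```

For example, `to_utf8_no_bom(open(path, "rb").read(), "cp1252")` returns the converted bytes for a Windows-1252 file. The sketch does not touch the translation.encoding line, so that line still has to be updated by hand to match the new encoding.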

moltenform commented 6 years ago

I wrote a patch to SciTE's locale.properties so that it matches the locale.properties here, including recommending UTF-8, and it was just approved. So this is now fixed upstream too, in [3b8358].