racket / string-constants

POT and PO usage for translations? #31

Open martinjirku opened 5 years ago

martinjirku commented 5 years ago

I am new to Racket... I would like to translate it into my own language, Slovak. I started translating by copying the English english-string-constants.rkt to slovak-string-constants.rkt. I updated the files, but during the translation process I left out a ", and I had to track down that bug to make my translation work again.

So I've found myself wondering how changes are handled. New strings, removed strings? How is the translation decoupled from the code?

In my professional experience on international projects we had a very nice process for i18n using gettext POT and PO files (more here):

  1. The developer specified strings in a file like english-string-constants.rkt, adding and removing strings as the application needed.
  2. A tool extracted these strings to a POT file.
  3. The POT file (and the language's PO file, if it already existed) was provided to translators.
  4. Translators used a tool of their choice (there are plenty of them, e.g. poedit) to translate the strings. By opening the PO file and specifying the POT (template) file, they were able to see old, unused translations as well as newly added strings.
  5. After translation was done, a script transformed the updated PO files back into files like slovak-string-constants.rkt.
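
Step 2 of the workflow above could be sketched roughly as follows. This is a minimal illustration, not an existing tool: it assumes each entry has the simple shape (key "text") and skips anything else. It stores the key in the PO msgctxt field, which is one possible mapping rather than an agreed convention. The sample entries and the function names are hypothetical.

```python
import re

# Assumed entry shape: (key "English text"). Entries with any other shape
# (for example, several strings after the key) are skipped by this sketch.
ENTRY = re.compile(r'\(\s*([^\s()"]+)\s+"((?:[^"\\]|\\.)*)"\s*\)')


def extract_entries(source):
    """Return (key, text) pairs in the order they appear in the source."""
    return [(m.group(1), m.group(2)) for m in ENTRY.finditer(source)]


def to_pot(entries):
    """Render entries as POT blocks, keeping the key as msgctxt."""
    blocks = []
    for key, text in entries:
        blocks.append(f'msgctxt "{key}"\nmsgid "{text}"\nmsgstr ""\n')
    return "\n".join(blocks)


# Hypothetical entries, for illustration only.
sample = '''
  (cancel "Cancel")
  (are-you-sure "Are you sure you want to delete ~a?")
'''

entries = extract_entries(sample)
pot_text = to_pot(entries)
print(pot_text)
```

Keeping the key in msgctxt means two entries with the same English text still stay distinct, which matters when the output has to be converted back into a .rkt file.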

Because I like Racket and the stuff around it, and I like to learn by doing, I would like to contribute to the DrRacket project (string-constants) not only by translating, but also by creating a tool that would simplify translation and proofreading for non-developers (including tracking new strings and removing unused ones).

What do you think? Do you think it would improve the project somehow?

rfindler commented 5 years ago

As a general rule, we never change any string constants to avoid trouble with translators missing that they changed. We rename and delete old ones, and that gets noticed by setting the PLTSTRINGCONSTANTS environment variable and starting up DrRacket (although I see there is a lot to tend to in that output currently, so maybe it isn't being used).
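
Because strings are renamed or deleted rather than edited, finding the work left for a translator reduces to comparing sets of keys. The sketch below illustrates that idea only; it is not how the PLTSTRINGCONSTANTS check is implemented, and the function name and sample data are hypothetical.

```python
def compare_keys(english, translation):
    """Return (missing, stale) key lists for a translation.

    missing: keys present in English but not yet translated.
    stale:   keys still in the translation but gone from English.
    """
    missing = [key for key in english if key not in translation]
    stale = [key for key in translation if key not in english]
    return missing, stale


# Hypothetical data: "open-file" was renamed to "open-document" in English.
english = {"cancel": "Cancel", "open-document": "Open Document"}
slovak = {"cancel": "Zrušiť", "open-file": "Otvoriť súbor"}

missing, stale = compare_keys(english, slovak)
print("missing:", missing)
print("stale:", stale)
```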

One thing that'll be different in the Racket world from others is that the files like english-string-constants.rkt are actually programs, written in a custom language, and you can open them in DrRacket. In addition to helping with things like mismatched quotes (and spelling errors if you turn that on in the bottom of the "Edit" menu), it is not difficult to add more support to the language to help you do your work. And if you want to take this as a Racket-learning experience, that's definitely the way to approach the problem. This CACM article (https://cacm.acm.org/magazines/2018/3/225475-a-programmable-programming-language/fulltext) is a place to start reading about the philosophy behind this paragraph.

That said, reusing the support that others have built for things like PO/POT files does sound wise!

alshopov commented 5 years ago

I am Alex Shopov - I did the Bulgarian translation of DrRacket, and I am also a long-time free/open source translator. Currently Racket's strings are kept in Racket's source format - basically S-expressions + comments.

For ease of maintenance I basically keep the Bulgarian translation constantly in sync with the English version. I also keep all the comments, the order of strings and the formatting. This helps me with updates.

As a long-time translator I would strongly advise building tooling that converts to and from po files. Just to let you know - gettext has support for Scheme syntax [1].

If we severely limit the scope that becomes a nice and standard intro project.

  1. Convert from S-syntax to po-syntax so you can start translating
  2. Do that while also getting the comments as comments to strings in po
  3. Convert from po back to S-expressions in the order of the English version for ease of maintenance
  4. After everyone is onboard with the shiny po - add support for:
    • fuzzy strings
    • plurals - Slovak being West Slavic, you will need at least two plural forms, but based on [2] Slovak is harder than Bulgarian (three forms, special cases for 1 and for 2, 3, 4)
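
Step 3 of the list above (po back to S-expressions, in the order of the English file) might look roughly like this. It is a sketch under stated assumptions: the key is stored in msgctxt, every field fits on one line, and untranslated entries are left out of the output. The function names and sample data are hypothetical.

```python
import re

# Assumes single-line msgctxt/msgid/msgstr fields, in that order.
PO_ENTRY = re.compile(
    r'msgctxt "((?:[^"\\]|\\.)*)"\s*\n'
    r'msgid "((?:[^"\\]|\\.)*)"\s*\n'
    r'msgstr "((?:[^"\\]|\\.)*)"'
)


def parse_po(po_text):
    """Map msgctxt (used here as the string-constant key) to msgstr."""
    return {m.group(1): m.group(3) for m in PO_ENTRY.finditer(po_text)}


def to_sexprs(english_keys, translations):
    """Emit (key "text") lines in the order of the English keys."""
    lines = []
    for key in english_keys:
        text = translations.get(key, "")
        if text:  # leave untranslated entries out
            lines.append(f'  ({key} "{text}")')
    return "\n".join(lines)


# Hypothetical PO content, deliberately in a different order than English.
po_text = (
    'msgctxt "open"\nmsgid "Open"\nmsgstr "Otvoriť"\n\n'
    'msgctxt "cancel"\nmsgid "Cancel"\nmsgstr "Zrušiť"\n\n'
    'msgctxt "save"\nmsgid "Save"\nmsgstr ""\n'
)
english_keys = ["cancel", "open", "save"]

translations = parse_po(po_text)
output = to_sexprs(english_keys, translations)
print(output)
```

Driving the output from the English key list, rather than from the PO file, is what keeps the translated file line-for-line comparable with the English one.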

I asked about creating such a layer for Racket[3] at the onset of my translation but at that time there was no explicit interest and I decided that it would be better to spend the time doing the actual translation.

[1] https://www.gnu.org/software/gettext/manual/gettext.html#Choice-of-input-file-language
[2] https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html#DOCF5
[3] https://groups.google.com/forum/#!topic/racket-dev/2DVnaRDBRz0

Kind regards: al_shopov

rfindler commented 5 years ago

Sorry if I misunderstood your earlier comments! I don't know anything about po but plural (and word order when we want to drop a number or something computed into a string) is something that we struggle with for sure.
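
To make the plural and word-order problem concrete, here is a small sketch of how gettext-style plural selection works for Slovak. The rule is the one the gettext manual gives for Czech and Slovak (nplurals=3; plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2). The function names and the message are hypothetical, and nothing like this exists in string-constants today.

```python
def slovak_plural_index(n):
    """Index of the plural form to use for n, per the gettext rule for Slovak."""
    if n == 1:
        return 0
    if 2 <= n <= 4:
        return 1
    return 2


# One template per plural form. The named placeholder lets each translation
# put the number wherever its grammar needs it.
SLOVAK_FILES = ["{n} súbor", "{n} súbory", "{n} súborov"]


def format_count(n, forms=SLOVAK_FILES):
    """Pick the plural form for n and substitute the number into it."""
    return forms[slovak_plural_index(n)].format(n=n)


for n in (1, 3, 5):
    print(format_count(n))
```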

martinjirku commented 5 years ago

@rfindler I will look at it closer. Thank you very much for directions...

@alshopov Yeah, pluralization is not easy to get right.

Currently I am using ICU message syntax, which is an industry standard with support for all CLDR languages in JavaScript/C++. It's nicely done, and we are able to do pluralization for all the languages our app supports. It would be nice to have in Racket...