tomboy-notes / tomboy-ng

Next generation of Tomboy
MIT License

Controlling version of translated help notes #148

Open aguador opened 4 years ago

aguador commented 4 years ago

Congratulations on getting 0.27 out. It looks really good and I am anxious to explore more.

Unfortunately, a minor thing that I suppose affects only translated help notes caught my attention immediately. In any event, opening an issue seems appropriate, as others may have some ideas.

After updating, I checked config which indicated that Spanish help notes were installed:

[Screenshot: HelpNote-ng-027-2020-03-15]

However, when I checked the notes, they were the ones done prior to the 0.27 release. Double clicking updated them properly. So three somewhat related issues (other than your Spanish translator shortening that string about translation! [easily done]):

  1. The short-term fix is to change the instruction to "Double click the language to install or update". The longer-term solution involves a mechanism to automate the updating of translated notes, or making installation of translated help notes a completely separate step, with the English notes installed by default each time.
  2. I believe that selection of the help language belongs in the "Basic" tab which, with the elimination of one option on search, now has more room:

[Screenshot: Config_ng_027-2020-03-15]

  3. We still have the problem of controlling the version of the translated notes, which I note here to invite commentary.

For interested parties and prospective translators, the third point is something David and I have been discussing. Interfaces use something called "gettext" and .po translation files. When something changes, the former translation is automatically ignored. This means that when a program is updated but its translation is not, new phrases or those that have changed appear in English. This is clearly preferable to having a translation that does not correspond to the original.

Currently, the help notes are XML files that can be updated easily with CAT (computer-assisted translation) programs like OmegaT, but not dedicated gettext editors like poedit. However, the weak link is the human one: translators must know about the changes and update the file. If that does not happen, the help notes will not reflect the latest changes.

David and I have talked a little more, but I will leave it there for third parties to view -- and until I have proposals beyond what he and I have already discussed a bit.

davidbannon commented 4 years ago

Hmm, that's actually a real problem. It's quite hard for me to tell when a new version is run for the first time, short of writing the current version to the config file. It's really something that should be in the post-install scripts, but they are not portable between systems. Ideally, we'd tell the user when new versions of the help notes are available? But that would involve polling the GitHub site regularly, and I don't believe we should do that without the user's permission.... I'll think on it.... Points you make are valid....
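Writing the version to the config file, as suggested above, could look roughly like this Python sketch (Python is used only for illustration; tomboy-ng itself is written in Free Pascal). The section and key names `BasicSettings` and `LastRunVersion` are hypothetical, and the real config layout may differ:

```python
import configparser
from pathlib import Path

SECTION = "BasicSettings"   # hypothetical section name
KEY = "LastRunVersion"      # hypothetical key name

def is_first_run_of_version(config_path: Path, current: str) -> bool:
    """Return True the first time `current` runs, recording it in the config file."""
    config = configparser.ConfigParser()
    config.read(config_path)  # a missing file is silently treated as an empty config
    if config.get(SECTION, KEY, fallback=None) == current:
        return False
    if not config.has_section(SECTION):
        config.add_section(SECTION)
    config.set(SECTION, KEY, current)
    with open(config_path, "w") as fh:
        config.write(fh)
    return True
```

When it returns True, -ng could offer to refresh the translated help notes without polling GitHub at all.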

aguador commented 4 years ago

OK, I don't think that polling github is needed. I think there are two fundamental issues here: controlling the version at the time of installation and "controlling" updates to the translations.

Installation: If the help notes accompany the program, then the help note files are installed with it. There are three options here: a) always install the latest English-language notes and let the user install the language they want; b) always install the English notes, but notify the user that help is available in other languages so they can update manually; c) detect the OS language and choose help notes accordingly.

If you really want to complicate your life, you could not only automate language selection, but also examine the note update date to decide whether the previously installed notes need to be overwritten.
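Comparing update dates need not be too complicated. A minimal Python sketch, assuming the standard Tomboy note format (default namespace `http://beatniksoftware.com/tomboy` and a `<last-change-date>` element); the function names are hypothetical:

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

NS = "{http://beatniksoftware.com/tomboy}"  # default namespace of Tomboy .note files

def last_change(note_xml: str) -> datetime:
    """Read <last-change-date> from a note, e.g. 2020-03-15T10:00:00.0000000+11:00."""
    raw = ET.fromstring(note_xml).findtext(f"{NS}last-change-date")
    # Drop the 7-digit fraction so datetime.fromisoformat() accepts the value on older Pythons.
    return datetime.fromisoformat(re.sub(r"\.\d+", "", raw.strip()))

def needs_overwrite(installed_xml: str, bundled_xml: str) -> bool:
    """True if the bundled copy of a help note is newer than the installed one."""
    return last_change(bundled_xml) > last_change(installed_xml)
```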

Translation: Here my concern has been about the possibility that translations do not keep up and there is "bad" information in the translated help notes. This is about "version control" and translation tools.

Controlling updates to translations: You already have the version control system in Github. The trick is simply to link it to the translation files on a string-by-string (line number) basis such that any change you make in the English help note replaces the corresponding previously translated string in each of the translations, or adds or deletes a string or strings as the case may be. If this can be done easily, then two things are accomplished: a) translators can see changes immediately and update the translation as they might with a .po translation tool and b) if translation falls behind, "bad" information does not appear in a help file, simply untranslated "good" information.

Possible? Thoughts?

Translation tools: It has been my assumption that translators will be most familiar with .po editing tools (poedit, to mention the best-known one). I have identified a utility to convert between .po and .xml files that may be of use here. I will test and report. However, once a translation has been completed, if there are only minor updates, a simple text editor is also an option.

Perhaps Vistaus could comment?

aguador commented 4 years ago

As part of our discussion outside GitHub (my fault: it should have been here), I mentioned the possibility of placing the translated notes in root (in Linux, obviously). There may be another argument for it.

To test something today, I decided to switch back to English notes -- which had some unexpected consequences.

At first it worked as expected. Then I tried to reinstall the Spanish help notes, which resulted in this:

ERROR - File download failed https://github.com/tomboy-notes/tomboy-ng/raw/master/doc/es_notes.zip

After that, "Help" appeared in the menu, but not the links to the help notes in either English or Spanish. Upon investigation, the "alt-help" folder in /home/xxx/.config/tomboy-ng was empty. Deleting that folder restored access to the English help notes, but the Spanish notes still could not be downloaded (perhaps a problem with GitHub?).

Now, strangely enough, even after attempting to download the Spanish help notes again, the "alt-help" folder was re-created, but remains blank (due to the download error), yet, unlike before, I still have access to the English help notes.

All very strange!

davidbannon commented 4 years ago

Hmm, -ng decides to use whatever notes are in the alt-help notes directory, if it exists. So, if it's there but empty, that's pretty much what I would expect. I don't know why the download failed, however. Hope it's not another SSL version issue!
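A more forgiving rule would use alt-help only when it actually contains notes. A Python sketch; the function name and both paths are hypothetical, not -ng's actual code:

```python
from pathlib import Path

def help_notes_dir(alt_help: Path, bundled: Path) -> Path:
    """Prefer alt-help only if it holds at least one .note file; otherwise use the bundled notes."""
    if alt_help.is_dir() and any(alt_help.glob("*.note")):
        return alt_help
    return bundled  # missing or empty alt-help (e.g. after a failed download)
```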

Anyway, when I looked further at bundling all the help notes, it really started to look quite attractive. It simplifies things greatly, I no longer need to ship OpenSSL libraries with the Windows version (saving far more space than it's costing), and it will make revision control a lot easier. The notes are no longer stored on GitHub zipped; have a look in the source tree, you will find them in doc/HELP/[ES/EN]/. And while English will still be the default 'out of the box', the others will be readily available, a more democratic model IMHO. Less ugly language-dependent hard coding in both -ng and the build scripts. It's substantially coded but waiting for a lot of testing.....

davidbannon commented 4 years ago

Another possible benefit of this change is that the help note files no longer need to be named in English (the files themselves, I don't mean the note title); we could give them, for example, appropriate Spanish names. The user will not see it, but would it make version control easier? Especially if you are opening both the English and Spanish ones.

And it's just a touch less English-centric....

Davo

aguador commented 4 years ago
  1. Don't know what the download problem was, as I had updated the Spanish translation a few days before with no problem. I just grabbed a copy from another machine rather than play more. (Actually the problem was that you had already moved the notes!)

  2. Not storing the files zipped is a minor plus from a translation point of view as it eliminates decompressing and compressing the files. Now simply renaming the decompressed files from .note to .xml allows them to be used with most translation memory tools.

  3. How will using non-English help with version control? They still need to be numbered in some way, right?

aguador commented 4 years ago

Will not open another issue now that you are altering the help notes, but Spanish help notes are not working properly in 0.29.

From systray: Settings > Help > X help topic > open topic (or not).
Return to systray: Settings > Help > [nothing: no list of help topics appears]

I checked after restoring English notes and everything functions normally.

OK, now to grab a copy of the Spanish notes from my other machine now that you have moved the Spanish notes . . .

aguador commented 4 years ago

Darn it, want to sign out, not close!

davidbannon commented 4 years ago

That's strange indeed. Nothing significant has really changed with help notes for some time. Please indulge me: switch to Spanish and restart -ng. Does that get you some Spanish help notes?

Davo

aguador commented 4 years ago

OK, maybe that was a fluke, as I cannot reproduce it. It may be a hardware issue with this HD: it has been very slow on shutdown and, while I was testing for this, E crashed (with an alarming red warning!) and restarted....

aguador commented 3 years ago

This seems to be solved now that help files are in /usr/share. However, I did a clean install on a .deb machine here* and, while -ng detected Spanish for the interface language, it defaulted to the English language help file. I assume there is an easy way to have -ng default to the system language if help files are available in that language, right?

*(Aside: dpkg -P did not purge the config files. Is that a dpkg or -ng issue? As you know, dpkg is a 3rd or 4th language for me!)

davidbannon commented 3 years ago

I am not certain there is a dependable one-to-one mapping between the language of the PO files and the language chosen for the help notes. Maybe we could have a little lookup table that declares these notes are usable with PO languages XX, YY, etc.? And we have the issue of incomplete help note translations: the person doing French, for example, managed to do only the main intro note before having to bail out and deal with some extensive family issues relating to C19. If you use a language with an incomplete set, you may decide that the English ones are better than nothing. Part of the problem is that my very wordy (i.e., friendly and chatty) help notes take a lot of translating....
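The lookup table idea, combined with a completeness threshold, might look like this Python sketch. The table contents, the note counts and the `min_fraction` threshold are all illustrative assumptions:

```python
from __future__ import annotations

# Hypothetical table: help-note set -> UI (PO) language codes it can serve.
HELP_SETS = {"ES": {"es"}, "FR": {"fr"}}

def choose_help_set(ui_locale: str, translated: dict[str, int], total: int,
                    min_fraction: float = 1.0) -> str:
    """Pick a help-note set for a locale such as 'es_AR.UTF-8'; fall back to English ('EN')."""
    lang = ui_locale.split(".")[0].split("_")[0].lower()  # 'es_AR.UTF-8' -> 'es'
    for help_set, ui_langs in HELP_SETS.items():
        complete_enough = translated.get(help_set, 0) / total >= min_fraction
        if lang in ui_langs and complete_enough:
            return help_set
    return "EN"  # no matching set, or the matching set is too incomplete
```

With `min_fraction=1.0` an incomplete set such as the French one falls back to English; a lower threshold lets the user see whatever has been translated.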

aguador commented 3 years ago

Nah, the notes are not a problem to translate unless the translator is a bit weak in English.

There are actually two separate issues, one for translators, and one for users.

For translators, AFAIK, if they use a translation memory program, it is likely poedit or something similar, so being able to transform the .note help files to .po would help them. This is not a big deal, but if it means more use of translation memory software, it might improve the consistency of each translation.

For users there are two issues. First, if you were able to maintain the help notes as .po files, changes in the English that were not translated before a release would affect only the changed strings. So I would read a Spanish help note, but encounter an English sentence because its English original had changed and the translation had not caught up. That would mean that I had most of the information, perhaps what I needed at that moment, and would just have to struggle with, or lose out on, one sentence. In the case of the French translation, I would see the one translated help note with the others in English, even though they were part of the French .po.

Second, if you were not able to have (or could not easily have) the help notes in .po files, the question becomes how to avoid out-of-date notes, e.g., in areas like sync and perhaps recovery. For example, had the Spanish help notes not been up to date, the user might have been directed to a tab that you had eliminated in sync or recovery (can't recall which at the moment).

One way to deal with that might be to check the date stamp on the help note and substitute the English note for the translated note if the former is more recent. This approach has some obvious problems: a) Do you do the substitution when you compile or at the time of installation, and what are the costs involved? b) Even an insignificant update (like those missing apostrophes that Debian didn't appreciate) would trigger a substitution.

A second, but similar, approach would be to revert to separately downloadable help notes where the version number must match for the notes to be installed. This is likely worse because it would apply to the notes as a whole when it was only one that was affected. I think applications like LibreOffice only use that approach because of the size of the help files involved.

A third way to deal with this in the absence of .po help files is to generate a warning for an out-of-date help note (again, based on file date). So if I open a Spanish-language help note and its translation has fallen behind, it appears with a warning that there has been some change in the English note that is not reflected in the translation. Under the current system, a user able to understand English could then load the English help and use that.
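The substitution option and the warning option differ only in what happens once a translated note turns out to be older than its English original. A Python sketch of that decision; the function name and the policy strings are hypothetical:

```python
from datetime import datetime

def translated_note_status(english_changed: datetime, translated_changed: datetime,
                           policy: str = "warn") -> str:
    """Return 'current', 'warn' (show translation with an out-of-date notice) or 'english'."""
    if policy not in ("warn", "english"):
        raise ValueError(f"unknown policy: {policy}")
    if translated_changed >= english_changed:
        return "current"
    return policy  # translation is older than the English note
```

As noted above, any English edit, however trivial, moves the date forward and so triggers the warning or the substitution.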

I will confess that I may be overthinking this as -ng is stable enough that there are no major changes to help files these days -- and will likely be fewer as time goes on.

davidbannon commented 3 years ago

I am not sure how we would manage a Tomboy style note in a .po format. I think of the .po model as a series of unrelated short phrases. Would we make each po entry a sentence or a paragraph ?

And we would lose the markup too; you would not want to be editing in the XML! So, perhaps convert to MD? I can read a markdown file on the fly, or we could convert MD to XML at package time?

'compiling' md notes in .po format to tomboy xml might solve some of the staging files you mention above.

Hmm, are we talking about a workflow where the help notes are written in Tomboy format using tomboy-ng, converted to .po, translated, and converted back to Tomboy format? I would keep only the .po files in git. If I, or someone else, wanted to make a change, they could edit a word or two in the .po file; a major rewrite would require converting to Tomboy format, editing, and exporting as .po again.

Hmm 2, an export option in tomboy-ng that writes a .po file with markup in md where necessary ?

Hmm 3, Sigh ....

Davo

davidbannon commented 3 years ago

I have written up a very blue sky plan, please pick holes in it. Down bottom of - https://github.com/tomboy-notes/tomboy-ng/wiki/Things-for-Next-Release

Davo

aguador commented 3 years ago

Regarding your first reply, MD seems irrelevant to me. One can work in XML as easily as in MD from my perspective. If using a text editor or Geany, just leave the tags in place. If using a CAT (translation memory) tool, the tags are simply hidden from the translator and dealt with by the software.

As far as what the strings are, yes, the normal way with CAT tools is to break at the sentence level. And, yes, if .pot and .po files can be generated, translators would work with them only.

I need to look at what msgmerge is all about, but I think your workflow is about right (as I concluded after writing the following!). While I admit that I have not dealt with all the magic of gettext, basically what it does is look for text strings attached to lines of executable code.* Once the .pot and .po files are generated, whenever you change a string attached to the code, gettext (msgmerge?) marks the corresponding translations in the .po files as "provisional" or "needs work" and the string in the source code (English) will be substituted in the xx.po until the translation is updated/marked as complete.

So turning to a full-text version of workflow (basically thought through separately coming out basically where you did), what I believe we are talking about is a separate script to generate the .note files. The script would grab a series of lines of "reference code" (likely just "variable names" such as syncH1 ... syncHX for the sync help) with the associated text string (sentence or heading, e.g., "Sync with Tomdroid"), assemble all the strings with spaces and paragraphs and formatting allowed in gettext (bold, line breaks, etc.), and generate the resulting .note XML file (which presumably would simply use the code from -ng that does the XML formatting).

In the most basic version of this, you would then just add or delete "reference code" and edit the text strings as you do in the main code, gettext would deal with the .po files, then you would run the script to generate the .note help files (generating the English from the strings in the "reference code" or .pot file and the translations from their respective .po files). You could, as you have in your workflow, work with .note files then convert them to .pot/.po, but that adds an extra step. Whether you work directly with "reference code" or convert .note files, gettext would deal with the .po files, and the translators would update them as appropriate as indicated in your blue sky workflow.
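A rough Python sketch of such a generator script, assuming the English sentences double as gettext msgids and a translation catalog has already been loaded into a dictionary. It writes only a minimal note (no `<last-change-date>` or other metadata), and the blank line between paragraphs is an assumption:

```python
from __future__ import annotations
from xml.sax.saxutils import escape

def build_note(title: str, paragraphs: list[list[str]], catalog: dict[str, str]) -> str:
    """Assemble a minimal Tomboy .note from English sentences plus a msgid -> msgstr catalog."""
    def tr(english: str) -> str:
        return escape(catalog.get(english) or english)  # untranslated -> English, as gettext does
    body = "\n\n".join(" ".join(tr(s) for s in para) for para in paragraphs)
    heading = tr(title)  # Tomboy repeats the title as the first line of the content
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<note version="0.3" xmlns="http://beatniksoftware.com/tomboy">\n'
        f"  <title>{heading}</title>\n"
        f'  <text xml:space="preserve"><note-content version="0.1">{heading}\n\n{body}</note-content></text>\n'
        "</note>\n"
    )
```

Running it once with an empty catalog gives the English note, and once per .po file gives each translation.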

Whew, I guess I understand the difference between translating and coding now: you listed steps, I had to write text to think things through. In another era of my life, or perhaps a more rested moment, I might have done a proper flow chart, which would have saved a lot of the verbiage generated by thinking aloud. In any event we seem to have come out at basically the same point.


*You know this from the programming side. From the translation side, poedit lets me see what it calls the "reference code". In fact, if I have the source code on my machine, poedit can take me to the section of code where the string appears so I can perhaps see what is going on (something that has helped resolve doubts about the translation more than once).

davidbannon commented 3 years ago

Thanks aguador, sorry been a bit slow getting back, xmas and Debian have kept me busy.

Leave the XML there? Yes, you are probably right; it's probably easier to ignore the XML than markdown, especially with the significance MD puts on blank lines. I guess XML is clearer about what each tag means. I will need to 'normalise' the XML first, making sure tags associated with a paragraph (or sentence) are really there with that sentence. Bit messy, but I have some code in the MD exporter that already does it at a paragraph level. I have checked in some code that does MD and paragraphs; I think I will revert that and do as you suggest, XML and sentences.

davidbannon commented 3 years ago

Well, I don't know.


I have a basic system to export a note as a .pot, retaining its XML, but I don't see it working. The problem is sentences: I break a paragraph up into sentences, but then I don't have any way to indicate the paragraph structure in a .po file. Blank lines are ignored and not retained, so I inserted a msgid "[break /]" after each paragraph, but poedit politely informed me that duplicates were not allowed and removed all except one of them. The issue is that gettext and poedit are not really intended for one body of structured text. I have not tested with msgmerge but, reading the man page, I assume it will not preserve blank lines either.

I guess I could replace my msgid "[break /]" with something sequential, msgid "[break 23/]", each one being different so they would be left alone, but it would be ugly! Hard for a translator to add a paragraph, but I guess that's rare. And tools like poedit will complain about all those different untranslated lines....
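For illustration, a Python sketch of a .pot export using the sequential break markers described above. It is not the exporter in -ng: the sentence splitter is deliberately naive (it breaks after `.`, `!` or `?` followed by whitespace) and the .pot header is omitted:

```python
from __future__ import annotations
import re

def po_quote(text: str) -> str:
    """Quote a string for a .po file: escape backslashes, double quotes and newlines."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'

def note_to_pot(paragraphs: list[str]) -> str:
    msgids = []
    for n, para in enumerate(paragraphs, start=1):
        msgids.extend(s for s in re.split(r"(?<=[.!?])\s+", para.strip()) if s)
        msgids.append(f"[break {n}/]")  # numbered, so every marker is a distinct msgid
    entries, seen = [], set()
    for msgid in msgids:
        if msgid in seen:  # gettext tools reject duplicate msgids; a repeated sentence is translated once
            continue
        seen.add(msgid)
        entries.append(f'msgid {po_quote(msgid)}\nmsgstr ""\n')
    return "\n".join(entries)
```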

Would work if we used paragraphs as each ID but not sentences. Thoughts ?

I think you mentioned CAT tools ?? Do these still use the .po file format ? If so, how do they indicate which sentences belong in a given paragraph ?

Are there other tools that are more intended for translating bulk text that we can hook into ?

Davo

aguador commented 3 years ago

Many CAT tools can deal with .po files, such as OmegaT, which I use regularly.

With details that I do not know, gettext keeps track of changes so that the updated .po files it prepares retain all translations active for strings that have not changed. For changed strings, the translated string is retained but marked "needs work", and new strings have empty translations. Thus it is simply a matter of downloading the current .po file and opening it in poedit (or similar), updating with translations of the changed/missing strings, saving and sending it back.
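The merge behaviour described here can be modelled in a few lines of Python. This is only an approximation: `difflib` stands in for msgmerge's own fuzzy matching, and the real tool keeps removed entries as obsolete `#~` comments rather than dropping them. The function name `merge` is hypothetical:

```python
import difflib

def merge(old: dict, new_msgids: list) -> dict:
    """Approximate msgmerge: map each new msgid to (msgstr, is_fuzzy)."""
    merged = {}
    for msgid in new_msgids:
        if msgid in old:
            merged[msgid] = (old[msgid], False)  # unchanged string: translation kept
            continue
        close = difflib.get_close_matches(msgid, list(old), n=1, cutoff=0.8)
        if close:
            merged[msgid] = (old[close[0]], True)  # changed string: old translation kept, marked fuzzy ("needs work")
        else:
            merged[msgid] = ("", False)  # new string: empty translation
    return merged  # msgids no longer in the source are simply dropped here
```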

I have used OmegaT for the help notes in the following way (there are slightly different workflows possible):

First-time translation:

  1. Grab a copy of the notes.
  2. Change extensions from .note to .xml.
  3. Add to a new OmegaT project for -ng, translate, and generate translated files. (Note that OmegaT's editing window simply hides the xml tags.)
  4. Change extensions back to .note.
  5. Submit.
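Steps 2 and 4 are simple renames; a small Python helper (the name `swap_extension` is hypothetical) could handle both directions:

```python
from pathlib import Path

def swap_extension(folder: Path, old: str, new: str) -> list:
    """Rename every *old file in folder to *new, e.g. '.note' -> '.xml' before loading into a CAT tool."""
    renamed = []
    for path in sorted(folder.glob(f"*{old}")):
        target = path.with_suffix(new)  # sync.note -> sync.xml
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling it with `".note", ".xml"` before translating and `".xml", ".note"` afterwards covers both renames.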

Updating translations:

  1. Identify note(s) changed since last translation (by date).
  2. Grab that/those file(s).
  3. Change file name(s) to reflect version to be released, and change extension(s) to .xml.
  4. Add to existing OmegaT -ng project. (Allows unchanged strings to be matched to existing translation), translate new/changed strings, generate translated file(s).
  5. Change both file names and extensions back to .note.
  6. Submit.

The process is straightforward, but from your side and the users' side, the control at the sentence level is lost. If there is a change in a help note without an updated translation, either the "old" translation will be included in the release or, if you add a control for file dates, out-of-date translated note(s) will be excluded from that release or a warning will appear that the note(s) is(are) not up-to-date.

What gettext does for strings to be displayed in programs is it simply uses the base (English) language for "needs work" and untranslated strings. Here is a note recovery example as it might appear with that process. In the third sentence, the misspelled "Fishnew" was corrected after the Spanish translation was last updated thus prompting the use of the whole English sentence:

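Maldiga el desarrollador y el día que nació. Haga clic en "Recuperar". Pray to at least 5 Greek or Roman Gods, plus Vishnu. Contenga la respiración por 7-8 minutos.

(In English: "Curse the developer and the day he was born. Click "Recover". Pray to at least 5 Greek or Roman Gods, plus Vishnu. Hold your breath for 7-8 minutes.")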

(This shows not only how interface translations are handled, but why one would want sentence rather than paragraph-level translation chunks.)
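Python's standard `gettext` module shows the same sentence-by-sentence fallback. The sketch below packs a tiny catalog into the binary .mo format in memory (normally `msgfmt` does this, and it leaves out fuzzy entries by default). The English source sentences are back-translations of the Spanish example, not the actual help-note text, and the Vishnu sentence stands in for the one whose translation was invalidated:

```python
import gettext
import io
import struct

def build_mo(catalog: dict) -> bytes:
    """Pack msgid -> msgstr pairs into a little-endian GNU .mo file (no hash table)."""
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n", **catalog}  # header entry declares UTF-8
    ids = sorted(entries)
    n = len(ids)
    orig_table = 28                     # header: seven 32-bit words
    trans_table = orig_table + 8 * n    # each table entry: (length, offset)
    data_start = trans_table + 8 * n
    data, index = b"", []
    for text in ids + [entries[k] for k in ids]:  # all msgids first, then all msgstrs
        raw = text.encode("utf-8")
        index.append(struct.pack("<2I", len(raw), data_start + len(data)))
        data += raw + b"\0"
    header = struct.pack("<7I", 0x950412DE, 0, n, orig_table, trans_table, 0, data_start)
    return header + b"".join(index) + data

SENTENCES = [
    "Curse the developer and the day he was born.",
    'Click "Recover".',
    "Pray to at least 5 Greek or Roman Gods, plus Vishnu.",  # changed after the last translation update
    "Hold your breath for 7-8 minutes.",
]
SPANISH = {  # no entry for the changed sentence, as if msgfmt had dropped its fuzzy translation
    SENTENCES[0]: "Maldiga el desarrollador y el día que nació.",
    SENTENCES[1]: 'Haga clic en "Recuperar".',
    SENTENCES[3]: "Contenga la respiración por 7-8 minutos.",
}

translations = gettext.GNUTranslations(io.BytesIO(build_mo(SPANISH)))
print(" ".join(translations.gettext(s) for s in SENTENCES))
# prints: Maldiga el desarrollador y el día que nació. Haga clic en "Recuperar". Pray to at least 5 Greek or Roman Gods, plus Vishnu. Contenga la respiración por 7-8 minutos.
```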

As far as line breaks go, the last E translation I did used < ps/ > (spaces added around ps/ to get it to display) in the strings for line breaks, and it was possible to use two for a blank line.

davidbannon commented 3 years ago

Thanks aguador, yes, I can do most of what you outline there. But what does not work is identifying the end of a paragraph. You mention putting a < ps/ > in there to indicate an end of paragraph; was that done as a separate line (with its own msgid and msgstr), or is it appended to the end of the last sentence of the paragraph?

I had not heard of OmegaT until now; it looks interesting. If you were translating, for example, a tomboy-ng note, would you use poedit or OmegaT? I have been targeting poedit, but it's not really suitable IMHO; your opinion would be significantly more useful (given my struggling mastery of English, and my only other language being Tasmanian ;-) ).

Davo


aguador commented 3 years ago

You mention putting a < ps/ > in there to indicate an end of paragraph; was that done as a separate line (with its own msgid and msgstr), or is it appended to the end of the last sentence of the paragraph?

These are inserted inline. Here is an example for an E message with the code repeated to create a blank line:

Enlightenment was unable to import the theme.< ps/ >< ps/ >Are you sure this is really a valid theme?
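Applied to help notes, the inline approach could look like this Python sketch: on export, `<ps/>` (written without the spaces used above for display) is appended to the last sentence of each paragraph except the final one; on import, each `<ps/>` becomes a newline, so two in a row give a blank line. The function names are hypothetical, and every paragraph is assumed to contain at least one sentence:

```python
from __future__ import annotations
import re

PS = "<ps/>"

def to_msgids(paragraphs: list[list[str]]) -> list[str]:
    """Flatten paragraphs to sentence msgids, marking each paragraph end inline with <ps/>."""
    msgids = []
    for i, sentences in enumerate(paragraphs):
        items = list(sentences)
        if i < len(paragraphs) - 1:
            items[-1] += PS  # no marker after the final paragraph
        msgids.extend(items)
    return msgids

def from_msgstrs(msgstrs: list[str]) -> str:
    """Rejoin translated sentences with spaces and turn each <ps/> into a newline."""
    return re.sub(r"\s*<ps/>\s*", "\n", " ".join(msgstrs))
```

Because the marker travels with a real sentence, each msgid stays unique and no separate break entries are needed.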

If you were translating, for example, a tomboy-ng note, would you use poedit or OmegaT ?

The steps posted previously are for initial translations and updates using OmegaT, a complete (FOSS) CAT tool. However, what I have been doing is unnecessary. One can reuse the OmegaT project folder without renaming the notes as the existing tmx (translation memory) is not affected. So to update translations done using OmegaT or a similar tool:

  1. Identify note(s) changed since last translation (by date).
  2. Grab that/those file(s).
  3. Change extension(s) to .xml.
  4. Add to existing OmegaT -ng project (overwriting the previous "source" files).
  5. Change extension(s) back to .note.
  6. Submit.
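Steps 1 to 4 could be scripted. A Python sketch, assuming files go into the OmegaT project's `source` folder and using file modification time as the "date" (compared against a naive local `since`). Note that a git checkout resets modification times, so a note's own `<last-change-date>` may be the more reliable signal. The function name is hypothetical:

```python
import shutil
from datetime import datetime
from pathlib import Path

def stage_changed_notes(notes_dir: Path, omegat_source: Path, since: datetime) -> list:
    """Copy notes modified after `since` into an OmegaT source folder as .xml."""
    staged = []
    for note in sorted(notes_dir.glob("*.note")):
        modified = datetime.fromtimestamp(note.stat().st_mtime)  # naive local time
        if modified > since:
            target = omegat_source / note.with_suffix(".xml").name
            shutil.copyfile(note, target)  # overwrites the previous source file, as in step 4
            staged.append(target)
    return staged
```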

Actually, anyone working professionally as a translator who wants to translate the help files will have a CAT tool that can handle XML files. For those who do only software translations with poedit or an online tool, the Anaphraseus extension for LibreOffice may be the way to go. I will have a look.

davidbannon commented 3 years ago

Hmm, I am not convinced! I can now export a tomboy-ng note in .pot format; it has newlines indicated by that (horrible to parse) < ps/ > construct. I strip the header and footer off. I can load the resulting file into poedit and it is sort of usable. But if there is a lot of XML in there, it's quite hard to read, and quite error-prone I would expect.

I have not yet made an importer, don't believe it would be that hard but want to know we are going somewhere useful before I commit that time.

I think further research is indicated. Maybe OmegaT ? If I can produce a file that OmegaT wants to read, life might be a lot easier....

Attached is the first tomboy-ng help note, converted to .pot format. I have removed the XML header and footer because we don't want to see that in poedit; the actual markup is still there. I don't think I would want to work with it....

tomboy-ng_help.pot.zip

Davo

aguador commented 3 years ago

Yep, looks like a basic .pot file. Not sure how it will be handled in terms of memory, etc.

Gettext and poedit do not hide the formatting, which in general is a good thing, as translator control is needed. Other tools, such as OmegaT with the Okapi XML filters, do not show the formatting tags but break formatted chunks (e.g., bold, italics) out separately, which allows decent if sometimes suboptimal translations and ensures that tags are not stripped out completely or left unpaired.
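Whichever tool is used, a cheap safeguard is to compare the inline tags in each source string with those in its translation. A Python sketch; the tag pattern covers simple Tomboy-style tags such as `<bold>` and `<ps/>` but not tags with attributes, and the function name is hypothetical:

```python
import re
from collections import Counter

TAG = re.compile(r"</?[A-Za-z][\w:-]*\s*/?>")  # <bold>, </bold>, <ps/>, <link:internal>

def tag_problems(source: str, translation: str) -> list:
    """Report tags whose count differs between a source string and its translation."""
    src, dst = Counter(TAG.findall(source)), Counter(TAG.findall(translation))
    return [f"{tag}: source {src[tag]}, translation {dst[tag]}"
            for tag in sorted(src | dst) if src[tag] != dst[tag]]
```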

I think this approach, while perhaps feasible, is too much work and either straight hand translation or use of other tools will be better in the end.