mozilla / activate.mozilla.community

Activate campaign site
Mozilla Public License 2.0
Review site localization system #34

Closed deimidis closed 7 years ago

deimidis commented 8 years ago

As @MichaelKohler suggested in another issue, we need a way to notify all available locales when content is updated, so they can make the corresponding changes.

deimidis commented 8 years ago

Adding @A-kilroy

a-kilroy commented 8 years ago

Would love more details on this, since I think there could be some overlap with what the foundation is working on to better integrate localizers. We might be able to piggyback off some of that work.

MichaelKohler commented 8 years ago

@a-kilroy thanks for jumping in, let me try to answer your question.

For the activate.mozilla.community website, we're using this repository here, which also includes all texts that are displayed on the page. The original English source files are located directly in the _pages folder: https://github.com/mozilla/activate.mozilla.community/tree/gh-pages/_pages

For every localization there are the following steps to do:

Guillermo created a Pull Request with a template folder which can be copied to make it easier to add a new language. In hindsight, I'm not sure if that is helping us with this problem here. But let me first describe the translation problem.

As you see above, we have 2 different documents, EN and ES. Now there are two possibilities:

  1. Somebody changes a link or something generic in the English document -> this person could change this in the other languages as well, as long as no specific language skills are needed.
  2. There is new text, or any change to existing text -> the person changing it can only do it in EN (assuming the source always gets changed first).

In both cases, if any language skill is involved, a localizer will have to change the text in the language-specific file as well. As it is, a change to the source file is made as a normal Git commit, which by default does not notify anybody watching the repository. So currently there is no way to automatically notify all localizers that something changed and a re-translation should be done.
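
To illustrate what detecting such a change could look like, here is a minimal sketch. It assumes we record a hash of the English source at the time each locale was last translated; the `recorded` mapping and the helper names are hypothetical, nothing like this exists in the repository today.

```python
import hashlib


def source_hash(text: str) -> str:
    """Return a stable fingerprint of the English source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_locales(current_source: str, recorded: dict[str, str]) -> list[str]:
    """List locales whose recorded source hash differs from the current one.

    `recorded` maps a locale code to the hash of the English text that
    the locale was translated from (hypothetical bookkeeping).
    """
    current = source_hash(current_source)
    return sorted(loc for loc, seen in recorded.items() if seen != current)


# Illustrative data only: "es" was translated from the old text.
old_text = "Join the campaign."
new_text = "Join the campaign today."
recorded = {"es": source_hash(old_text), "fr": source_hash(new_text)}
print(stale_locales(new_text, recorded))
```

The resulting list would still have to be turned into a notification somehow, which is the part we are missing.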

This is basically the same as having a Google Doc with the English text and sending it out to somebody to translate: they will most probably copy the English document and write the text for the other language. If the English document gets changed, the localizer won't be notified about it.

One possibility would be to keep a list of people to notify, and to make whoever changes the text directly or merges a Pull Request responsible for notifying each of them. This seems cumbersome to me, but right now I don't really have any other suggestion myself. Another possibility would be to use Pontoon (the l10n tool Mozilla localizers already use), but I don't know how well this would work with full Markdown files instead of small strings. Maybe @mathjazz could enlighten us here?

In any case, we need to make sure that the locales are getting updated as well, not only the English source.

I hope that is a clear (even though very long) problem statement. Feel free to ask if something is not phrased clearly or if you spot any mistakes.

mathjazz commented 8 years ago

There are numerous reasons why embedding text directly in the code rarely works for localization purposes.

So I suggest you internationalize the site using one of the i18n libraries, which will create resource files. It's a one-time task that will take significantly less time in the long term than the current solution.

a-kilroy commented 8 years ago

At the risk of oversimplifying it, an ideal scenario would be to integrate Pontoon, right? I believe this is something MoFo is already working on, and since they have several sites that use GitHub Pages I expect there could be some overlap/coordination opportunities. I was trying to understand what we'd like to do and the specific problem so that I can figure out the right people to talk to/connect. I think I understand the problem but not the ideal solution. And honestly if it's not helpful I can drop it.

mathjazz commented 8 years ago

Yes, if the site is internationalized, we can plug it into Pontoon easily, which is used to localize most if not all Mozilla (MoFo & MoCo) websites. We have best practices and docs for this. The contact person for website localization is @peiying2.

brianking commented 8 years ago

Given this discussion, let's put a HOLD on integrating any more locales right now. Happy to see the conversation happening though!

MichaelKohler commented 8 years ago

On the other hand, there are several l10n plugins for Jekyll, like https://github.com/Anthony-Gaudino/jekyll-multiple-languages-plugin . That one handles the translations in .yml files, which would allow us to have at least string-based translations. However, I'm not sure if our localizers are used to .yml files. I haven't found one that uses .properties files, but there might be some as well.

nukeador commented 8 years ago

@mathjazz what can you provide us to have Jekyll adapted to what you are suggesting? We don't have the resources to build something here on top of vanilla Jekyll.

Right now we have a couple of P1 languages we need to deliver, where people just need to localize a markdown file. We can improve in the future ;-)

gueroJeff commented 8 years ago

Let's do what we can to not fragment the l10n process/tool chain. This will only make it harder for the community to engage on these types of projects.

nukeador commented 8 years ago

I agree, that's why we are asking for your help here :)

In the meantime, we know our current process is not perfect, but we wanted to deploy something fast and scrappy. We can improve as we go ;-)

gueroJeff commented 8 years ago

One possible short-term plan may be extracting strings from your md files and converting those to xliff for use in Pontoon. It seems that there's already a utility out there that can perform that conversion -- https://github.com/tadatuta/md2xliff

The long-term strategy would be to convert everything over to HTML and use the l20n framework.

nukeador commented 8 years ago

I see. We want to use markdown to allow non-technical people to add/update content on the site directly from the GitHub UI; that's the whole purpose of using Jekyll (along with the built-in GitHub Pages support).

I don't know if we can have markdown for English and then extract strings for other locales?

gueroJeff commented 8 years ago

This short term plan allows you to continue using markdown, while using a standard localization format that Pontoon supports and that preserves the document structure.

On 22 Aug 2016 at 3:19 PM, "Nukeador" notifications@github.com wrote:

I see, we want to use markdown to allow non-technical people to add/update content to the site directly from github UI, that's the whole purpose of using Jekyll.

I don't know if we can have markdown for English and then extract strings for other locales?

nukeador commented 8 years ago

Cool, any guides on how we should provide the xliff files so people can use pontoon and how to integrate them back, thanks! :-)

gueroJeff commented 8 years ago

I would experiment with that script I linked to in a previous comment to convert between the two formats. Once you're comfortable that there's no data loss, if you set up a strings repo with a directory per locale containing the xliff files, Pontoon only needs the URL to the en-US repo directory and can pull them in.

On 22 Aug 2016 at 3:30 PM, "Nukeador" notifications@github.com wrote:

Cool, any guides on how we should provide the xliff files so people can use pontoon and how to integrate them back, thanks! :-)

a-kilroy commented 8 years ago

From the foundation: https://github.com/MozillaFoundation/Advocacy/wiki/Localization:-How-it-happens-during-Copyright This is not exactly the same setup, but maybe it is helpful. I also believe they have something for their GitHub sites, though I can't find it on their wiki, so it might be worth reaching out to them.

On Mon, Aug 22, 2016 at 11:40 PM, gueroJeff notifications@github.com wrote:

I would experiment with that script I linked to in a previous comment to convert between the two formats. Once you're comfortable that there's no data loss, if you set up a strings repo with a directory per locale containing the xliff files, Pontoon only needs the URL to the en-US repo directory and can pull them in.

mathjazz commented 8 years ago

@nukeador Converting MD to XLIFF and back to MD as @gueroJeff suggested sounds like your best bet.

You will have to convert MD to XLIFF every time there's a change in the source language, and also convert back from XLIFF to MD regularly so translations are deployed to production.

You can use this existing repository or a separate one for storing XLIFF files. See Section A of Pontoon docs for more details: https://developer.mozilla.org/en-US/docs/Mozilla/Implementing_Pontoon_in_a_Mozilla_website

nukeador commented 8 years ago

@mathjazz I've done a quick test with the md2xliff tool and I have found two issues:

Ideas? :-)

What I managed to do was:

brianking commented 8 years ago

Discussed this in France with nikos - what are your thoughts @comzeradd ? We should aim to have the same system for Clubs site.

comzeradd commented 8 years ago

Yes, we also had a brief conversation with @nukeador about this. Since we have the requirement of keeping markdown, this limits our options. It rules out, for instance, using something like webL10n or the solution @a-kilroy posted above from the advocacy page.

I'm not very familiar with Pontoon, but why is md2xliff not good enough? Is it much of a problem that it doesn't produce diffs and re-creates the whole file?

nukeador commented 8 years ago

We could have a script to provide diffs and recreate but I was wondering if this is something that Pontoon is able to handle.

comzeradd commented 8 years ago

To sum up, I see a few requirements here:

  1. Keep the content in markdown, to make it easy for non-technical people to edit it.
  2. Avoid using Pull Requests for localizing content.
  3. Use standard Mozilla l10n tools (Pontoon), to make the project and its content changes more discoverable for localizers.

All of Jekyll's localization plugins and methods involve opening Pull Requests for localized content, which is not desired in our case.

I did some tests with md2xliff and, besides the issues with the metadata headers, it works nicely. So my suggested course of action would be:

  1. Add this project to pontoon.
  2. Create a locales folder in this repository to put the xliff files, in the structure documentation suggests, and give write access to pontoon.
  3. Manually create the xliff files from the English markdown pages. I'd suggest we remove headers on extract (and re-add them on reconstruct) to avoid problems.
  4. Periodically reconstruct localized xliff files to markdown.
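
For step 3, removing the headers and re-adding them could be sketched roughly like this. It assumes the header is standard Jekyll front matter delimited by two `---` lines; the helper names and the sample page are made up for illustration and are not part of md2xliff.

```python
def split_front_matter(markdown: str) -> tuple[str, str]:
    """Split a Jekyll page into (front_matter, body).

    The front matter is the block between the first two lines that
    consist solely of `---`; it is returned including both delimiters.
    Pages without front matter are returned as ("", markdown).
    """
    lines = markdown.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", markdown
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            head = "".join(lines[: index + 1])
            body = "".join(lines[index + 1 :])
            return head, body
    return "", markdown  # no closing delimiter: treat as plain markdown


def join_front_matter(front_matter: str, body: str) -> str:
    """Re-attach the untouched front matter to a (localized) body."""
    return front_matter + body


# Hypothetical page content, for illustration only.
page = "---\ntitle: Test Pilot\nlayout: page\n---\n# Hello\n\nSome text.\n"
head, body = split_front_matter(page)
print(repr(head))
print(repr(body))
```

Only `body` would go through the xliff conversion; `head` is kept aside and put back when the markdown is reconstructed.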

nukeador commented 8 years ago

@comzeradd I agree. What would you need?

@mathjazz @gueroJeff Is this something you can support us with?

Thanks!

mathjazz commented 8 years ago

Sounds like a plan!

What matters for Pontoon is that files in a supported file format are available at the right place in the repository it can write to. And that's covered by the plan proposed by @comzeradd already!

I'm no expert in XLIFF files, but since it's a bilingual file format, I suspect every time a new en-US XLIFF file is generated, we'd also need to merge those changes into localized XLIFF files. There must be scripts that do this. I'll add @gueroJeff and @flodolo to comment on that (both of them are currently on conferences). Please note that Pontoon can work without this step, but your application might not.

mathjazz commented 8 years ago

I'm no expert in XLIFF files, but since it's a bilingual file format, I suspect every time a new en-US XLIFF file is generated, we'd also need to merge those changes into localized XLIFF files. There must be scripts that do this. I'll add @gueroJeff and @flodolo to comment on that (both of them are currently on conferences). Please note that Pontoon can work without this step, but your application might not.

Wait. Your app doesn't use XLIFF files directly, it uses MD files. So as long as the xliff2md script can create valid localized MD files, this step is not needed.

comzeradd commented 8 years ago

First step would be to enable Pontoon, so I opened a bug.

mathjazz commented 8 years ago

That's the last step I believe. Requirements under Section A need to be met first: https://developer.mozilla.org/en-US/docs/Mozilla/Implementing_Pontoon_in_a_Mozilla_website

That's basically steps 2 and 3 from your list.

comzeradd commented 8 years ago

Yes, good point. I started creating the locales files. One thing I'm not sure about is whether I should include the original (en-US) files too, since xlf files have a source and target locale anyway.

mathjazz commented 8 years ago

Yup, we need the en-US folder with original files.

To give you an idea, here's the (only) xliff-based project we currently localize: https://github.com/mozilla-l10n/firefoxios-l10n/

comzeradd commented 8 years ago

Thanks

I added the locales files, gave write access to the mozilla-pontoon bot and updated the bug :)

mathjazz commented 8 years ago

Thanks @comzeradd! Could you use the .xliff file extension?

ioana-chiorean commented 8 years ago

🎉

comzeradd commented 8 years ago

@mathjazz done.

mathjazz commented 8 years ago

Thanks!

XML parser is throwing an error:

Traceback (most recent call last): 
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task 
    R = retval = fun(*args, **kwargs) 
  File "/app/.heroku/python/lib/python2.7/site-packages/newrelic-2.50.0.39/newrelic/hooks/application_celery.py", line 66, in wrapper 
    return wrapped(*args, **kwargs) 
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__ 
    return self.run(*args, **kwargs) 
  File "/app/pontoon/sync/core.py", line 59, in wrapped_func 
    return func(self, *args, **kwargs) 
  File "/app/pontoon/sync/tasks.py", line 226, in sync_translations 
    vcs_project.resources 
  File "/app/.heroku/python/lib/python2.7/site-packages/django/utils/functional.py", line 33, in __get__ 
    res = instance.__dict__[self.name] = self.func(instance) 
  File "/app/pontoon/sync/vcs/models.py", line 291, in resources 
    resources[path] = VCSResource(self, path, locales=locales) 
  File "/app/pontoon/sync/vcs/models.py", line 413, in __init__ 
    resource_file = formats.parse(resource_path, source_resource_path, locale) 
  File "/app/pontoon/sync/formats/__init__.py", line 44, in parse 
    return SUPPORTED_FORMAT_PARSERS[extension](path, source_path=source_path, locale=locale) 
  File "/app/pontoon/sync/formats/xliff.py", line 127, in parse 
    xliff_file = xliff.xlifffile(f) 
  File "/app/.heroku/python/lib/python2.7/site-packages/translate/storage/xliff.py", line 549, in __init__ 
    lisa.LISAfile.__init__(self, *args, **kwargs) 
  File "/app/.heroku/python/lib/python2.7/site-packages/translate/storage/lisa.py", line 282, in __init__ 
    self.parse(inputfile) 
  File "/app/.heroku/python/lib/python2.7/site-packages/translate/storage/lisa.py", line 358, in parse 
    self.document = etree.fromstring(xml, parser).getroottree() 
  File "lxml.etree.pyx", line 3103, in lxml.etree.fromstring (src/lxml/lxml.etree.c:70569) 
  File "parser.pxi", line 1828, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:106403) 
  File "parser.pxi", line 1716, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:105194) 
  File "parser.pxi", line 1086, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:99876) 
  File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:94350) 
  File "parser.pxi", line 690, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:95786) 
  File "parser.pxi", line 620, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:94853) 
XMLSyntaxError: xmlParseEntityRef: no name, line 145, column 165 

Seems like &'s need to be escaped, e.g.: https://github.com/mozilla/activate.mozilla.community/blob/gh-pages/locales/es-ES/webvr-camp.xliff#L145
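
A small sketch of the problem and the fix, using Python's standard library parser rather than the lxml/translate-toolkit stack from the traceback (the sample string is made up):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative text containing a bare ampersand.
raw = "Tips & tricks for WebVR"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


broken = "<source>" + raw + "</source>"
fixed = "<source>" + escape(raw) + "</source>"  # & becomes &amp;

print(is_well_formed(broken))
print(is_well_formed(fixed))
print(fixed)
```

`escape` also takes care of `<` and `>`, so running the extracted text through it before writing the xliff file should avoid this class of error.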

flodolo commented 8 years ago

Has anyone tried to go back to .md from these files, localizing a few random strings?

I only checked a couple of files: one looked fine, but others look full of unnecessary fragments. Example:

I was honestly expecting something else: no markup, just the text and some form of template to inject translations into. This seems really brittle, and given the size of the content, it would be great to do proper testing before asking people to work on it and potentially lose work.

comzeradd commented 8 years ago

Seems like &'s need to be escaped, e.g.:

@mathjazz I substituted & with &amp;. Could you check that this works?

Thanks

comzeradd commented 8 years ago

Has anyone tried to go back to .md from these files, localizing a few random strings?

Yeap. But you need the skeleton files that md2xliff created to reverse the process properly. I can add them to the repo if this doesn't create any problem to the pontoon bot (because they live inside the same folders as xliff files).
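
To show why the skeleton is needed, here is a deliberately simplified sketch of the idea. This is not md2xliff's actual algorithm or file format; the `%%%n%%%` placeholder syntax and the function names are assumptions made up for the example.

```python
import re


def extract(markdown: str) -> tuple[str, dict[str, str]]:
    """Replace each non-empty line with a placeholder.

    Returns the skeleton (layout with placeholders) and the units
    (placeholder id -> original text).
    """
    units: dict[str, str] = {}
    skeleton_lines = []
    for line in markdown.split("\n"):
        if line.strip():
            unit_id = f"%%%{len(units) + 1}%%%"
            units[unit_id] = line
            skeleton_lines.append(unit_id)
        else:
            skeleton_lines.append(line)
    return "\n".join(skeleton_lines), units


def reconstruct(skeleton: str, translations: dict[str, str]) -> str:
    """Fill the skeleton's placeholders with translated text."""
    return re.sub(r"%%%\d+%%%", lambda m: translations[m.group(0)], skeleton)


# Illustrative content only.
source = "# Welcome\n\nJoin the campaign."
skeleton, units = extract(source)
spanish = {"%%%1%%%": "# Bienvenido", "%%%2%%%": "Únete a la campaña."}
print(reconstruct(skeleton, spanish))
```

Without the skeleton there is no record of where each translated unit belongs, so the markdown cannot be rebuilt from the xliff file alone.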

what's this strange markup?

This is the way we add specific CSS classes and markup to the content. There is no way to avoid this if we want the reverse process of reconstructing the markdown files to work without someone having to spend a lot of time manually adding the markup again. On Pontoon we just have to copy this to the localized target.

mathjazz commented 8 years ago

@mathjazz I substituted & with &amp;. Could you check that this works?

It seems like some &s in locale files are not escaped yet: https://github.com/mozilla/activate.mozilla.community/blob/gh-pages/locales/pt-PT/test-pilot.xliff#L129

BTW, for URLs you should probably use %26 instead of &: https://github.com/mozilla/activate.mozilla.community/blob/gh-pages/locales/en-US/test-pilot.xliff#L133
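
For reference, a quick sketch of what percent-encoding does to an ampersand that is part of a value (the value `R&D` and the parameter name `q` are made up). Note that an ampersand separating query parameters is URL syntax, so encoding that one would change the meaning of the link.

```python
from urllib.parse import quote, urlencode

# Illustrative value containing a literal ampersand.
value = "R&D"

# Encode a single value; safe="" means no character is left unescaped.
print(quote(value, safe=""))

# Build a query string; the value is encoded automatically.
print(urlencode({"q": value}))
```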

comzeradd commented 8 years ago

Thanks. I made the substitutions on url on all locales.

mathjazz commented 8 years ago

Thanks @comzeradd!

I've successfully set the test project up on Pontoon stage server (the link will be broken in a few weeks from now): https://mozilla-pontoon-staging.herokuapp.com/fr/activate-test/all-resources/?string=159738

I was also able to make a test commit to the repository: https://github.com/mozilla/activate.mozilla.community/commit/f4aa015add804176a6b89c7fa7dffed111284842. It would be great if you could use the same whitespace as Pontoon, so the diff would be easier to read, but that's the lowest possible priority.

The next step could be for someone to review the original strings and see if we can simplify them as flod suggested. There's lots of markup and strings that don't need to be translated.

comzeradd commented 8 years ago

Everything looks ok. We indeed have some markup in there. One option would be to copy them to the localized side once it hits production, to make it easier for people to ignore them. If we remove them, then the reconstructing process would need much more manual work from someone from this team and would probably lead to slow updates on the localized content.

mathjazz commented 7 years ago

What are the next steps here?

comzeradd commented 7 years ago

I think we are good to move this to production Pontoon.

mathjazz commented 7 years ago

Thanks, @comzeradd!

Leaving it to the project management team. /cc @peiying2

peiying2 commented 7 years ago

Thanks everyone for brainstorming and finalizing a process so we can proceed.

I went through some of the strings, and saw a need for an explicit list of instructions on which kinds of strings are for localization and which should be ignored. I need to compile this list and include it in my email communication to the localizers.

flodolo commented 7 years ago

I would go even further: all strings that are supposed to remain identical should be pre-translated to avoid a mess, and reduce the amount of copy and paste for localizers.

This might give you some ideas https://github.com/mozilla-mobile/firefox-ios-build-tools/blob/master/scripts/update-xliff.py

comzeradd commented 7 years ago

I just pushed a commit to pre-fill all the strings that contain only mark-up. That will hopefully reduce the complexity for localizers.
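
For anyone who needs to repeat this later, the idea can be sketched roughly as below. This is not the code from the commit; it assumes XLIFF 1.2, and the `is_markup_only` heuristic (strip HTML tags and `{: ...}` attribute blocks, then look for letters) is an assumption that would need checking against the real strings.

```python
import re
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)


def is_markup_only(text: str) -> bool:
    """Heuristic: no letters remain after dropping tags and {: ...} blocks."""
    stripped = re.sub(r"<[^>]*>|\{:[^}]*\}", "", text)
    return not any(ch.isalpha() for ch in stripped)


def prefill(xliff_text: str) -> str:
    """Copy <source> into an empty <target> for markup-only units."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{NS}}}trans-unit"):
        source = unit.find(f"{{{NS}}}source")
        target = unit.find(f"{{{NS}}}target")
        if source is None or not is_markup_only(source.text or ""):
            continue
        if target is None:
            target = ET.SubElement(unit, f"{{{NS}}}target")
        if not (target.text or "").strip():
            target.text = source.text  # never overwrite existing work
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-unit file: one markup-only string, one real sentence.
SAMPLE = f"""<xliff xmlns="{NS}" version="1.2">
  <file original="page.md" source-language="en-US" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>&lt;div class="row"&gt;</source><target></target></trans-unit>
      <trans-unit id="2"><source>Join the campaign</source><target></target></trans-unit>
    </body>
  </file>
</xliff>"""

result = ET.fromstring(prefill(SAMPLE))
targets = [t.text for t in result.iter(f"{{{NS}}}target")]
print(targets)
```

Only empty targets are filled, so anything a localizer has already translated is left alone.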

brianking commented 7 years ago

@peiying2 @mathjazz Can the strings go live now? We can also document the process for localisers here in github and elsewhere as needed.

mathjazz commented 7 years ago

LGTM.

@peiying2 ?