translate / pootle

Online translation tool
http://pootle.translatehouse.org
GNU General Public License v3.0

Pootle not finding some .properties files #3504

Open jleclanche opened 9 years ago

jleclanche commented 9 years ago

From the ML:

I want to translate a Java project using Java resource bundles. I have created some gradle scripts to automate initial import of existing resourcebundles and updating the files on changes in the project (using prop2po, msginit, msgmerge, po2prop) -- http://github.com/tschulte/gradle-gettext-plugin.

As is the nature of Java properties files, the translations are property files ending in .properties; translations to e.g. German end in _de.properties, _de_DE.properties, _de_AT.properties or even _de_DE_BY. We even have bundles with _de__CustomerA.properties (no country code, but with a variant to support customer-specific translations).

To support automatic processing of the bundles, we have a naming convention in place. We do only allow A-Z, a-z and a hyphen ([A-Za-z-]+). With this the language code can be found by using "[A-Za-z-]+_(.*)".

Our project does have a couple of files like messages-all.properties. These were not found by pootle. I found the source of my problem in pootle/apps/pootle_app/project_tree.py:

    #: Case insensitive match for language codes as postfix
    LANGCODE_POSTFIX_RE = re.compile('^.*?[-_.]([a-z]{2,3}([_-][a-z]{2,3})?(@[a-z0-9]+)?)$',
                                 re.IGNORECASE)

I fixed this in my pootle server by changing the regex to:

 '^.*?[_]([a-z]{2,3}....'

But I think this should somehow go into the pootle sourcecode. Maybe using some type of preference on project level. Maybe as a fourth option for "Project Tree Style".

tschulte commented 9 years ago

Just to clarify: I generate .pot files using prop2po, create .po files using msginit, translate using these files, and finally create translated properties using po2prop. I have the project configured as File Type "Gettext PO", not "Java Properties".

Now I have for example a file messages-all.pot and messages-all_de.po. Pootle does not list this file when translating to German.
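A quick way to see why these files go missing is to run the pattern quoted from project_tree.py against the file stems. This is only a sketch: it assumes the pattern is applied to the file name with its extension stripped (the call site is not shown in this thread), `UNDERSCORE_ONLY_RE` is a guess at the local fix whose full pattern is elided above, and `postfix_langcode` is a hypothetical helper.

```python
import re

# Pattern as quoted above from pootle/apps/pootle_app/project_tree.py.
LANGCODE_POSTFIX_RE = re.compile(
    r'^.*?[-_.]([a-z]{2,3}([_-][a-z]{2,3})?(@[a-z0-9]+)?)$', re.IGNORECASE)

# Assumption: the local fix only narrows the separator to "_" and keeps the
# rest of the pattern unchanged (the report elides the remainder).
UNDERSCORE_ONLY_RE = re.compile(
    r'^.*?[_]([a-z]{2,3}([_-][a-z]{2,3})?(@[a-z0-9]+)?)$', re.IGNORECASE)


def postfix_langcode(pattern, stem):
    """Return the language code the pattern extracts from a stem, or None."""
    match = pattern.match(stem)
    return match.group(1) if match else None


for stem in ("messages_de", "messages-all", "messages-all_de"):
    print(stem,
          postfix_langcode(LANGCODE_POSTFIX_RE, stem),
          postfix_langcode(UNDERSCORE_ONLY_RE, stem))
# messages_de de de
# messages-all all None
# messages-all_de all_de de
```

With the quoted pattern, "all" is read as a language code, so messages-all_de is taken to be a translation into "all_de" rather than "de".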

tschulte commented 9 years ago

Are there any common naming conventions when using a language postfix? Pootle allows a dot, underscore or hyphen as separator for the language postfix. Language code and country code can be separated using underscore or hyphen; the variant is always separated by an @. Language code, country code and variant are matched case-insensitively.

In the Java world, both country code and variant are separated by an underscore. The language code is always lower case (sort of: http://docs.oracle.com/javase/8/docs/api/java/util/Locale.html). The country code is always upper case. The variant is case sensitive and can contain underscores, but in that case each underscore subdivides the variant (e.g. _de_DE_bavarian_munich).

Having language code and variant without country code is done using two underscores (ca__valencia instead of ca@valencia). But for this I can configure my script to transform a messages_ca@valencia.po to messages_ca__valencia.properties.
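The mapping between the two conventions could be sketched roughly as follows. The function name `gettext_to_java_suffix` and the accepted input grammar are assumptions for illustration; this is not taken from the gradle plugin or from Pootle.

```python
import re

# Assumed input grammar: lang, optional country after "_" or "-",
# optional variant after "@".
GETTEXT_LOCALE_RE = re.compile(
    r'^([a-z]{2,3})(?:[_-]([A-Za-z]{2,3}))?(?:@([A-Za-z0-9]+))?$')


def gettext_to_java_suffix(code):
    """Convert e.g. 'ca@valencia' to the Java-style 'ca__valencia'."""
    match = GETTEXT_LOCALE_RE.match(code)
    if match is None:
        raise ValueError("unrecognised locale code: %r" % code)
    lang, country, variant = match.groups()
    parts = [lang]
    if country or variant:
        # An empty country slot yields the double underscore before a variant.
        parts.append(country.upper() if country else "")
    if variant:
        parts.append(variant)
    return "_".join(parts)


print(gettext_to_java_suffix("de"))           # de
print(gettext_to_java_suffix("de_DE"))        # de_DE
print(gettext_to_java_suffix("ca@valencia"))  # ca__valencia
```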

Since underscore is already used as separator for the language code, I simply disallowed using underscores for the file name. Does there exist something similar in the GNU world?

dwaynebailey commented 9 years ago

@tschulte the GNU world or specifically Gettext has the following general rules:

    $project.pot
    $lang.po

So prefixing each file with the project name is actually unusual. You would have af.po, de.po, etc.

The names of languages follows this convention:

    $lang_$COUNTRY@$variant

e.g. ca@valencia, en_US, fr, etc.

So a lot of the code in Pootle works on those assumptions. As you discovered you can rewrite those rules in the code and as you point out a good feature change would be allowing those mappings to be changed on a per project basis.

It's quite an invasive change, so for now you'll need to hack the regex to get your expected naming convention or do transforms in your processing code. I've hit similar issues and it was easier for me to handle renames on the import side.

tschulte commented 9 years ago

OK, the default is to have exactly one template file for each project. But pootle also supports having a language suffix for each file. Another possibility is having directories for each language. Pootle allows the layout to be specified per project, or tries to guess the layout if not specified.

As you say, lang_COUNTRY@variant is the default. But pootle also supports lang-COUNTRY@variant.

Same for suffix: _lang_COUNTRY@variant, -lang_COUNTRY@variant, .lang_COUNTRY@variant, _lang-COUNTRY@variant, -lang-COUNTRY@variant, .lang-COUNTRY@variant -- all valid suffixes as far as pootle is concerned.

Just changing the pattern to only allow _lang_COUNTRY@variant would most probably be OK for 90% of projects, but would break some. There is most probably a reason for supporting hyphen and dot.
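The six suffix forms listed above can be checked against the pattern quoted earlier in this issue. The stems below are made-up examples, and the sketch assumes the pattern sees the file name without its extension.

```python
import re

# Pattern as quoted from pootle/apps/pootle_app/project_tree.py.
LANGCODE_POSTFIX_RE = re.compile(
    r'^.*?[-_.]([a-z]{2,3}([_-][a-z]{2,3})?(@[a-z0-9]+)?)$', re.IGNORECASE)

# Hypothetical stems, one per suffix form.
stems = [
    "messages_de_AT@formal", "messages-de_AT@formal", "messages.de_AT@formal",
    "messages_de-AT@formal", "messages-de-AT@formal", "messages.de-AT@formal",
]

# Every stem matches; group 1 is the extracted language code.
codes = [LANGCODE_POSTFIX_RE.match(stem).group(1) for stem in stems]
print(codes)
```

All six match, which is why narrowing the separator to an underscore alone would change behaviour for any project relying on the hyphen or dot forms.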

I will try to dig into the pootle sourcecode to get more insight. Maybe by first looking into the tests. Is the above documented anywhere? If not, where should it be documented, so I could document my findings and create a pull request?

In the meantime you are right, in my case it would be easier to just change the filenames of these few files to not end with [._-][a-z]{2,3} (messages-all.pot -> all-messages.pot).
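A small check along these lines could flag template names that need renaming. It is only a sketch of the rule stated above; `risky_template_stem` is a hypothetical helper, not part of Pootle or the Translate Toolkit.

```python
import re

# Separator followed by 2-3 letters at the end of the stem, matched
# case-insensitively like the Pootle pattern.
LOOKS_LIKE_LANGCODE_RE = re.compile(r'[._-][a-z]{2,3}$', re.IGNORECASE)


def risky_template_stem(stem):
    """True if the stem ends in something Pootle could read as a language code."""
    return LOOKS_LIKE_LANGCODE_RE.search(stem) is not None


stems = ["messages-all", "all-messages", "messages"]
risky = [stem for stem in stems if risky_template_stem(stem)]
print(risky)  # ['messages-all']
```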

jason-p-pickering commented 9 years ago

We are also having trouble with this, but are using Java properties files directly. Any quick fix/hack for getting Pootle to recognize files like "i18n_app.properties" (the template) and "i18n_app_ar.properties" which is the Arabic translation file?