wagtail / wagtail-localize

Translation plugin for Wagtail CMS
https://wagtail-localize.org/

Repeated text fragments misleadingly show "missing" translations #624

Open olifante opened 2 years ago

olifante commented 2 years ago

Adding repeated text fragments to a source page and synchronising it to the target page results in some fragments apparently missing translations in the admin for the target page. However, publishing the target page shows that no translations are actually missing.

In a way, this makes sense, as PO files should have only one instance of each translatable text fragment. Instead, what we observe when we export the PO file for a target page with "missing" translations is that the same fragment appears multiple times, with and without translations. For the published page, one instance of a translatable text fragment is enough, and it will correctly use it to display the translated text.

This seems to consist of two related problems:

This behaviour is confusing for editors.

I reproduced this problem on a localised version of the bakery demo, using the following combinations:

olifante commented 2 years ago

Here's a couple of examples using wagtail 4.0.2 + localize 1.3a4:

Screenshot 2022-10-04 at 10 48 55 Screenshot 2022-10-04 at 10 49 13

zerolab commented 2 years ago

Thanks for this @olifante, will be looking at it on Friday

zerolab commented 2 years ago

Alright, so we have two things here:

  1. The screenshots above relate to the fact that the string appears in different contexts, so each occurrence is output as is, giving two separate entries. Translating just one means the translation is ingested into one field (block, etc.) rather than everywhere.
  2. When the duplicate strings are in the same context (i.e. the same field), they end up as separate entries in the generated PO, but with the same msgctxt. This happens because paragraphs are extracted as their own segments.

Screenshots

| Source | Localize |
|-------|---------|
| ![source](https://user-images.githubusercontent.com/31622/195985330-a5ff6dc3-6051-4082-aa17-a6cc8f3f31ea.png) | ![localize](https://user-images.githubusercontent.com/31622/195985332-be009501-2783-4d51-99df-291c872b7bbc.png) |

PO contents

```po
#
msgid ""
msgstr ""
"POT-Creation-Date: 2022-10-15 11:53:28.948436+00:00\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"X-WagtailLocalize-TranslationID: 90fbb0a5-69cf-4e69-85ca-560450c9e56e\n"

msgctxt "title"
msgid "Duplicate segments test"
msgstr ""

msgctxt "body.f715b84e-7250-46d7-9123-19109d08c054.heading_text"
msgid "Hello"
msgstr "Ciao"

msgctxt "body.71b42605-e2b9-429b-bee2-b232c8354c2b.heading_text"
msgid "Hello"
msgstr ""

msgctxt "body.e527c6db-f0ee-4505-bda8-43485a31f631"
msgid "This is my text that I will duplicate in the same field"
msgstr ""

msgctxt "body.e527c6db-f0ee-4505-bda8-43485a31f631"
msgid "a second line"
msgstr ""

msgctxt "body.e527c6db-f0ee-4505-bda8-43485a31f631"
msgid "This is my text that I will duplicate in the same field"
msgstr ""
```

So the fix is two-pronged:

  1. ensure no duplicate strings with the same context are output
  2. output multiple message contexts in msgctxt if the same string is used in multiple places

Note: this will require several hours to a day
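
For illustration, the first prong might look roughly like the sketch below. This is not wagtail-localize's actual code: `Entry` and `dedupe_entries` are hypothetical names, and PO entries are modelled as plain tuples rather than real PO objects. It collapses entries that share the same (msgctxt, msgid) pair and keeps the first non-empty translation, if there is one.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for one PO entry (not a wagtail-localize type)."""

    msgctxt: str
    msgid: str
    msgstr: str


def dedupe_entries(entries):
    """Collapse entries sharing (msgctxt, msgid), keeping first-seen order.

    If any of the duplicates has a non-empty msgstr, the first such
    translation is the one that is kept.
    """
    merged = {}  # dicts preserve insertion order
    for entry in entries:
        key = (entry.msgctxt, entry.msgid)
        existing = merged.get(key)
        # Keep the first entry, but let a translated duplicate replace an
        # untranslated one (the key stays at its original position).
        if existing is None or (not existing.msgstr and entry.msgstr):
            merged[key] = entry
    return list(merged.values())
```

Applied to the three `body.e527c6db-…` entries from the PO above, this would leave two entries: the duplicated sentence once, followed by "a second line". Entries with different contexts, such as the two "Hello" headings, are left untouched.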

ssstain commented 1 year ago

Same happens even if the context is the same

msgctxt "body" msgid "Hello" msgstr ""

msgctxt "body" msgid "Hello" msgstr ""

msgctxt "body" msgid "Hello" msgstr ""

Poedit deletes the duplicates as invalid, and wagtail-localize doesn't apply a single translation to all occurrences.
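
To make the expected behaviour concrete, here is a minimal sketch of reusing one translation for every occurrence within the same context. It is only an assumption about how this could work, not what wagtail-localize does today: `propagate_translations` is a hypothetical helper, and entries are plain `(msgctxt, msgid, msgstr)` tuples.

```python
def propagate_translations(entries):
    """Fill empty msgstr values from a translated duplicate.

    `entries` is a list of (msgctxt, msgid, msgstr) tuples. Only entries
    with the same msgctxt AND msgid share a translation; the same string
    in a different context is left alone.
    """
    known = {}
    for msgctxt, msgid, msgstr in entries:
        if msgstr:
            # Remember the first translation seen for this context + string.
            known.setdefault((msgctxt, msgid), msgstr)

    return [
        (msgctxt, msgid, msgstr or known.get((msgctxt, msgid), ""))
        for msgctxt, msgid, msgstr in entries
    ]
```

With this, translating a single `msgctxt "body"` / `msgid "Hello"` entry would be enough to fill in the other two occurrences on import.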

abc013 commented 9 months ago

Hey there, we also stumbled across the problem in our project. Is there any progress on this issue?