jamulussoftware / jamuluswebsite

This is the GitHub Pages repository for the Jamulus main website. For the jamulus application source code, please visit jamulussoftware/jamulus.
https://jamulus.io
GNU Lesser General Public License v2.1

Translation wrongly split on newly updated homepage #934

ann0see closed this issue 7 months ago

ann0see commented 11 months ago

Language

German, but probably others are related too

Description

Some portions of the website can't be translated properly because the English source text is split into too many fragments. This is probably caused by po4a splitting the page at bad positions:

See e.g. https://hosted.weblate.org/translate/jamulus/1-index/de/?q=state%3A%3Ctranslated&offset=7#comments

Fix

Not sure about a fix, but I think we need to tweak the page and po4a a bit.

ignotus666 commented 11 months ago

Yes, this happens for all languages and mainly affects the '1-Index.html' page. A while back I changed how po4a identifies and segments it, as it used to be even messier (I think it didn't strip out the tagged stuff and presented you with something like 5 huge segments full of non-translatable content for the whole file). The problem is that while po4a now knows to strip out that non-translatable content and break it up into smaller segments, it doesn't know how to group together those isolated "islands" of translatable text that make up a semantic unit - and unless po4a gets itself an AI engine, I don't think there's any chance it will. So as you say, the alternative would be to edit the file, but that would mean not using links.

ann0see commented 11 months ago

This doesn't sound good. The "big chunk of untranslatable text" would probably be easier?

ignotus666 commented 11 months ago

This is what you get (the last segment) - not sure it's preferable:

<div class="fx-row fx-row-center-xs" id="firstrow">
  <div class="fx-col-100-xs">
    <div itemprop="abstract">
      <h2>What is Jamulus?</h2>
       Jamulus lets you play, rehearse, or jam with your friends, your band, or anyone you find online. Play together remotely in time with high quality, low-latency sound on a normal broadband connection. <a href="wiki/Getting-Started" target="_blank" rel="noreferrer">Download it here</a>!
    </div>
  </div>
</div>
<div class="fx-row fx-row-center-xs" id="bannercontainer">
  <div class="fx-col-100-xs">
    <a href="wiki/Getting-Started">
      <img alt="Jamulus Banner. Links to getting started page" src="{{ '/assets/img/jamulusbannersmall.png' | relative_url }}" id="jamulusbanner" loading="lazy" />
    </a>
  </div>
</div>
<div class="fx-row fx-row-center-xs">
  <div class="fx-col-100-xs fx-col-50-l">
     <h2>Jamulus worldwide</h2>
    All over the world Jamulus allows choirs to rehearse and rock bands to play. Jamulus brings folk and classical musicians together. It's being used for remote music lessons,
    in schools and universities, in private and in public — all in real time on the Internet, as if you were there in person.
    <h2>Help needed?</h2>
    <p>
      Check out the <a href="wiki/Getting-Started" target="_blank" rel="noreferrer">documentation</a> and consider the <a href="wiki/Client-Troubleshooting"
        target="_blank" rel="noreferrer">troubleshooting section</a>!
      You can also ask on the <a href="https://github.com/jamulussoftware/jamulus/discussions" target="_blank" rel="noreferrer">forums</a>.
    </p>
  </div>
  <div class="fx-col-100-xs fx-col-50-l">
    <h2>Want to get involved?</h2>
    <p>
    Ideas? Found a bug? Want to contribute some code or help <a href="https://github.com/jamulussoftware/jamulus/blob/main/docs/TRANSLATING.md" title="Documentation for translation Jamulus">translating</a> Jamulus into your language? Since Jamulus is <a href="https://www.gnu.org/philosophy/free-sw.en.html" target="_blank" rel="noreferrer" title="What is free software?">free and open source software</a> (FOSS) licensed under the <a href="https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html" title="GNU General Public License, version 2" target="_blank" rel="noreferrer">GPL</a>, you can help us!<br>
    Take a look at our <a href="wiki/Contribution">contribution guidelines</a> to find out how. Everybody is welcome!
    </p>
    <p>
    <em>
      For detailed information on how Jamulus works, see <a
      href="/PerformingBandRehearsalsontheInternetWithJamulus.pdf">this paper by Volker Fischer (PDF)</a>.
    </em>
    </p>
  </div>
  <div class="fx-col-100-xs fx-txt-center">
    <a href="wiki/Getting-Started" class="button" rel="noreferrer">{{ page.mTGetStartedNow }}</a>
  </div>
</div>

ann0see commented 11 months ago

Probably not great, but still better than missing context.

Best case: all the content between the p tags gets translated.

Actually, I don't think it needs AI at all. If it understands HTML it should work fine.

ignotus666 commented 11 months ago

> Actually, I don't think it needs AI at all. If it understands HTML it should work fine.

This is the thing: po4a extracts translatable text, it gets translated, and then it inserts it back in at the relevant locations. Say you give it rules to not break segments where tags start and end, but at periods, exclamation marks etc. (probably doable). So you get e.g.:

Check out the documentation and consider the troubleshooting section!

instead of:

  Check out the 
documentation
 and consider the 
troubleshooting section
!

Ok, so I translate my nice coherent paragraph into Spanish:

¡Consulta la documentación y presta atención a la sección de resolución de problemas!

But now po4a has no way to know where the hell each of those words fits into this:

Check out the <a href="wiki/Getting-Started" target="_blank" rel="noreferrer">documentation</a> and consider the <a href="wiki/Client-Troubleshooting"
        target="_blank" rel="noreferrer">troubleshooting section</a>!

and the position of each word is very important. Without an actual understanding of language and its meaning (hence the AI comment), there's no way it can do that. So we have two options. Either we tell po4a to treat it like a .doc and get a massive paragraph with all the tags, and risk translators accidentally adding or deleting spaces or other characters that cause the Index page to fail, which would drag the whole website down with it (if I remember correctly, that was another reason the current po4a xml module is used, which filters out tags and their content). Or we leave it as it is, which I'll agree is imperfect.

ann0see commented 11 months ago

Or we convert the translatable sections to markdown and include it.

ignotus666 commented 11 months ago

> Or we convert the translatable sections to markdown and include it.

I had a go at this, but it doesn't work because markdown text in an HTML file isn't recognised as such, and putting it in a separate Include.md file doesn't seem to change that fact:

[Image: Screenshot at 2023-07-26 08-35-18]

Or maybe I'm missing a trick?

ann0see commented 11 months ago

I believe it works like this: https://stackoverflow.com/questions/15917463/embedding-markdown-in-jekyll-html#23384161
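
As far as I understand the linked answer, the idea is to capture the output of the include into a variable and then run it through Jekyll's `markdownify` filter, so the markdown gets rendered even though the surrounding file is HTML. A minimal sketch, assuming a hypothetical `_includes/index-help.md` that holds the translatable text (the file name and the `help_text` variable are made up for illustration, not taken from the repository):

```html
{% comment %} Hypothetical include: index-help.md is an assumed file name {% endcomment %}
{% capture help_text %}{% include index-help.md %}{% endcapture %}
<div class="fx-col-100-xs fx-col-50-l">
  <h2>Help needed?</h2>
  {% comment %} markdownify converts the captured markdown to HTML {% endcomment %}
  {{ help_text | markdownify }}
</div>
```

The include would then contain the whole sentence with markdown links, e.g. `Check out the [documentation](wiki/Getting-Started) and consider the [troubleshooting section](wiki/Client-Troubleshooting)!`, which a translator would presumably see as one unit. Attributes such as `target="_blank"` are not covered by plain markdown links and would need separate handling.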

ann0see commented 11 months ago

Otherwise we could do the same as we do with the header: use variables and edit a yml file.
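
A rough sketch of the variable approach, modelled on the `{{ page.mTGetStartedNow }}` button text already on the page. The key names `mTHelpHeading` and `mTHelpText` are hypothetical, and the values are shown as page front matter only to keep the example in one file; where they would actually live is an assumption:

```html
---
# Hypothetical keys; each value is one complete, translatable unit
mTHelpHeading: "Help needed?"
mTHelpText: 'Check out the <a href="wiki/Getting-Started">documentation</a> and consider the <a href="wiki/Client-Troubleshooting">troubleshooting section</a>!'
---
<div class="fx-col-100-xs fx-col-50-l">
  <h2>{{ page.mTHelpHeading }}</h2>
  <p>{{ page.mTHelpText }}</p>
</div>
```

This keeps each sentence together for translators, but the link tags end up inside the translatable string, so the risk of a translator accidentally breaking the markup (mentioned earlier in this thread) still applies.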