ynojima closed this 11 months ago
Hey, thanks for all the pointers.
Considering the relative complexity of loading the translations, I'm wondering if we shouldn't handle #30 first... Then we might not even have to care about what's translated and what is not, we'd just index the rendered guide?
The main issue would be with indexing metadata typically found in the Asciidoc source but not necessarily in the rendered content, such as :topics: or :categories:. But I'm not sure these get translated?
Am I right to believe the docs branch is where I should look for rendered pages?
Oh, if you plan to index the rendered HTML, I agree with you that the switch should be done first.
> :topics: and :categories:
Do you mean those in the YAML front-matter? They are not translated.
> Am I right to believe the docs branch is where I should look for rendered pages?
Yes, the rendered site is saved in the docs directory of the docs branch.
https://github.com/quarkusio/ja.quarkus.io/tree/docs/docs
> Do you mean those in the YAML front-matter? They are not translated.
Yes. Alright, then indexing the HTML should work fine.
> Yes, the rendered site is saved in the docs directory of the docs branch. https://github.com/quarkusio/ja.quarkus.io/tree/docs/docs
Gotcha, thanks.
@marko-bekhta some comments if you're going to work on this:
- [...] QuarkusIOSample and the readme (in particular for development environments) accordingly. It will probably be a pain though :/
- [...] quarkus.yaml gets translated, so... handling titles and summaries might be challenging
- [...] Guide instance per language or put everything in the same entity using a dedicated data structure to account for internationalization (e.g. I18nData<T> with properties public T en; public T es; public T jp;). I don't know what's best.
- [...] AlternativeBinder.
@Embedded
@I18nFullTextField(
    en = @Localization(analyzer = ...),
    es = @Localization(analyzer = ...),
    jp = @Localization(analyzer = ...)
)
I18nData<String> summary;
@Embedded
@I18nFullTextField(
    name = "fullContent",
    bridge = ...,
    en = @Localization(analyzer = ..., searchAnalyzer = ...),
    es = @Localization(analyzer = ..., searchAnalyzer = ...),
    jp = @Localization(analyzer = ..., searchAnalyzer = ...)
)
I18nData<InputProvider> fullContentUrl;
The annotation processor would take care of explicitly mapping every sub-property of I18nData (en, es, jp, ...) to a dedicated per-language field, with localization metadata (per-language analyzer) applied, and some prefix or suffix applied to the field name (summary_fr or fr.summary or whatever).
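To make the per-language field idea concrete, here is a minimal plain-Java sketch of the I18nData<T> holder and of the field-name expansion described above. It does not use the Hibernate Search API and is not the annotation processor itself; the I18nFieldNames class and its suffixed, prefixed, and expand methods are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for one value per language, following the
// "public T en; public T es; public T jp;" idea from the comment above.
class I18nData<T> {
    public T en;
    public T es;
    public T jp;

    // Returns the populated languages in a fixed order (en, es, jp).
    Map<String, T> byLanguage() {
        Map<String, T> values = new LinkedHashMap<>();
        if (en != null) values.put("en", en);
        if (es != null) values.put("es", es);
        if (jp != null) values.put("jp", jp);
        return values;
    }
}

// Hypothetical helper showing the two naming schemes mentioned above.
class I18nFieldNames {
    // Suffix style: ("summary", "es") becomes "summary_es".
    static String suffixed(String baseName, String language) {
        return baseName + "_" + language;
    }

    // Prefix style: ("summary", "es") becomes "es.summary".
    static String prefixed(String baseName, String language) {
        return language + "." + baseName;
    }

    // Expands one I18nData value into one index field per populated language,
    // using the suffix naming scheme.
    static <T> Map<String, T> expand(String baseName, I18nData<T> data) {
        Map<String, T> fields = new LinkedHashMap<>();
        data.byLanguage().forEach((lang, value) -> fields.put(suffixed(baseName, lang), value));
        return fields;
    }
}
```

A real binder would additionally attach the per-language analyzer to each generated field; this sketch only covers the naming and the one-field-per-language expansion.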
quarkus.io has localized sites (es.quarkus.io, pt.quarkus.io, cn.quarkus.io, and ja.quarkus.io). If search.quarkus.io supported queries for the localized sites, it would be really helpful for users visiting them.
For ja.quarkus.io, the https://github.com/quarkusio/ja.quarkus.io repository contains .adoc.po files under the https://github.com/quarkusio/ja.quarkus.io/tree/main/l10n/po/ja_JP directory. The path of each .adoc.po file corresponds to the path of the original .adoc file in the upstream https://github.com/quarkusio/quarkusio.github.io repository. The .po files contain entries pairing the original text with the localized text. If an entry is marked "fuzzy", it has not been reviewed by a human and is not published to the localized site, so the original English text should be indexed instead.
For example, since the following entry is marked "fuzzy", https://github.com/quarkusio/ja.quarkus.io/blob/main/l10n/po/ja_JP/_versions/3.2/guides/telemetry-micrometer.adoc.po#L15-L20 the original text is published to the localized site: https://ja.quarkus.io/version/3.2/guides/telemetry-micrometer
To load the .po files, "org.fedorahosted.tennera:jgettext" may be a good library candidate. https://central.sonatype.com/artifact/org.fedorahosted.tennera/jgettext
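The fuzzy fallback rule can be sketched as follows. This is a deliberately simplified, hand-rolled reader rather than jgettext: it only handles single-line msgid/msgstr pairs and ignores escapes, multi-line strings, plural forms, and msgctxt, which a real PO library would cover. PoEntry, SimplePoReader, and poPathFor are hypothetical names, and treating an empty msgstr as "untranslated" is an assumption based on the usual gettext convention.

```java
import java.util.ArrayList;
import java.util.List;

// One PO entry: original text, localized text, and the "fuzzy" flag.
class PoEntry {
    final String msgid;
    final String msgstr;
    final boolean fuzzy;

    PoEntry(String msgid, String msgstr, boolean fuzzy) {
        this.msgid = msgid;
        this.msgstr = msgstr;
        this.fuzzy = fuzzy;
    }

    // Index the translation only if it is reviewed (not fuzzy) and non-empty;
    // otherwise fall back to the original English text.
    String textToIndex() {
        return (fuzzy || msgstr.isEmpty()) ? msgid : msgstr;
    }
}

class SimplePoReader {
    // Maps an upstream .adoc path to the matching .po path in ja.quarkus.io.
    static String poPathFor(String adocPath) {
        return "l10n/po/ja_JP/" + adocPath + ".po";
    }

    // Parses single-line msgid/msgstr pairs; anything else is ignored.
    static List<PoEntry> parse(List<String> lines) {
        List<PoEntry> entries = new ArrayList<>();
        boolean fuzzy = false;
        String msgid = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("#,")) {
                // Flag comment, e.g. "#, fuzzy"
                fuzzy = line.contains("fuzzy");
            } else if (line.startsWith("msgid ")) {
                msgid = unquote(line.substring("msgid ".length()));
            } else if (line.startsWith("msgstr ") && msgid != null) {
                String msgstr = unquote(line.substring("msgstr ".length()));
                // Skip the PO header entry, whose msgid is empty.
                if (!msgid.isEmpty()) {
                    entries.add(new PoEntry(msgid, msgstr, fuzzy));
                }
                fuzzy = false;
                msgid = null;
            }
        }
        return entries;
    }

    // Strips the surrounding double quotes, if present.
    private static String unquote(String s) {
        boolean quoted = s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
        return quoted ? s.substring(1, s.length() - 1) : s;
    }
}
```

With this shape, the indexer would call textToIndex() on each entry and never needs to know whether a given paragraph was actually translated.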