mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
48.56k stars 9.98k forks source link

Replace `webl10n` with Fluent #10075

Closed timvandermeij closed 1 year ago

timvandermeij commented 6 years ago

The webl10n library is unmaintained and Fluent is actively maintained, so it may be beneficial to switch to that system. It must be able to somehow read the current locale files though.

timvandermeij commented 6 years ago

@flodolo Could you comment on how this would work with the existing .properties files? If we need to port them to Fluent syntax, I'm worried that languages that are not yet ported stop working unless the syntax is backwards-compatible. Moreover, will it go well upstream if we change the English file to Fluent syntax, i.e., can the localization systems handle this new syntax?

flodolo commented 6 years ago

@flodolo Could you comment on how this would work with the existing .properties files? If we need to port them to Fluent syntax, I'm worried that languages that are not yet ported stop working unless the syntax is backwards-compatible. Moreover, will it go well upstream if we change the English file to Fluent syntax, i.e., can the localization systems handle this new syntax?

We have scripts to migrate strings from .properties to .ftl https://firefox-source-docs.mozilla.org/intl/l10n/l10n/fluent_migrations.html

We only need the .ftl file that will be used in the product.

We will help with migrating those, even writing the migration recipe if needed, once the FTL is ready.

Snuffleupagus commented 6 years ago

It's probably also worth mentioning that the GENERIC vs MOZCENTRAL (i.e. built-in) viewers handle localization slightly differently. Please see external/webL10n/l10n.js respectively extensions/firefox/tools/l10n.js, where the latter is backed by code in both firefoxcom.js and PdfStreamConverter.jsm (making this whole process slightly less straightforward than one would want).

flodolo commented 2 years ago

I was going to file an issue about this, forgot there was one already open (cc @eemeli).

We're getting close to remove all DTDs from mozilla-central, at some point we might deprecate even .properties (one first step would be to deprecate plural strings).

So, it would be great to look into moving PDF.js away from webL10n. Most of the strings should be easy to migrate, while others (custom plurals), will need to be retranslated from scratch.

marco-c commented 2 years ago

@Snuffleupagus @timvandermeij any idea how much work this is? It'd be nice to finally get us off this super old library.

Snuffleupagus commented 2 years ago

We're getting close to remove all DTDs from mozilla-central, at some point we might deprecate even .properties (one first step would be to deprecate plural strings).

Just a FYI: Given that we're using a custom version of the webL10n-library for the built-in Firefox PDF Viewer, I don't believe that we're directly affected by any of this deprecation. Unless there's also a plan to remove the StringBundle in the process, since that's being used here.

any idea how much work this is? It'd be nice to finally get us off this super old library.

It's difficult for me to tell, since it's been quite a while since I last looked at any of this and Fluent may also have changed slightly in the years since.[1] (I cannot recall how well it worked, but I might still have some some very basic patches laying around somewhere.)

Note that as mentioned in https://github.com/mozilla/pdf.js/issues/10075#issuecomment-421161167 things might need to look slightly different for various PDF.js build targets, which may end up complicating things. For one thing, in the GENERIC viewer the webL10n-library handles finding/loading of the l10n-files whereas it appears (as far as I understand) that with Fluent we'd need to handle that ourselves!?

Also, it would seem quite unfortunate (from a file-size perspective) if we'd need to bundle a bunch of the Fluent JS-libraries into the Firefox built-in PDF Viewer. Given that this code already exists in mozilla-central, it would seem ideal if for the MOZCENTRAL viewer we'd be able to access the needed Fluent functionality of the browser directly from the viewer (e.g. exposing that to code running at resource://pdf.js/web/viewer.html or similar).


[1] One of the reasons that I stopped looking at this was that it didn't seem all the pressing back then, and also that I couldn't (easily) find all that many good examples to base an implementation upon.

eemeli commented 2 years ago

Note that as mentioned in #10075 (comment) things might need to look slightly different for various PDF.js build targets, which may end up complicating things. For one thing, in the GENERIC viewer the webL10n-library handles finding/loading of the l10n-files whereas it appears (as far as I understand) that with Fluent we'd need to handle that ourselves!?

The JS implementation of @fluent/dom does require its user to handle fetching the localization sources separately from Fluent. The Rust implementation that Firefox internally uses fetches the localization files given their path.

Also, it would seem quite unfortunate (from a file-size perspective) if we'd need to bundle a bunch of the Fluent JS-libraries into the Firefox built-in PDF Viewer. Given that this code already exists in mozilla-central, it would seem ideal if for the MOZCENTRAL viewer we'd be able to access the needed Fluent functionality of the browser directly from the viewer (e.g. exposing that to code running at resource://pdf.js/web/viewer.html or similar).

Then it's pretty good that this is already the case, yes? Document localization is available for resource: URIs. To see this in effect, apply this patch to m-c and observe the results:

--- a/toolkit/components/pdfjs/content/web/viewer.html
+++ b/toolkit/components/pdfjs/content/web/viewer.html
@@ -31,6 +31,7 @@ See https://github.com/adobe-type-tools/cmap-resources
 <script src="../build/pdf.js"></script>

     <link rel="stylesheet" href="viewer.css">
+    <link rel="localization" href="toolkit/about/aboutAbout.ftl">

   <script src="viewer.js"></script>

@@ -300,7 +301,7 @@ See https://github.com/adobe-type-tools/cmap-resources
                 </div>
                 <span id="scaleSelectContainer" class="dropdownToolbarButton">
                   <select id="scaleSelect" title="Zoom" tabindex="23" data-l10n-id="zoom">
-                    <option id="pageAutoOption" title="" value="auto" selected="selected" data-l10n-id="page_scale_auto">Automatic Zoom</option>
+                    <option id="pageAutoOption" title="" value="auto" selected="selected" data-l10n-id="about-about-title">Automatic Zoom</option>
                     <option id="pageActualOption" title="" value="page-actual" data-l10n-id="page_scale_actual">Actual Size</option>
                     <option id="pageFitOption" title="" value="page-fit" data-l10n-id="page_scale_fit">Page Fit</option>
                     <option id="pageWidthOption" title="" value="page-width" data-l10n-id="page_scale_width">Page Width</option>
Screenshot 2022-09-09 at 8 22 08
Snuffleupagus commented 2 years ago

Then it's pretty good that this is already the case, yes? Document localization is available for resource: URIs. To see this in effect, apply this patch to m-c and observe the results:

I don't really understand how this helps us, given the existing localization abstraction (see the L10n-classes) that's being used throughout the PDF.js viewer!?

Ideally we'd simply switch out webL10n for Fluent without having to make changes everywhere in the viewer, also keeping in mind that the GENERIC viewer still must work and that things will be hard to develop/maintain if the GENERIC/MOZCENTRAL localization-handling diverges (a lot) here.

eemeli commented 2 years ago

Ideally we'd simply switch out webL10n for Fluent without having to make changes everywhere in the viewer

That's certainly doable. As far as I can tell, pdf.js is currently localising strings in two ways:

  1. DOM localization, using data-l10n-id and data-l10-args attributes on elements. Fluent uses the exact same syntax for these, so they could remain exactly as-is.
  2. Localization from JS through the l10n instance, which is implemented either by MozL10n or GenericL10n, providing a set of async methods for localization and locale info.

So a migration to Fluent should only require the following changes:

  1. Replace the MozL10n implementation with one that relies on the Fluent provided by the browser.
  2. Replace the GenericL10n implementation with one that relies on the JavaScript @fluent/bundle and @fluent/dom packages.
  3. Transform the .properties sources to FTL. We have tooling in m-c that should help automate most of this.
  4. In m-c, update the locales/jar.mn config for the locale files.

As a nice-to-have, the message ids could be renamed during the migration. While the current ones should all technically work in Fluent, the linter will need to have a bunch of exceptions added for the _ underscores.