google / mdbook-i18n-helpers

Translation support for mdbook. The plugins here give you a structured way to maintain a translated book.
Apache License 2.0
127 stars 25 forks source link

Write `mdbook` renderer with strong templating system #70

Closed mgeisler closed 10 months ago

mgeisler commented 1 year ago

Currently, the mdbook templates are very limited: https://github.com/rust-lang/mdBook/issues/2016. I believe this is on purpose to prevent templates from becoming too unwieldy.

However, a limited template engine only works when it's easy to modify the logic behind the templates — which means modifying the upstream mdbook code. There are a number of features which we would like to implement, but which require modifying the generated HTML:

The common theme in these issues is that they require us to inject more data into the template (a list of languages, a canonical URL) and that we need to loop over these values and extract various parts.

A stronger mdbook renderer would allow us to do this inside the theme itself. The renderer should be generic and expose the same values to the theme as the HtmlHandlebars renderer does today — plus data configured in the book.toml file. This would be how users will inject more data.

The templating engine should then allow users to

Ideally, the new renderer would be very small: it converts the chapter content from Markdown to HTML and sends this plus the chapter information to the template engine. All of the static file copying in HtmlHandlebars would be replaced by the theme.

Template Engine

Take a look at https://lib.rs/template-engine and https://crates.io/categories/template-engine. I would look at askama and tera, both of which are inspired by Django. Since we target offline-rendering, we don't need the fastest engine in the world — we need something which is flexible and extensible without having to touch the mdbook renderer all the time.

Ideally, we would prototype a set of template helper functions in a theme first (in Comprehensive Rust, most likely). If they become widely useful, we will move those helper definitions to a "standard library" for the new renderer. This way others can reuse the helpers in their own themes.

sakex commented 11 months ago

Exploration of the issue

I have been investigating the issue. Here are my observations:

What MDbook's HTML renderer does

Mdbook's default HTML renderer does more than just translating Markdown to HTML. It also handles setting variables, rendering special elements like code blocks and rendering templates. Implementing a new renderer from scratch would be a slightly difficult task, but not that difficult (I'd say 1-2 months of 20% effort to reach feature parity.) On the other hand, we could continue working with the default HTML renderer as I'll discuss furtther down.

Writing a renderer from scratch

First, let's discuss how to write a new renderer. We could use approximately the same approach as MDbook. First transforming the MarkDown into HTML with wooorm/markdown-rs and then adding the generated HTML to a template that we would render with https://lib.rs/crates/tera (I've investigated both tera and askama but Tera has superior documentation and a better API for working with dynamic files at runtime.) This would give us full control over the renderer and would allow us to add build-time variables and build new components and features directly into the renderer; with the caveat that it would take some time to implement and test.

Consuming the default HTML renderer's output

On the other hand we could skip writing a new renderer by using the generated HTML output from the default renderer. We could simply read every HTML file in the output directory and re-render them with another template engine. This would be easy to do with the caveat that it would require 2 templating engines to cohabit. The engines would not be able to share any syntax and the template would be hard to reason about. We could also make a PR to mdbook to add an option to skip handlebar's rendering, which would allow us to work with a single template engine.

mgeisler commented 11 months ago

Hey @sakex, thanks a lot for looking at this!

Mdbook's default HTML renderer does more than just translating Markdown to HTML. It also handles setting variables, rendering special elements like code blocks and rendering templates.

There is for sure a lot going on in hbs_renderer.rs. I get the clear impression that much of it should be handled outside of Rust: all the post-processing done via regexps for example. The many write_file calls with particular CSS, JavaScript, and font files should also be handled by the theme.

I don't know how long it would take to replace this — I would start with a fresh slate and build up something which duplicates the existing functionality. There are just a few CSS files and a bit of HTML needed to start with.

First transforming the MarkDown into HTML with wooorm/markdown-rs

Small details: mdbook and other tools such as rustdoc uses pulldown-cmark, so we should do the same.

Consuming the default HTML renderer's output

Okay... I think I get what you mean: You're suggesting that we would not use a mdbook renderer, but instead write a stand-alone tool which does the following:

let tera = Tera::new("templates/**/*.html")?;
let mut context = Context::new();
// Add static data for the book, we can parse `book.toml` for this.
for path in glob("book/html/**/*.html") {
    // Configure `context` for this path.
    let content = fs::read(path?)?;
    tera.render(path?, &context)?;
}

We would have to escape all {{ foo }} meant for Tera in the Handlebars template, but this would effectively allow us to inject stuff into the generated HTML.

I see how that would be much less work... the template can apparently use {{{{raw}}}} ... {{{{/raw}}}} around the Tera blocks.

Question: can we do what mdbook-i18n does and wrap this up into a mdbook renderer? It would first call the regular renderer and then do the post-processing, all in a single mdbook build step.

We should also think about how this will be used and maintained:

sakex commented 11 months ago

Okay... I think I get what you mean: You're suggesting that we would not use a mdbook renderer, but instead write a stand-alone tool which does the following:

Yes exactly, thanks for adding the pseudo code, it's exactly what I had in mind. It would still be a renderer though, because a renderer is just a command that is called by mdbook that receives the context as stdin. If you look at the implementation of mdbook-pdf, they do basically the same thing. They get the context from stdin and then traverse the directory to find the print.html page that they load. We could do the same thing, no need to hardcode a path and we would just go through the html files and call tera on them.

Question: can we do what mdbook-i18n does and wrap this up into a mdbook renderer? It would first call the regular renderer and then do the post-processing, all in a single mdbook build step.

Yes we could with both approaches because both would involve creating a renderer in the end.

sakex commented 11 months ago

We should also think about how this will be used and maintained:

Consuming the html renderer's output has the advantage that it would not involve any migration burden, but only allow adding new features.

For this question, I don't think either approach is better, because they could always break something upstream that would require updating our renderer.

The tradeoff is between ease of development and control. I would favour going with the option of consuming the HTML output at first, because it would allow us to add new features faster. We could always rewrite the rest of the rendering logic in the future if the need arises.

What do you think?

mgeisler commented 11 months ago

The tradeoff is between ease of development and control. I would favour going with the option of consuming the HTML output at first, because it would allow us to add new features faster. We could always rewrite the rest of the rendering logic in the future if the need arises.

Yeah, I think I agree now!

I was confused at first, but now I see it as:

  1. Use mdbook to output a lot of template files (they happen to be mostly HTML, but that's okay!),
  2. Process those template files.

It seems fairly simple and I like that it's easy to upgrade to new mdbook versions: you just migrate over your index.hbs file (like we already do).

The end goal (for me) is to be able to enable a new language by editing something else than the index.hbs file :smile: I expect that I can edit a table in book.toml and then links will automatically be generated between all translations as needed.

Now, it would be really cool if we could extend the default index.hbs file with more nested templates. Something which would allow us to move our customization out of index.hbs and rely solely on the one from upstream.

sakex commented 11 months ago

The end goal (for me) is to be able to enable a new language by editing something else than the index.hbs file 😄 I expect that I can edit a table in book.toml and then links will automatically be generated between all translations as needed.

With the consuming the HTML output approach, we would still need to edit the index.hbs file just to tell the renderer where to put the links in the rendered HTML. But at least we would not need to play with Javascript in any way.

Writing a new renderer from scratch would allow us to have a default template with the links and we wouldn't need to edit anything inside of comprehensive-rust.

Now, the question comes down to user experience.

User experience

The feature we want is basically to add a new custom component (the language picker.) So what if our renderer looked a bit like JSX and instead of parsing templating syntax, it replaced html tags starting with a capital letter with custom components.

For instance, now for the language picker, we have

  <div class="right-buttons">
      <button id="language-toggle" class="icon-button" type="button"
              title="Change language" aria-label="Change language"
              aria-haspopup="true" aria-expanded="false"
              aria-controls="language-list">
          <i class="fa fa-globe"></i>
      </button>
      <ul id="language-list" class="theme-popup" aria-label="Languages" role="menu">
        <li role="none"><button role="menuitem" class="theme">
... many more lines of HTML and javascript

what if we could just write:

<LanguagePicker />

and let our renderer replace <LanguagePicker /> with the HTML above and also fill the languages based on values provided in Book.toml?

Then, our syntax would not collide with handlebars and we would have an expressive way to represent custom components.

simonsan commented 11 months ago

The end goal (for me) is to be able to enable a new language by editing something else than the index.hbs file 😄 I expect that I can edit a table in book.toml and then links will automatically be generated between all translations as needed.

I'm asking myself why the book.toml even has the

language = "en"
multilingual = false

fields?

As a user, I would like to have a system, where I can enable the whole multilingual setup by just setting it to multilingual = true in the config. And give an array of languages that should be rendered, so that I can easily configure in that way.

The data, where the translations come from, should be extracted abstracted over, so different systems can be built around that. As I haven't looked into it, can someone give a brief remark, why that functionality isn't supported upstream and what would be missing?

mgeisler commented 11 months ago

As I haven't looked into it, can someone give a brief remark, why that functionality isn't supported upstream and what would be missing?

My understanding from https://github.com/rust-lang/mdBook/issues/5 is that this was never just never implemented — despite many projects being somewhat blocked on it (https://github.com/rust-lang/book/issues/464).

The multilingual field is never read by mdbook itself.

simonsan commented 11 months ago

As I haven't looked into it, can someone give a brief remark, why that functionality isn't supported upstream and what would be missing?

My understanding from rust-lang/mdBook#5 is that this was never just never implemented — despite many projects being somewhat blocked on it (rust-lang/book#464).

The multilingual field is never read by mdbook itself.

Thanks for the briefing! Would it be viable/wanted to make changes upstream to add that support? Or would it be an option to fork mdbook for that purpose e.g. into mdbook-i18n and then contribute that changes upstream? So we could run it as an alternative for a while, eliminating most of the obvious bugs, and then make other people adapt to by merging changes upstream and releasing a new version that might not even contain breaking changes. As we could just implement it then run as an alternative renderer?

mgeisler commented 11 months ago

Thanks for the briefing! Would it be viable/wanted to make changes upstream to add that support?

I would love to see this supported upstream somehow. I first created https://github.com/rust-lang/mdBook/pull/1864 to try and add more infrastructure to the mdbook command. I then realized that the plugin system allows me to do this without modifying mdbook and this lead to the tools you find in this repository.

Or would it be an option to fork mdbook for that purpose e.g. into mdbook-i18n and then contribute that changes upstream? So we could run it as an alternative for a while, eliminating most of the obvious bugs, and then make other people adapt to by merging changes upstream and releasing a new version that might not even contain breaking changes. As we could just implement it then run as an alternative renderer?

That would all be possible — we are effectively doing this with this project :smile: Having our own plugins gives a lot of flexibility, probably more than what we would have if it was part of the upstream project.

Perhaps a good step forward would be for mdbook to remove the multilingual field from the upstream code. It's confusing that it's there right now. Instead, let people use plugins like these, assuming we can make them easier to use. For that, I think we only need a way for plugins to inject more templates into the existing template system. There is a head template right now, but we would probably need a few more.

@sakex is on that right now, so I hope we'll soon be able to let mdbook build do everything nice in one command.

mgeisler commented 11 months ago

For that, I think we only need a way for plugins to inject more templates into the existing template system. There is a head template right now, but we would probably need a few more.

By the way, for #35, the custom head template isn't enough since the template system is too weak: we need each HTML page to link to the other language versions — and for that we need access to the current URL and we need some string manipulation calls. I don't know how to do that with the handlebars template system used today.

mgeisler commented 11 months ago

what if we could just write:

<LanguagePicker />

and let our renderer replace <LanguagePicker /> with the HTML above and also fill the languages based on values provided in Book.toml?

I think it all depends on what kind of template system we can find — if there is a nice Rust library that can expand such custom tags, then that could be super nice indeed. Probably nicer than the {{ foo bar }} syntax used by the Django-inspired tools.

In general, I'm hoping for a very simple and flexible system. Something that will let me inject arbitrary data into book.toml and immediately make use of it in my templates.

Our current use cases can be seen by comparing mdbook's index.hbs with our index.hbs:

% diff ../mdbook/src/theme/index.hbs theme/index.hbs
--- ../mdbook/src/theme/index.hbs   2023-09-04 14:17:13.776497476 +0200
+++ theme/index.hbs 2023-09-15 11:31:11.445190770 +0200
@@ -11,6 +11,16 @@
         <base href="{{ base_url }}">
         {{/if}}

+        <script async src="https://www.gstatic.com/brandstudio/kato/cookie_choice_component/cookie_consent_bar.v3.js"
+                data-autoload-cookie-consent-bar="true"></script>
+
+        <script async src="https://www.googletagmanager.com/gtag/js?id=G-ZN78TEJMRW"></script>
+        <script>
+          window.dataLayer = window.dataLayer || [];
+          function gtag(){dataLayer.push(arguments);}
+          gtag('js', new Date());
+          gtag('config', 'G-ZN78TEJMRW');
+        </script>

         <!-- Custom HTML head -->
         {{> head}}
@@ -137,6 +147,33 @@
             }
         </script>

+        {{! Move to template code after fixing this issue:
+            https://github.com/google/mdbook-i18n-helpers/issues/70 }}
+        <script>
+          (function () {
+              // See these pages for details:
+              // https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls
+              // https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics
+              let base = "https://google.github.io/comprehensive-rust";
+              {{#if (eq language "en")}}
+              let canonical_href = `${base}/{{ path }}`;
+              {{else}}
+              let canonical_href = `${base}/{{ language }}/{{ path }}`;
+              {{/if}}
+
+              // mdbook gives us a string ending in ".md", we replace it with ".html":
+              canonical_href = canonical_href.slice(0, -"md".length) + "html";
+              if (canonical_href.endsWith("/index.html")) {
+                  canonical_href = canonical_href.slice(0, -"index.html".length);
+              }
+
+              let link = document.createElement("link");
+              link.rel = "canonical";
+              link.href = canonical_href;
+              document.head.appendChild(link);
+          })()
+        </script>
+
         <div id="page-wrapper" class="page-wrapper">

             <div class="page">
@@ -167,6 +204,57 @@
                     <h1 class="menu-title">{{ book_title }}</h1>

                     <div class="right-buttons">
+                        <button id="language-toggle" class="icon-button" type="button"
+                                title="Change language" aria-label="Change language"
+                                aria-haspopup="true" aria-expanded="false"
+                                aria-controls="language-list">
+                            <i class="fa fa-globe"></i>
+                        </button>
+                        <ul id="language-list" class="theme-popup" aria-label="Languages" role="menu">
+                          <li role="none"><button role="menuitem" class="theme">
+                              <a id="en">English</a>
+                          </button></li>
+                          <li role="none"><button role="menuitem" class="theme">
+                              <a id="pt-BR">Brazilian Portuguese (Português do Brasil)</a>
+                          </button></li>
+                          <li role="none"><button role="menuitem" class="theme">
+                              <a id="ko">Korean (한국어)</a>
+                          </button></li>
+                          <li role="none"><button role="menuitem" class="theme">
+                              <a id="es">Spanish (Español)</a>
+                          </button></li>
+                        </ul>
+
+                        <script>
+                          let langToggle = document.getElementById("language-toggle");
+                          let langList = document.getElementById("language-list");
+                          langToggle.addEventListener("click", (event) => {
+                              langList.style.display = langList.style.display == "block" ? "none" : "block";
+                          });
+                          let selectedLang = document.getElementById("{{ language }}");
+                          if (selectedLang) {
+                              selectedLang.parentNode.classList.add("theme-selected");
+                          }
+
+                          // The path to the root, taking the current
+                          // language into account.
+                          {{#if (eq language "en")}}
+                          let full_path_to_root = "{{ path_to_root }}";
+                          {{else}}
+                          let full_path_to_root = "{{ path_to_root }}../";
+                          {{/if}}
+                          // The page path (mdbook only gives us
+                          // access to the path to the Markdown file).
+                          let path = "{{ path }}".replace(/\.md$/, ".html");
+                          for (let lang of langList.querySelectorAll("a")) {
+                              if (lang.id == "en") {
+                                  lang.href = `${full_path_to_root}${path}`;
+                              } else {
+                                  lang.href = `${full_path_to_root}${lang.id}/${path}`;
+                              }
+                          }
+                        </script>
+
                         {{#if print_enable}}
                         <a href="{{ path_to_root }}print.html" title="Print this book" aria-label="Print this book">
                             <i id="print-button" class="fa fa-print"></i>
@@ -178,9 +266,18 @@
                         </a>
                         {{/if}}
                         {{#if git_repository_edit_url}}
-                        <a href="{{git_repository_edit_url}}" title="Suggest an edit" aria-label="Suggest an edit">
+                        {{#if (eq language "en")}}
+                        <a href="{{git_repository_edit_url}}" title="Suggest an edit" aria-label="Suggest an edit"
+                           target="_blank">
                             <i id="git-edit-button" class="fa fa-edit"></i>
                         </a>
+                        {{else}}
+                        <a href="https://github.com/google/comprehensive-rust/edit/main/po/{{language}}.po"
+                           title="Suggest an edit to the translation" aria-label="Suggest an edit to the translation"
+                           target="_blank">
+                            <i id="git-edit-button" class="fa fa-edit"></i>
+                        </a>
+                        {{/if}}
                         {{/if}}

                     </div>

I would like this diff to shrink to near-zero and I would like to get rid of all the JavaScript code.

Having seen this, I realize that I should use the head template to inject the Google Analytics tags! Let me quickly do that :smile:

mgeisler commented 11 months ago

Hi @sakex, you should be aware of the changes made in #70, especially the discussion in the comment I just made. The string manipulation functions in the new template language will be important to keep the existing functionality for when we link to PO files.

sakex commented 11 months ago

Thanks for pointing this out. I don't think it will be an issue for me because the PR is acting on the output of the translation.

mgeisler commented 11 months ago

Thanks for pointing this out. I don't think it will be an issue for me because the PR is acting on the output of the translation.

I was more thinking of the implication for the template system chosen: it will need to let us link to the correct PO file in each output file. I'm imagining this will mean doing some lookup in a mapping of some sort and then output the correct link.

mgeisler commented 11 months ago

Hey @sakex, I realized yesterday that your #84 is actually solving #13 (and not this issue).

In my mind, this issue is more pressing than #13. This issue allows us to implement new and much needed features (see the issues linked just above and in particular #35 and #12). Solving #13 removes a little big of complexity from the publication pipelines — but it doesn't open up new features the way this issue does.

I would therefore suggest writing the new backend for the template system first, get is released, and get it integrated into Comprehensive Rust. It'll help clean up a bunch of boiler plate code and it will us generate the links needed to properly bind translation groups together.

sakex commented 11 months ago

I agree, but it makes development much easier for me to automatically rebuild all translations with mdbook serve. Maybe I can fix the last comments you had and merge it, since anyway we don't need to publish it right away?

mgeisler commented 11 months ago

Maybe I can fix the last comments you had and merge it, since anyway we don't need to publish it right away?

You are right, and even when we publish a new version, we don't have to immediately start using it in our publish.yml workflow for Comprehensive Rust.