tlienart / Franklin.jl

(yet another) static site generator. Simple, customisable, fast, maths with KaTeX, code evaluation, optional pre-rendering, in Julia.
https://franklinjl.org
MIT License

Feature Request: Support some i18n, e.g. something like polyglot #888

Open schlichtanders opened 3 years ago

schlichtanders commented 3 years ago

Hi all,

I am having difficulty applying internationalization / localization within Franklin.jl. In Jekyll there is a lovely plugin called polyglot. Would it be possible to support something similar for Franklin.jl?

thanks a lot

schlichtanders commented 3 years ago

@tlienart you seem to have used i18n in your projects. Can you share your current approach?

tlienart commented 3 years ago

Hello Stephan, the bad: there's no baked-in support, nor a plugin that you can use directly.

The good: it's reasonably easy to do this manually, and I think I'd like users to try a few versions until we find one that seems to work well and that we can make available to other users via some package or baked-in functionality.

Below are my thoughts when considering i18n for another user a while back.

i18n with Franklin

Let's say you want your base language to be EN and have FR for some pages (potentially all, but in some cases people can only be bothered to translate some). Let's say that your base URL is base (e.g. tlienart.github.io or tlienart.github.io/project). What I'm suggesting is that you'd have:

* base/, base/page1/, base/page2/ ...
* base/fr/, base/fr/page1/, base/fr/page2/ ...

So far so good. You could also want base/en/... etc. but that's just a small extension of the below. Adding a flag button in your layout that takes the current URL and links to another page with the relevant /fr/ injected is easy as well.
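To make the "inject /fr/ into the current URL" idea concrete, here is a minimal sketch in plain Julia. The function name switch_language and its keyword arguments are made up for illustration and are not part of Franklin. It assumes the default language lives at the site root and every other language under /<lang>/, as in the layout above.

```julia
# Hypothetical helper (not a Franklin function): map a root-relative URL to its
# counterpart in the language `target`.
# Assumption: the default language sits at the root, other languages under "/<lang>/".
function switch_language(url::AbstractString, target::AbstractString;
                         langs = ["en", "fr"], default = "en")
    parts = collect(String, split(url, '/'; keepempty = false))
    # Drop an existing language prefix, if there is one.
    if !isempty(parts) && first(parts) in langs
        popfirst!(parts)
    end
    # Non-default languages get their prefix put back in front.
    target == default || pushfirst!(parts, target)
    return isempty(parts) ? "/" : "/" * join(parts, "/") * "/"
end
```

A flag button for French on /page1/ would then link to switch_language("/page1/", "fr"), and the English flag on the French page would link back to the unprefixed URL.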

Let's say now that you have the post page1.md.

Manual

You can already do the above by having a copy page1_fr.md with a slug that indicates the URL for the page, e.g.:

page1.md

+++
author = "The Oracle"
+++
# The Title
This is a sentence.

page1_fr.md

+++
author = "The Oracle"
slug = "fr/page1"
+++
# Le Titre
Ceci est une phrase.

This is fine if you only need it for one or two pages and are happy with them potentially diverging over time (e.g. the landing page). There are two disadvantages: (1) you need to update the slug whenever you change the file name, and (2) the translation is not in the same file as the original, so keeping the two files 1-1 is that little bit more annoying.

I'm not familiar with the plugin you mention but my impression is that the result would be similar to the above.

With blocks

It might be easier to keep the original and the translation(s) in the same file to make maintenance easier. The idea here is then that a post consists of blocks of text and that, each time, you'll provide several versions of the text. This would require a bit more work to fully work but the gist should be clear:

page1.md

+++
author = "The Oracle"

title = (en="The Title", fr="Le Titre")

block1 = (en="""
# This is the first block
With a first sentence.
""", fr = """
# Ceci est le premier bloc
Avec une première phrase
""")

block2 = (en="""
...
""", fr="""
...
""")

# more blocks

all_blocks = [block1, block2, ...]
+++

{{generate all_blocks}}

So you have to maintain a bunch of blocks which are close to one another. Using blocks is convenient for maintenance (smaller bits of text), and the only thing left is to have the function generate assemble the full markdown for each language and generate the relevant page. Writing that function is not hard: it could simply generate the files page1_en.md and page1_fr.md itself, each with the relevant slug. This is probably the easiest way, so you'd have the "parent" (page1) which generates the child pages.

I have to go, but let me know what you think and whether you end up trying one of the versions or would like something else.
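As a rough illustration of what such a generate function could do, here is a sketch in plain Julia that only builds the markdown text of the child pages and returns it in a dictionary. It is not wired into Franklin as an hfun. The helper pick, the fallback to a default language, and the choice to put every language under its own slug prefix are all assumptions made for this example.

```julia
# Hypothetical sketch: each block is a NamedTuple keyed by language code.
# A missing translation falls back to the default language (an assumption).
pick(block::NamedTuple, lang::Symbol, default::Symbol) =
    haskey(block, lang) ? block[lang] : block[default]

# Build the markdown source of one child page per language.
# Returns a Dict mapping file name => file content; nothing is written to disk.
function generate(name::AbstractString, blocks, langs; default::Symbol = :en)
    pages = Dict{String,String}()
    for lang in langs
        body = join((strip(pick(b, lang, default)) for b in blocks), "\n\n")
        slug = "$lang/$name"   # assumption: every language gets a prefix
        pages["$(name)_$(lang).md"] = "+++\nslug = \"$slug\"\n+++\n$body\n"
    end
    return pages
end

blocks = [(en = "# Title\nHello.", fr = "# Titre\nBonjour."),
          (en = "Only in English.",)]
pages = generate("page1", blocks, [:en, :fr])
```

Writing the entries of pages to disk next to page1.md would give the children described above. Whether the default language should keep an unprefixed slug is a design choice this sketch leaves open.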

schlichtanders commented 3 years ago

Thank you a lot for your huge help.

I myself would like the second approach, because everything is more self-contained. You could also easily implement a fallback mechanism into the generate function.

Three questions which pop up:

  1. Switching the language

    Adding a flag button in your layout that takes the current URL and links to another page with the relevant /fr/ injected is easy as well.

    Do you have a snippet illustrating how this could go?

  2. Header and footer

    If I understood Franklin correctly, you would have to add a couple of {{ispage en/}} {{insert head_en}} {{end}} in addition to creating the individual language versions.

  3. Interaction with other Franklin.jl features

    As this approach uses Julia for more or less everything, I wonder whether all other Franklin features would still work.

schlichtanders commented 3 years ago

I came up with another idea as an evolution of your second suggestion with namedtuples (en = "Title", fr = "Le Titre").

It would be nice if Franklin.jl had an option for automatically splitting files with such i18n-namedtuples out to multiple files. We can easily make the interface stable by providing our own wrapper i18n(en = "Title", fr = "Le Titre").

This could go like follows (assignments are just examples):

  1. check if the config flag i18n_enable is set to true
  2. read config i18n_languages = ["en", "fr", "de"]
  3. read config i18n_default = "en"
  4. go through all pages. For each page, let's call it page_active, and for each language in i18n_languages, let's call it i18n_active, do
    1. parse all variables in the page
    2. create i18n-conform variables by choosing either i18n_active or falling back to i18n_default, or if no i18n(en=...) is given but a plain value, take the plain value.
    3. create a new page pages/{{i18n_active}}/{{page_active}} with the respective variables
    4. change references to other pages similarly to pages/{{i18n_active}}/other_page
    5. insert redirects as appropriate so that the i18n_default gets visited by default
    6. do page parsing as normal
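The variable-resolution part of the steps above (steps 4.1 to 4.3) can be sketched in plain Julia as follows. Everything here is hypothetical: the I18n type, the i18n wrapper, resolve and split_page are illustrative names, not existing Franklin functionality, and languages are represented as symbols for convenience.

```julia
# Hypothetical wrapper type holding one value per language.
struct I18n
    values::Dict{Symbol,Any}
end
i18n(; kwargs...) = I18n(Dict{Symbol,Any}(kwargs))

# Plain values are kept as they are.
resolve(v, active::Symbol, default::Symbol) = v
# I18n values use the active language, else the default, else `nothing`.
resolve(v::I18n, active::Symbol, default::Symbol) =
    get(v.values, active, get(v.values, default, nothing))

# For one page, return target path => resolved variables, one entry per language.
function split_page(page::AbstractString, vars::AbstractDict, langs, default::Symbol)
    Dict("pages/$lang/$page" =>
             Dict(k => resolve(v, lang, default) for (k, v) in vars)
         for lang in langs)
end

vars = Dict(:Title => i18n(en = "Title", de = "Titel"),
            :Body => i18n(en = "My Text"),   # no German text: falls back to English
            :Constant => 42)
out = split_page("simplepage.md", vars, [:en, :de], :en)
```

Steps 4.4 to 4.6 (rewriting references, redirects, and the normal page parsing) are not covered by this sketch.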

The only thing which would be needed in addition is the little helper snippets about how to switch between the languages.

One particular nice thing about this approach is that you can easily extend existing Franklin sites by just changing a couple of parameters to i18n(en = ...) parameters, plus setting the three new configs i18n_enable, i18n_languages, i18n_default. All the rest should then work out of the box.

As of now, I would actually prefer adding such functionality to Franklin.jl instead of writing a generate function. Sounds about the same amount of work, but the proposed solution is more intuitive and should easily interact with other extensions as well.

What do you think?

tlienart commented 3 years ago

sorry if I missed something in your answer but where do the translations live? do you have separate files with translations?

schlichtanders commented 3 years ago

yes, I thought about that:

create a new page pages/{{i18n_active}}/{{page_active}} with the respective subset of variables

so like you suggested for generate, the preparser would just create all the separate translation files

schlichtanders commented 3 years ago

Of course, alternatively you don't have to create the intermediate translation markdown files, but could directly create the translation html files under __site/ respectively.

That would just be an alternative implementation strategy with the same effect, but might be simpler to implement.

tlienart commented 3 years ago

could you write what a "base markdown page" would look like in your original idea? specifically:

just so I understand where you put the text in language_A, language_B, ...

schlichtanders commented 3 years ago

sure.

given i18n_enable = true and i18n_languages = ["en", "de"]

a base markdown page say pages/simplepage.md

+++
Title = i18n(en="Title", de="Titel")
Body = i18n(en="My Text", de="Mein Text")
Constant = 42
+++
{{ include default_body }} 

would get translated to pages/en/simplepage.md

+++
Title = "Title"
Body = "My Text"
Constant = 42
+++
{{ include default_body }} 

and pages/de/simplepage.md

+++
Title = "Titel"
Body = "Mein Text"
Constant = 42
+++
{{ include default_body }}

schlichtanders commented 3 years ago

As an additional feature you could combine this approach with the possibility of providing translation markdown pages directly, like pages/en/simplepage2.md.

Of course, this can stay as a task for later. Just wanted to point out it would be integratable.

In total the functionality would be impressively similar to polyglot, even a bit more intuitive thanks to i18n(en=...), and the amount of work is not huge. Easy to understand and flexible, such i18n support would fit Franklin quite well, it seems to me.

tlienart commented 3 years ago

Thanks for your input, I think this all seems reasonable, some notes that we don't necessarily need to solve right now:

+++
txt = """
foo

const x = 5

bar
"""
+++

should properly highlight the code in e.g. Atom or VSCode.

all in all I think what you suggested is very close to the generate story except the generate would be on Franklin rather than on the user + some good ideas around the config stuff.

I'll need to think a bit about this, can't promise delivery for it as I'm working on other stuff for Franklin which have higher priority (this is also why I tried to give you paths for how you could do stuff right now so that you're not blocked) but otherwise I think there are good ideas here and it would be a nice addition.

schlichtanders commented 3 years ago

I just understood that Franklin does not necessarily put everything under a top-level pages/ folder. Hence using a suffix like ..._en.md, as you originally suggested, seems more appropriate.

schlichtanders commented 3 years ago

Could you share a snippet about how to switch the language?

Adding a flag button in your layout that takes the current URL and links to another page with the relevant /fr/ injected is easy as well.

tlienart commented 3 years ago

Let's say you have a topnav and that it's described in _layout/head.html, somewhere appropriate you'd put something like:

{{flags}}

with the understanding that this will inject something like

<a href="/en/page/"><img src="flag_en.png" /></a>
<a href="/de/page/"><img src="flag_de.png" /></a>

if there are several translations of that page available. Here's a sketch of a function that would do some of this:

function hfun_flags()
    rpath = locvar(:fd_rpath) # something like /path/to/page
    # Some logic here to check whether there's, at an appropriate location,
    # some translated version corresponding to rpath (this will be
    # different based on what you do).
    # ...
    has_translation = true
    if !has_translation
        return ""
    end
    # Placeholder URLs (an assumption for this sketch): how you derive them
    # depends on where your translated pages end up.
    page = strip(splitext(rpath)[1], '/')
    en_url = "/en/$page/"
    de_url = "/de/$page/"
    s = """
        <a href="$en_url"><img src="flag_en.png" /></a>
        <a href="$de_url"><img src="flag_de.png" /></a>
        """
    return s
end

I hope that gives you the idea
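The "check whether a translated version exists" step could look roughly like the sketch below. It is plain Julia with no Franklin calls, and it assumes the suffix convention from the manual approach (page1.md next to page1_fr.md, served under /fr/page1/). The names translation_urls and flag_links are hypothetical.

```julia
# Hypothetical helper: list the languages for which a translated source file
# exists next to the original, together with the URL it is assumed to be served at.
# Assumption: translations are named <page>_<lang>.md and use the slug <lang>/<page>.
function translation_urls(rpath::AbstractString, langs; root::AbstractString = ".")
    base, ext = splitext(rpath)
    urls = Pair{String,String}[]
    for lang in langs
        if isfile(joinpath(root, "$(base)_$(lang)$(ext)"))
            push!(urls, lang => "/$lang/$base/")
        end
    end
    return urls
end

# Turn the (language => URL) pairs into the flag links shown above.
function flag_links(urls)
    join(("<a href=\"$(url)\"><img src=\"flag_$(lang).png\" /></a>"
          for (lang, url) in urls), "\n")
end
```

Inside an hfun, the result of flag_links(translation_urls(rpath, langs)) would be returned as the HTML string, and an empty list of translations simply gives an empty string.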

schlichtanders commented 3 years ago

yes, that helped, thank you very much

schlichtanders commented 3 years ago

I was motivated to go through the source code and look for the easiest way to add this. Please let me know if I got something wrong.

Key Assumption

I assume that config.md is parsed first, so that as soon as a normal page gets processed, we already know the global config.

What needs to be added

Part 1

At the central method process_file_err where the final out path is computed, we would need to add the following

  1. loop over all i18n_languages, use a fallback value, say i18n_languages = [nothing] in case i18n_languages is not defined
  2. set an internal local variable i18n_active which specifies the current element of i18n_languages
  3. if i18n_active !== nothing change the output path outp by prepending i18n_active
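The loop in Part 1 could be sketched as below. This is standalone Julia, not a patch to process_file_err: the function name localized_outputs and the split of the output path into a site folder and a relative path are assumptions made for the example.

```julia
# Hypothetical sketch of Part 1 (not Franklin internals).
# `site` is the output folder and `rel` the page's path inside it.
# With the fallback [nothing], the output path is left untouched.
function localized_outputs(site::AbstractString, rel::AbstractString,
                           i18n_languages = [nothing])
    map(i18n_languages) do i18n_active
        i18n_active === nothing ? joinpath(site, rel) :
                                  joinpath(site, string(i18n_active), rel)
    end
end

rel = joinpath("A", "index.html")
single = localized_outputs("__site", rel)               # i18n not configured
multi  = localized_outputs("__site", rel, ["en", "fr"]) # one output per language
```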

Part 2

At the method set_vars! add the following

  1. check if the value is of the special type i18n
  2. if so, use something like value = value[something(i18n_active, i18n_default)] to grab the currently active language, or if that value does not exist, grab the default language (the default language could be configured or be the first language used in the i18n)
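One detail worth spelling out: something(i18n_active, i18n_default) only replaces a value of nothing, so on its own it does not cover the case where a language is active but the value has no entry for it. A minimal sketch of the full fallback chain, using a NamedTuple in place of the proposed i18n type (the function name pick_language is hypothetical):

```julia
# Hypothetical sketch of the lookup in Part 2.
function pick_language(value::NamedTuple, i18n_active, i18n_default::Symbol)
    lang = something(i18n_active, i18n_default)   # only handles i18n_active === nothing
    haskey(value, lang) && return value[lang]     # a translation exists
    haskey(value, i18n_default) && return value[i18n_default]
    return first(value)                           # first language used in the value
end
# Anything that is not an i18n value is passed through unchanged.
pick_language(value, i18n_active, i18n_default) = value

t = (en = "english", fr = "française")
```

With this, pick_language(t, :de, :en) falls back to the English text, while pick_language(t, nothing, :en) covers the case where i18n is not configured at all.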

Part 3

At the method link_fixer add the following

  1. if i18n_active is not the fallback, add it as a prefix to all the paths in the html
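A naive version of the prefixing in Part 3 could be a single regex replacement over the generated HTML, as sketched below. The function prefix_links is a hypothetical stand-in and does not reflect what link_fixer actually does. Note that it rewrites every root-relative href, including links to shared assets such as stylesheets, which a real implementation would probably want to exclude.

```julia
# Hypothetical sketch of Part 3: prefix root-relative links with the active language.
function prefix_links(html::AbstractString, i18n_active)
    i18n_active === nothing && return html
    # Matches href="/..." but not protocol-relative links (href="//...")
    # and not absolute URLs (href="https://...").
    return replace(html, r"href=\"/(?!/)" => "href=\"/$(i18n_active)/")
end

html = "<a href=\"/page1/\">one</a> <a href=\"https://example.org/\">ext</a>"
fr_html = prefix_links(html, "fr")
```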

That is it. Very concise, easy to implement, and with full compatibility with everything else so far as I can see. @tlienart Can you check whether you spot any missing pieces or mistakes?

schlichtanders commented 3 years ago

Slight update:

I think it would make even more sense if the loop happens before config.md is loaded, so that also in the global config something like myvariable = i18n(en="english", fr="française") would behave just as if you would have written myvariable = "english" for the en case and myvariable = "française" for the fr case.

So the loop would be here right before the global vars gets defined.

The information i18n_active could be stored as a global variable of the specific loop run.


This would imply that fd_loop also needs to loop through i18n_languages and construct an i18n-specific global dict in case only specific watched files changed.

tlienart commented 3 years ago

thanks for the suggestions; I won't work on this for 0.10.* but might be interested in picking this up for the next release. One question that is still unclear to me is how you organise files i.e.: where do the translations actually live. Say you have a page A.md with some English text on it, do you keep all translations inside it? next to it? what if sometimes there's a translation and sometimes there isn't?

edit: it should be said that I do not like the way polyglot does it because it seems hard to maintain.

schlichtanders commented 3 years ago

edit: it should be said that I do not like the way polyglot does it because it seems hard to maintain.

I hope I could show you that the changes look indeed very simple to maintain. The idea of being able to use something like myvariable = i18n(en=..., fr=...) was also your favourite way to go forward because then the page structure is clearly shared among translations.


where do the translations actually live.

The translations would live in the __site folder, as usual. The key step for this is Part 1, step 3 above:

  3. ... change the output path outp by prepending i18n_active

So really just the prefix.

Say you have a page A.md with some English text on it, do you keep all translations inside it? next to it? what if sometimes there's a translation and sometimes there isn't?

The English version would go to en/A/index.html, the French version to fr/A/index.html.


I won't work on this for 0.10.*

I am quite interested in this feature and would be motivated to try implementing it myself. In another issue you wrote that you are currently refactoring quite a lot of Franklin. Of course I wouldn't like to implement a feature only for it to become stale because of orthogonal refactoring.

Could you give me a ping if you think it is kind of safe to implement such i18n?


Another future direction would be to build a plugin system which is powerful enough so that such i18n could be implemented within it. But I guess this would be way more challenging and hence better something for later.

tlienart commented 3 years ago

I hope I could show you that the changes look indeed very simple to maintain.

I'm not talking about the code changes in Franklin; I'm talking about how easy it is for a user to maintain translations (i.e. how cumbersome it is to use). This is related with my question of where do the translations live:

I still don't understand where the original files live (let's not worry about paths etc., this is all trivial). If you have a post with I am Winnie the Pooh in A.md and you want to also have that post in German, where do you put Ich bin Winnie Puuh? In A.md? In A_de.md?


feel free to try a PR around this of course; I will gladly help review it; & I don't think the refactoring will be orthogonal to what I believe you're suggesting.

schlichtanders commented 3 years ago

if you have a post with I am Winnie the Pooh in A.md and you want to also have that post in German, where do you put Ich bin Winnie Puuh? in A.md? in A_de.md?

The way suggested here would be to have only one A.md which has as its content

+++
variable = i18n(en="I am Winnie the Pooh", de="Ich bin Winnie Puuh")
+++
...

You should also be able to turn off the i18n processing somehow and write the two files yourself, like {{franklin project root}}/de/A.md and {{franklin project root}}/en/A.md or something similar, but this is not the main target of this pull request, because you can already do that now.