octopress / octopress

Octopress 3.0 – Jekyll's Ferrari
MIT License

Add multi-language support #84

Open imathis opened 9 years ago

imathis commented 9 years ago

I've recently released octopress-multilingual which has a detailed readme, explaining my initial direction for multilingual support.

imathis commented 9 years ago

So I've been doing some testing on how to get permalinks to work nicely. I'd have to monkey patch Jekyll to make the experience nice. I'd like to be able to add a :lang URL template placeholder to the permalink. One problem is that, at this point, none of the URL template placeholders are optional, so the Jekyll::URL#generate method will have to be able to strip out nil tokens.

  def generate_url(template)
    @placeholders.inject(template) do |result, token|
      break result if result.index(':').nil? 
      if token.last.nil?
        # Remove leading slash so you don't get urls 
        # with: '//' for an empty token
        result.gsub(/\/:#{token.first}/, '')
      else
        result.gsub(/:#{token.first}/, self.class.escape_path(token.last))
      end
    end
  end
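The nil-stripping idea can be tried outside Jekyll. The following is a minimal standalone sketch, not the patch itself: `fill_template` is a hypothetical helper name, and it takes the placeholders as a plain hash rather than reading `@placeholders`. It also skips the `escape_path` step from the snippet above.

```ruby
# Hypothetical helper (not part of Jekyll): fills ":token" placeholders in a
# permalink template and drops any placeholder whose value is nil.
def fill_template(template, placeholders)
  placeholders.inject(template) do |result, (name, value)|
    if value.nil?
      # Remove the placeholder together with its leading slash,
      # so an empty token does not leave '//' in the URL.
      result.gsub(/\/:#{name}/, "")
    else
      result.gsub(/:#{name}/, value.to_s)
    end
  end
end

template = "/:lang/:categories/:title.html"

with_lang    = fill_template(template, lang: "es", categories: "news", title: "hola")
without_lang = fill_template(template, lang: nil,  categories: "news", title: "hello")

puts with_lang     # /es/news/hola.html
puts without_lang  # /news/hello.html
```

A post without a language keeps its usual URL because the whole `/:lang` segment disappears, not just the token.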

After that it would make sense to add :lang by default to the permalink templates:

  def template
    case site.permalink_style
    when :pretty
      "/:lang/:categories/:year/:month/:day/:title/"
    when :none
      "/:lang/:categories/:title.html"
    when :date
      "/:lang/:categories/:year/:month/:day/:title.html"
    when :ordinal
      "/:lang/:categories/:year/:y_day/:title.html"
    else
      site.permalink_style.to_s
    end
  end

Finally, I would add a lang method and add :lang to the url_placeholders hash in the Post class:

    def lang
      data['lang']
    end

    def url_placeholders
      {
        :year        => date.strftime("%Y"),
        :month       => date.strftime("%m"),
        :day         => date.strftime("%d"),
        :title       => slug,
        :i_day       => date.strftime("%-d"),
        :i_month     => date.strftime("%-m"),
        :categories  => (categories || []).map { |c| c.to_s }.join('/'),
        :short_month => date.strftime("%b"),
        :short_year  => date.strftime("%y"),
        :y_day       => date.strftime("%j"),
        :output_ext  => output_ext,
        :lang        => lang
      }
    end

This would mean that posts with lang: es in their front matter would automatically have /es/ prepended to their permalinks, and those posts would be written to the es directory in the generated site.

To me, this seems like something that would be better added to Jekyll itself. If there's any interest in that (/cc @parkr), I'd be happy to submit a pull request. If not, I'll probably go ahead and release this as a standalone plugin.

Besides this I plan to add some data to the site payload so you will be able to do loops like {% for post in site.posts_by_language.es %}.

Anyway I'd love to hear some feedback on this approach. I've tested it and it works great. I just need to decide how I want to move forward.

parkr commented 9 years ago

The patch to #generate is a great idea – I'd gladly accept that. In terms of adding :lang to the permalink URL placeholders, I need to think about that more...

drallgood commented 9 years ago

Sounds like a good idea. I've implemented that feature as a plugin using a different approach, given the limitations of Jekyll/Octopress, but your approach looks way better (https://github.com/drallgood/jekyll-multilingual).

One thing you should keep in mind is what happens with posts that are only ever published in one language vs. ones that are posted in two (or more).

imathis commented 9 years ago

For now, my plan is to use Octopress Hooks to add language-specific post loops to the site payload like this:

    class SiteHook < Hooks::Site
      def merge_payload(payload, site)
        {
          'site' => {
            'posts_by_language' => site.posts.select(&:lang).group_by(&:lang)
          }
        }
      end
    end

This will allow users to do the looping that I mentioned above ({% for post in site.posts_by_language.es %}) in any template. This should make it easy to integrate multi-language features into my other plugins and themes. I think with this approach it shouldn't matter if a post has a translated version or not. They're just treated as separate posts.
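To see what that payload entry would contain, here is a small sketch using plain Ruby objects. `DemoPost` is a hypothetical stand-in for a Jekyll post and only carries the two fields the grouping needs.

```ruby
# Hypothetical stand-in for a Jekyll post (assumption: only title and lang matter here).
DemoPost = Struct.new(:title, :lang)

demo_posts = [
  DemoPost.new("Hola", "es"),
  DemoPost.new("Hello", "en"),
  DemoPost.new("Adiós", "es"),
  DemoPost.new("Untagged", nil)
]

# Keep only posts that declare a language, then bucket them by that language.
posts_by_language = demo_posts.select(&:lang).group_by(&:lang)

puts posts_by_language.keys.inspect                 # ["es", "en"]
puts posts_by_language["es"].map(&:title).inspect   # ["Hola", "Adiós"]
```

Posts without a language are left out entirely at this stage, which is why each language gets its own independent list.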

One thing I don't care for in most plugins is that they use categories to manage multiple languages. This is certainly easier than what I'm doing, but I don't think languages should be treated like categories. It's mainly a hack to get different permalinks. I decided to solve this by attacking the permalinks instead.

@drallgood I'll check out your plugin and see what I can learn. Thanks!

drallgood commented 9 years ago

Yeah, I didn't like that approach either. That's why I opted to create my own plugin.

> I think with this approach it shouldn't matter if a post has a translated version or not. They're just treated as separate posts.

It does matter if you still want to show a post even if it wasn't translated to a particular language (essentially duplicating it). That might be an edge case, but for me it was important. I tend to write most posts in English, and those should still show up even if people are viewing the site from Germany.

imathis commented 9 years ago

That's interesting, I'm sure there are going to be lots of use cases I'll need to consider.

imathis commented 9 years ago

I'd love to hear anyone's thoughts on this approach. I have little knowledge of how use cases work for these types of sites.

I think I've found a way to do cross-language posting without any friction or configuration.

        lang_posts = site.posts.select(&:lang).group_by(&:lang)
        languages = lang_posts.keys
        no_lang_posts = site.posts.reject(&:lang)

        # Ensure that posts without an assigned language
        # appear in each language's feed
        lang_posts.each do |_lang, posts|
          # concat mutates the array stored in lang_posts
          posts.concat(no_lang_posts)
          posts.sort_by!(&:date)
        end

Then I add this to the site payload:

            'posts_by_language' => lang_posts,
            'languages' => languages

This will allow you to do {% for post in site.posts_by_language.de %}, and you can also access the site's languages based on content rather than configuration.

In this case, any post without a specific language set will appear in every language's post loop. There are things I don't like about this approach, but it seems like the least effort for the user, assuming this is what they want. I would be happier with it if I added a cross_post_languages: true setting to each post to trigger that behavior, but I don't know if that friction is necessary. By design, each post will either have a language (meaning: only show this to people reading that language) or no language (meaning: show this to everyone).
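The cross-posting behavior described here can be sketched without Jekyll. This is an illustration under assumptions, not the plugin's code: `FeedPost` and `feeds_by_language` are hypothetical names, and the sketch builds new arrays instead of mutating the grouped ones.

```ruby
require "date"

# Hypothetical stand-in for a Jekyll post (assumption: only these fields matter).
FeedPost = Struct.new(:title, :lang, :date)

# Returns a hash of language => posts, where posts without a language
# are added to every language's list and each list is sorted by date.
def feeds_by_language(posts)
  with_lang, without_lang = posts.partition(&:lang)
  with_lang.group_by(&:lang).transform_values do |group|
    (group + without_lang).sort_by(&:date)
  end
end

feed_posts = [
  FeedPost.new("Hola",     "es", Date.new(2015, 1, 1)),
  FeedPost.new("Everyone", nil,  Date.new(2015, 1, 2)),
  FeedPost.new("Hallo",    "de", Date.new(2015, 1, 3))
]

feeds = feeds_by_language(feed_posts)

puts feeds.keys.inspect                 # ["es", "de"]
puts feeds["es"].map(&:title).inspect   # ["Hola", "Everyone"]
puts feeds["de"].map(&:title).inspect   # ["Everyone", "Hallo"]
```

The language list comes from the posts themselves, so a site with no language-tagged posts would produce an empty hash and the untagged posts would not appear in any per-language feed.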

imathis commented 9 years ago

Alright, I think I've figured out my initial approach to this. I've just released octopress-multilingual, which has a detailed readme explaining my initial direction for multilingual support. I would love feedback on this.

drallgood commented 9 years ago

Awesome. I'll continue the discussion there.

Thanks!

imathis commented 9 years ago

The Octopress Feeds plugin now supports multilingual feeds, and based on the work that took, adding support to Octopress Genesis (and other themes) seems like it'll be fairly straightforward. Woo hoo!

drallgood commented 9 years ago

This is really coming along nicely.

I've successfully migrated one of my blogs to octopress-multilingual without any major issues.

Huge props to @imathis for his time and effort!!!