rtomayko / tilt

Generic interface to multiple Ruby template engines
http://github.com/rtomayko/tilt

How should we handle encodings? #75

Closed: judofyr closed this issue 11 years ago

judofyr commented 13 years ago

Templates are very much based on files, which means we have to worry about encodings. A template can be passed to an engine in two ways: through a block that returns a string (klass.new { 'foo' }) or through a filename.

Currently, encodings are handled as such:

In all cases, you can pass :default_encoding => "FOO" as an option, which sets the @default_encoding ivar. In addition, if the precompiled code (in precompiled mode) includes a magic comment (# coding: FOO) OR @default_encoding is set, the generated code will include a magic comment before the preamble.
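A minimal sketch of that current rule, for illustration only. with_magic_comment and magic_comment_encoding are hypothetical names, not Tilt's code, and the sketch assumes that the magic comment sits on the first line of the precompiled code and takes precedence over @default_encoding.

```ruby
# Hypothetical sketch of the CURRENT behaviour described above; not Tilt's code.
# Assumes the magic comment, if any, is on the first line of the precompiled code.
MAGIC_COMMENT = /\A#.*coding[:=]\s*([[:alnum:]\-_]+)/

# Returns the encoding name from a leading magic comment, or nil.
def magic_comment_encoding(source)
  source[MAGIC_COMMENT, 1]
end

# Prepends a magic comment before the preamble when the precompiled code
# names an encoding, or when a default encoding was given.
def with_magic_comment(precompiled, preamble, default_encoding = nil)
  enc = magic_comment_encoding(precompiled) || default_encoding
  header = enc ? "# coding: #{enc}\n" : ""
  header + preamble + "\n" + precompiled
end
```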

My thoughts

Some observations:

Let's go over each of the use cases and how I believe we should change Tilt:

Fetched through block

Any template that's fetched through a block already has an encoding. I think we should keep the current behaviour: the template keeps its encoding, and @default_encoding is set to options[:default_encoding] (which defaults to nil).
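A minimal sketch of the block case under this proposal. load_from_block is a hypothetical helper, not a Tilt API; it only shows that the block's string passes through untouched and that the hint comes from the options alone.

```ruby
# Hypothetical sketch of the proposed block case; load_from_block is not a Tilt API.
def load_from_block(options = {})
  # The string returned by the block keeps whatever encoding it already has.
  data = yield
  # The default encoding comes only from the options (nil when absent).
  default_encoding = options[:default_encoding]
  [data, default_encoding]
end
```

For example, a block returning an ISO-8859-1 string still yields an ISO-8859-1 template, with no default encoding unless one was passed.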

Fetched through file system

We don't know the encoding, so we'll read the file in binary mode and set @default_encoding to options[:default_encoding] || Encoding.default_external (the fallback to Encoding.default_external is different from today).
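A sketch of the file-system case, assuming a hypothetical load_from_file helper (not a Tilt API). File.binread returns an ASCII-8BIT (BINARY) string, so no encoding is guessed at read time.

```ruby
# Hypothetical sketch of the proposed file-system case; load_from_file is not a Tilt API.
def load_from_file(path, options = {})
  # Read raw bytes: File.binread returns an ASCII-8BIT (BINARY) string,
  # so nothing is guessed or transcoded at this point.
  data = File.binread(path)
  # The default encoding is only a hint for the engine; fall back to the
  # process-wide external encoding when none was given.
  default_encoding = options[:default_encoding] || Encoding.default_external
  [data, default_encoding]
end
```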

The precompiled code from the template engine

Because tilt.rb is written in UTF-8 and the precompiled code will be combined with string literals from tilt.rb, the precompiled code generated by the template engine must be compatible with UTF-8 (not a code change, just a spec).
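One way to express that requirement as a check, as a sketch. Ruby's Encoding.compatible? returns the encoding a concatenation would produce, or nil when concatenating would raise Encoding::CompatibilityError. TILT_LITERAL is a made-up non-ASCII UTF-8 string standing in for literals from tilt.rb; using a non-ASCII stand-in gives the strictest version of the check.

```ruby
# Hypothetical stand-in for a UTF-8 string literal from tilt.rb.
TILT_LITERAL = "# \u2713 generated by Tilt\n"

# True if TILT_LITERAL can be concatenated with the engine's output.
def utf8_compatible?(precompiled)
  # nil means the two strings cannot be combined.
  !Encoding.compatible?(TILT_LITERAL, precompiled).nil?
end
```

ASCII-only output and UTF-8 output both pass; non-ASCII output in an encoding such as Shift_JIS does not.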

The encoding of the final generated code

The encoding of the final generated code should be decided by the template engine, in one of two ways: either the precompiled code is encoded in a specific encoding, or it includes a magic comment. Tilt should not add a magic comment based on @default_encoding, because that's just a hint from Tilt's side and it's the engine's responsibility to handle it correctly (code change).
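A sketch of how the encoding of the generated code could be determined under this rule. It assumes a hypothetical effective_encoding helper and a simplified magic-comment pattern that only inspects the first line. Only the engine's output is consulted, and @default_encoding plays no part.

```ruby
# Simplified magic-comment pattern (first line only); an assumption for this sketch.
MAGIC = /\A#.*coding[:=]\s*([[:alnum:]\-_]+)/

# Hypothetical helper: the encoding the generated code ends up in.
def effective_encoding(precompiled)
  # 1. A magic comment emitted by the engine wins.
  if (name = precompiled[MAGIC, 1])
    return Encoding.find(name)
  end
  # 2. Otherwise, the encoding the engine tagged its output with.
  precompiled.encoding
end
```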

Why @default_encoding is only a hint

Even when you pass :default_encoding as an option (rather than having it inherited from Encoding.default_external when reading from the file system), we have to rely on the template engine to handle it. The engine might have its own way of detecting the template's encoding, so we can't just call @data.encode(@default_encoding), since that may raise an encoding error. The default encoding is, after all, only that: the default encoding, which the engine should fall back to if it can't determine the encoding otherwise.

How template engines should handle the encoding

Many simple template engines don't need to work at the encoding level, so from their point of view, this should be enough:

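def compile(template, default_encoding)
  # Work on raw bytes so parsing never depends on the template's encoding.
  data = template.dup.force_encoding("BINARY")
  enc = encoding_specified_in_template(data) # extract magic comment or whatnot
  compiled = parse_and_compile(data)
  # Relabel (not transcode): explicit encoding, then the hint, then the template's own.
  compiled.force_encoding(enc || default_encoding || template.encoding)
end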
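Here is a runnable version of the simple approach, under some assumptions. Both helpers are hypothetical stand-ins. encoding_specified_in_template looks for a leading <%# coding: NAME %> comment (a syntax chosen for this example). parse_and_compile is a toy engine that wraps the raw bytes in a single-quoted heredoc, so it never needs to know the encoding.

```ruby
# Hypothetical helper: encoding named by a leading "<%# coding: NAME %>"
# comment, or nil. The pattern is ASCII-only, so it matches BINARY data.
def encoding_specified_in_template(data)
  data[/\A<%#\s*coding:\s*([[:alnum:]\-_]+)\s*%>/, 1]
end

# Hypothetical toy engine: strip the comment and wrap the raw bytes in a
# single-quoted heredoc. It only handles bytes, never characters.
def parse_and_compile(data)
  body = data.sub(/\A<%#.*?%>\n?/, "")
  "_buf = <<'TILT_EOS'.chomp\n" + body + "\nTILT_EOS\n"
end

def compile(template, default_encoding)
  data = template.dup.force_encoding("BINARY")
  enc = encoding_specified_in_template(data)
  compiled = parse_and_compile(data)
  compiled.force_encoding(enc || default_encoding || template.encoding)
end
```

The output is only relabeled with force_encoding, never transcoded, so the bytes the engine emits are exactly the bytes of the template.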

If you really need to be fully encoding-aware before you parse and compile, you should do something like this:

def compile(template, default_encoding)
  data = template.dup.force_encoding("BINARY")
  # Decide the encoding up front: magic comment, then the hint, then the template's own.
  enc = encoding_specified_in_template(data) || default_encoding || template.encoding
  data.force_encoding(enc) # the parser now sees characters, not bytes
  compiled = parse_and_compile(data)
  compiled.force_encoding(enc) # You don't need this if you'd rather generate a magic comment
end
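Here is a runnable version of the fully encoding-aware approach, again with hypothetical helpers as stand-ins. This toy parse_and_compile needs character semantics: it records the character count, which differs from the byte count for non-ASCII text. That is why the data is relabeled before parsing. The valid_encoding? check is an addition not in the outline above, so that a wrong guess fails with a clear error instead of deep inside the parser.

```ruby
# Hypothetical helper: encoding named by a leading "<%# coding: NAME %>" comment, or nil.
def encoding_specified_in_template(data)
  data[/\A<%#\s*coding:\s*([[:alnum:]\-_]+)\s*%>/, 1]
end

# Hypothetical toy engine that works on characters: it records the character
# count of the body (bytesize would differ for non-ASCII text).
def parse_and_compile(data)
  body = data.sub(/\A<%#.*?%>\n?/, "")
  "# #{body.length} characters\n_buf = <<'TILT_EOS'.chomp\n" + body + "\nTILT_EOS\n"
end

def compile(template, default_encoding)
  data = template.dup.force_encoding("BINARY")
  enc = encoding_specified_in_template(data) || default_encoding || template.encoding
  # Relabel (not transcode) the bytes so the parser sees characters.
  data.force_encoding(enc)
  # Added check: fail early if the bytes are not valid in the chosen encoding.
  raise ArgumentError, "template is not valid #{enc}" unless data.valid_encoding?
  compiled = parse_and_compile(data)
  compiled.force_encoding(enc)
end
```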

Your thoughts

What do you think?

rue commented 12 years ago

Is this seriously still unsolved? What do you need to get this done? I'll help.

rtomayko commented 12 years ago

@rue Yes. I'd say the work is maybe half done, and it's going to require a major rev to release due to the change in behavior. https://github.com/rtomayko/tilt/pull/107 is IMO still the furthest we've got on this, and it's not being actively worked on. I assume it has also suffered from bit rot over the last year.

The docs added at the top of https://github.com/rtomayko/tilt/pull/107/files give a basic roadmap of what work remains. The two biggest issues are implementing the :transcode option and auditing every single template engine. If you'd like to help with either, fork and send a pull request with the encodings branch as the base.

minad commented 11 years ago

@judofyr Any ideas how this will evolve?

svasva commented 11 years ago

I just ran into this issue, so I'll leave a workaround here that worked for me. At the top of an .erb file: <% # encoding: utf-8 %>

judofyr commented 11 years ago

Initial implementation merged in #175. Open a new issue if you have any problems.