ruby-gettext / gettext

Gettext gem is a pure Ruby Localization(L10n) library and tool which is modeled after the GNU gettext package.
https://ruby-gettext.github.io/
68 stars 28 forks source link

Use Erubi instead of ERB to parse `.erb` templates #91

Closed MichaelHoste closed 2 years ago

MichaelHoste commented 2 years ago

Erubi is the default Rails ERB parser since 5.1+, and before that, it was erubis.

This project still uses ERB and that could create some issues with case statements in .erb files, where Rails works fine but GetText fails to parse.

I switched the parser quite easily and the test suite still passes.

Do you think this PR could be merged without risk? Would there be any potential problems with this change?

kou commented 2 years ago

First, could you report the problem to https://github.com/ruby/erb ? Then I'll review this.

MichaelHoste commented 2 years ago

Thanks @kou, I did it here: https://github.com/ruby/erb/issues/4

But, even if it was fixed on ERB, wouldn't it be working only on future Ruby versions?

MichaelHoste commented 2 years ago

Erubi doesn't support the coding magic comment and 2 tests were failing: test_different_encoding_from_current_locale and test_multiple_encodings.

I had to convert the encoding before calling Erubi using the magic code on top of the example files (as seen here: https://github.com/ruby-gettext/gettext/blob/master/test/tools/test_xgettext.rb#L124)

I improved and simplified the code and was able to use a slightly modified RegExp to detect the encoding. I hope the reasoning was right.

Since UTF8 is the default encoding for Ruby2.0+ (if I remember right), maybe that encoding part is no more relevant and these tests could even be removed?

(This comment was lost in a previous diff so I put it back here.)

MichaelHoste commented 2 years ago

Do you feel that erubi could bring some unexpected issues?

If that's the case, changing from ERB to eruby is maybe not critical.

MichaelHoste commented 2 years ago

Do you feel that erubi could bring some unexpected issues?

I don't know. Could you consider this carefully again, just in case?

I'm biased because I only need to parse ERB templates from Rails apps, and for this use case it makes a lot of sense to use the same parser. But I know this gem is also used for pure-Ruby applications and I'm less knowledgable about practices there.

I felt that using Erubi was a good fit in the code because it worked without using trim_mode or without making a special case for @@erb_accept_keyword_arguments. It felt good that the tests were still passing (but I'm not sure about the coverage of ERB).

I'm French-speaking and to me UTF8 is well-established and other encoding like ISO-8859-1 are a thing of the past. I know that backward compatibility is important here and I'm not sure if the new encoding management covers all the case of the old one.


All that being said, I'm ok to leave this PR open until other users manifest themselves with the same issue.

It's not something really critical, and it's easy to adapt the ERB file for it to parse correctly.

kou commented 2 years ago

How about using ERB as the first candidate and Erubis as a fallback when syntax error is raised with ERB? It will keep backward compatibility.

MichaelHoste commented 2 years ago

That could be a good compromise 🙂

I may be able to adapt this PR at the end of the week.

MichaelHoste commented 2 years ago

How about using ERB as the first candidate and Erubis as a fallback when syntax error is raised with ERB?

Unfortunately, erb parser doesn't raise any syntax error, it simply ignores what's after the problematic "case" keyword. So that strategy couldn't work 😕

To complete this PR, I added a spec with an ERB file that works with erubi parser and doesn't work with erb parser: https://github.com/ruby-gettext/gettext/pull/91/commits/29b73d1df50ab52301e6bebe18143bef53acf8ac

MichaelHoste commented 2 years ago

I'm sorry to come back with this @kou, but I ran into other people who encountered this ERB case/when parsing issue.

Would it be a problem for you if I did this:

if defined?(Erubi)
  // Erubi code
else
  // ERB code
end

It should cover Rails users that have this specific issue. And we don't even need to require Erubi in this gem.

Other solution would be to add a GetText option to switch the default ERB parser. Like GetText.erb_parser = 'Erubi' (with 'ERB' as default).

Would you accept to merge it one of these solutions?

kou commented 2 years ago

How about adding lib/gettext/tools/parser/erubi.rb instead of replacing lib/gettext/tools/parser/erb.rb and calling GetText::Tools::Task.define {|task| task.xgettext_options += ["--require=gettext/tools/parser/erubi", "--parser=GetText::ErubiParser"]} with the following change?

diff --git a/lib/gettext/tools/xgettext.rb b/lib/gettext/tools/xgettext.rb
index 73db68e..94cba94 100644
--- a/lib/gettext/tools/xgettext.rb
+++ b/lib/gettext/tools/xgettext.rb
@@ -330,6 +330,11 @@ Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;
           require out
         end

+        parser.on("--parser=PARSER",
+                  _("Add PARSER to parser list for xgettext")) do |parser_class|
+          add_parser(parser_class.new)
+        end
+
         parser.on("-c", "--add-comments[=TAG]",
                   _("If TAG is specified, place comment blocks starting with TAG and precedding keyword lines in output file"),
                   _("If TAG is not specified, place all comment blocks preceing keyword lines in output file"),
MichaelHoste commented 2 years ago

How about adding lib/gettext/tools/parser/erubi.rb instead of replacing lib/gettext/tools/parser/erb.rb and calling GetText::Tools::Task.define {|task| task.xgettext_options += ["--require=gettext/tools/parser/erubi", "--parser=GetText::ErubiParser"]} with the following change?

Very nice suggestion 🙂

I made the changes and added some tests, I hope it will be ok to merge now.

EDIT : Tests don't pass on Ruby >= 3.0, I'll investigate

MichaelHoste commented 2 years ago

It seems that the parser was already working correctly with ERB files containing case for Ruby versions >= 3.0.0.

ERB library itself is however broken with case in these versions (eval doesn't work on the generated source, like explained here: https://github.com/ruby/erb/issues/4#issue-1073194536).

My guess is that Ripper is maybe more resilient in more modern Ruby version? I don't know, but I added some conditional testing for ERB depending on the Ruby version.

kou commented 2 years ago

Merged. Thanks.