Open h4ck3rm1k3 opened 8 years ago
I can reproduce this issue: bundle exec opal -e "gsub(/[\x80-\xff]/n, '')"
MRI:
2.4.1 :001 > 'yo'.gsub(/[\x80-\xff]/n, '')
=> "yo"
I think the root cause is that the n flag is skipped because it's not widely supported by JavaScript vendors.
In this case we need the n flag to change the encoding to ASCII-8BIT: http://ruby-doc.org/core-2.5.0/Regexp.html#class-Regexp-label-Encoding
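To illustrate what the n flag does in MRI, here is a minimal sketch. The constant `HIGH_BYTES` and the helper `strip_high_bytes` are hypothetical names introduced for this example; they are not part of Opal or ttfunk. It only shows MRI behaviour, not how Opal compiles the pattern.

```ruby
# The pattern from the report, with the n (no-encoding) flag set.
HIGH_BYTES = /[\x80-\xff]/n

# Hypothetical helper: remove every byte in 0x80..0xFF from a string.
# String#b returns a binary (ASCII-8BIT) copy, so the string and the
# regexp share an encoding and the match is done byte by byte.
def strip_high_bytes(str)
  str.b.gsub(HIGH_BYTES, '')
end

# With the n flag and an escaped high byte, MRI tags the regexp as binary.
puts HIGH_BYTES.encoding == Encoding::BINARY

# The NOENCODING option bit records that the n flag was given.
puts((HIGH_BYTES.options & Regexp::NOENCODING) != 0)

puts strip_high_bytes('yo')            # ASCII-only input is returned unchanged
puts strip_high_bytes("caf\xC3\xA9")   # the two high bytes C3 A9 are removed
```

Calling `String#b` first is a design choice for the sketch: it avoids matching a binary regexp against a UTF-8 string that contains non-ASCII characters.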
No, the issue here is that \x80 is not a valid UTF-8 character. You can parse it by adding a # encoding: ascii-8bit comment to the beginning of your file. I don't know why MRI parses it.
2.4.0 :001 > "\x80".encoding
=> #<Encoding:UTF-8>
2.4.0 :002 > "\x80".valid_encoding?
=> false
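The same byte can be compared under both encodings. This is a small sketch for MRI; the variable names are illustrative, and `force_encoding` is used only to make the UTF-8 tag explicit regardless of the source file's encoding.

```ruby
# A lone 0x80 byte tagged as UTF-8: it is a continuation byte with no
# lead byte, so it does not form a valid UTF-8 sequence.
utf8 = "\x80".dup.force_encoding(Encoding::UTF_8)

# String#b returns a copy of the same bytes tagged as ASCII-8BIT (binary),
# where every byte value is valid on its own.
binary = utf8.b

puts utf8.valid_encoding?    # false
puts binary.valid_encoding?  # true
puts binary.bytes.inspect    # [128]
```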
Thanks @iliabylich for your input.
The code is from ttfunk: https://github.com/prawnpdf/ttfunk/blob/086b3126b13d207abf992279bef9b7699af8ae32/lib/ttfunk/table/name.rb#L20
I believe \x80-\xFF are non-ASCII character ranges: http://www.unicode.org/charts/PDF/U0080.pdf (C1 Controls and Latin-1 Supplement)
@Mogztter Yes, you are right. To parse this file you need to add an encoding comment.
Related to #2235 - the 5aad139c7fcc92f4b5f7bd4412987843db535698 commit.
invalid multibyte escape: /[\x00-\x7F]|[\x80-\xBF][\xC0-\xF0]*|[\xC0-\xF0]/ (RegexpError)
I have not been able to create an isolated test case yet, but I think it happens on this line: https://github.com/rubysl/rubysl-rexml/blob/0ab7aae8d824606dc41e855445b1b993c25e9285/lib/rexml/text.rb#L142
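As a possible starting point for an isolated case, the sketch below compiles the pattern from the error message in MRI, once with default options and once with `Regexp::NOENCODING`. It assumes the error comes from compiling the pattern under a non-binary encoding; it does not show that Opal or REXML fail for the same reason. `PATTERN_SOURCE` and `compile_error` are hypothetical names for this example.

```ruby
# Pattern text copied from the error message. Single quotes keep the
# \xNN escapes as literal text, so the regexp compiler sees them.
PATTERN_SOURCE = '[\x00-\x7F]|[\x80-\xBF][\xC0-\xF0]*|[\xC0-\xF0]'

# Hypothetical helper: return the RegexpError message raised while
# compiling, or nil when the pattern compiles.
def compile_error(source, options = 0)
  Regexp.new(source, options)
  nil
rescue RegexpError => e
  e.message
end

# Default options: \x80 is checked against the string's encoding.
default_error = compile_error(PATTERN_SOURCE)

# NOENCODING (the n flag): the escapes are treated as raw bytes.
binary_error = compile_error(PATTERN_SOURCE, Regexp::NOENCODING)

puts default_error.inspect
puts binary_error.inspect
```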