logstash-plugins / logstash-filter-mutate

Apache License 2.0

RegexpError: invalid multibyte escape #88

Closed: glinuz closed this issue 7 years ago.

glinuz commented 7 years ago

Pipeline aborted due to error {:exception=>#<RegexpError: invalid multibyte escape: /\xD2\xBB\xD4\xC2/>, :backtrace=>["org/jruby/RubyRegexp.java:1424:in `initialize'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-mutate-3.1.3/lib/logstash/filters/mutate.rb:196:in `register'", "org/jruby/RubyArray.java:1653:in `each_slice'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-mutate-3.1.3/lib/logstash/filters/mutate.rb:184:in `register'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:230:in `start_workers'", "org/jruby/RubyArray.java:1613:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:230:in `start_workers'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:183:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:292:in `start_pipeline'"]}

guyboertje commented 7 years ago

@glinuz Please give an example of your source data, or try \u00D2\u00BB\u00D4\u00C2 instead.

You can use http://rubular.com/ to verify the regexp.
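For reference, here is a quick check of what that escape sequence denotes, sketched in Python rather than Logstash's JRuby (the variable names are illustrative only). Each \u00XX escape is a single Unicode code point (Ò, », Ô, Â), so a pattern built from them matches the GBK bytes only if those bytes were decoded as Latin-1. Decoded as GBK, the same bytes become 一月 and the pattern no longer matches.

```python
import re

# The suggested pattern: four \u escapes, i.e. four separate code points
# U+00D2, U+00BB, U+00D4, U+00C2.
suggested = "\u00D2\u00BB\u00D4\u00C2"
pattern = re.compile(suggested)

# The same four values as raw bytes (the GBK encoding quoted in this thread).
gbk_bytes = b"\xD2\xBB\xD4\xC2"

# Latin-1 maps each byte to the code point with the same value,
# so the pattern matches text that was decoded as Latin-1...
print(bool(pattern.search(gbk_bytes.decode("latin-1"))))  # True

# ...but not the correctly decoded GBK text, which is the two characters 一月.
print(gbk_bytes.decode("gbk") == "\u4e00\u6708")         # True
print(bool(pattern.search(gbk_bytes.decode("gbk"))))      # False
```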

glinuz commented 7 years ago

My OS environment has LANG=zh_CN.GBK, so I wanted to use hex escapes to match month names written in Chinese. January is 一月 in Chinese, and its GBK encoding is "\xD2\xBB\xD4\xC2".
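The underlying mismatch can be sketched in Python (not Ruby; the names are illustrative). The same two characters have different byte sequences in GBK and UTF-8, so even a byte-level pattern built from the GBK encoding only matches input that is still GBK-encoded. The sketch uses an explicit bytes pattern, which sidesteps the compile-time error Ruby raised for the byte escapes in the original regexp.

```python
import re

month = "\u4e00\u6708"  # 一月, "January" in Chinese

# GBK: two bytes per character.
gbk = month.encode("gbk")
print(gbk)   # b'\xd2\xbb\xd4\xc2'

# UTF-8: three bytes per character for these code points.
utf8 = month.encode("utf-8")
print(utf8)  # b'\xe4\xb8\x80\xe6\x9c\x88'

# A bytes pattern using the same \x escapes as the original regexp.
byte_pattern = re.compile(rb"\xD2\xBB\xD4\xC2")
print(bool(byte_pattern.search(gbk)))   # True: input is still GBK
print(bool(byte_pattern.search(utf8)))  # False: same text, different bytes
```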

But I have now converted the charset from GBK to UTF-8 in the Filebeat config file, and the Unicode regexp works.
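A minimal Python sketch of why converting first helps, assuming the log bytes arrive GBK-encoded: once they are decoded into text (the step the reporter moved into Filebeat; here it is done by hand), the regexp can match on characters instead of bytes, and the \u escapes name the real code points of 一月. The sample fragment and variable names are illustrative only.

```python
import re

# Hypothetical raw input: a timestamp fragment as GBK-encoded bytes.
raw = "[11/\u4e00\u6708/2017:09:25:42 +0800]".encode("gbk")

# Step 1: decode with the source charset, producing a Unicode string.
text = raw.decode("gbk")

# Step 2: match characters, not bytes. \u4e00\u6708 is 一月.
month_pattern = re.compile("\u4e00\u6708")
print(bool(month_pattern.search(text)))  # True
```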

My source data: 192.168.1 - - [11/一月/2017:09:25:42 +0800] "POST /
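With the text in Unicode, the Chinese month in that timestamp can be picked apart with an ordinary character pattern. This is a Python sketch, not the Logstash grok/mutate configuration from the thread; `CN_MONTHS` and `parse_timestamp` are hypothetical names, and the sample line is used exactly as posted, truncated IP included.

```python
import re

# Hypothetical lookup table: Chinese month names -> month numbers.
CN_MONTHS = {
    "一月": 1, "二月": 2, "三月": 3, "四月": 4, "五月": 5, "六月": 6,
    "七月": 7, "八月": 8, "九月": 9, "十月": 10, "十一月": 11, "十二月": 12,
}

# Longest names first so the alternation tries 十一月 before 一月.
month_alt = "|".join(sorted(CN_MONTHS, key=len, reverse=True))

# Bracketed access-log timestamp, e.g. [11/一月/2017:09:25:42 +0800].
TS = re.compile(
    r"\[(\d{1,2})/(" + month_alt + r")/(\d{4}):(\d{2}:\d{2}:\d{2}) ([+-]\d{4})\]"
)

def parse_timestamp(line):
    """Return (year, month, day, time, offset), or None if no timestamp is found."""
    m = TS.search(line)
    if m is None:
        return None
    day, month_name, year, time, offset = m.groups()
    return int(year), CN_MONTHS[month_name], int(day), time, offset

sample = '192.168.1 - - [11/一月/2017:09:25:42 +0800] "POST /'
print(parse_timestamp(sample))  # (2017, 1, 11, '09:25:42', '+0800')
```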

(This reply was sent by email, quoting Guy Boertje's comment of 2017-01-16 14:51:04.)

guyboertje commented 7 years ago

@glinuz Great, I'm glad to hear it. FYI, all strings in Logstash should be converted to Unicode at the input stage.

If all is OK, please close this.