logstash-plugins / logstash-filter-urldecode

Apache License 2.0

incompatible encodings: UTF-8 and ASCII-8BIT #6

Closed michalterbert closed 7 years ago

michalterbert commented 8 years ago

Hello guys, I have an issue:

Exception in filterworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash. {"exception"=>#<Encoding::CompatibilityError: incompatible encodings: UTF-8 and ASCII-8BIT>, "backtrace"=>["org/jruby/RubyString.java:3064:in `gsub'", "/opt/logstash/vendor/jruby/lib/ruby/1.9/uri/common.rb:331:in `unescape'", "/opt/logstash/vendor/jruby/lib/ruby/1.9/uri/common.rb:649:in `unescape'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-urldecode-2.0.2/lib/logstash/filters/urldecode.rb:50:in `urldecode'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-urldecode-2.0.2/lib/logstash/filters/urldecode.rb:39:in `filter'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/filters/base.rb:151:in `multi_filter'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/filters/base.rb:148:in `multi_filter'", "(eval):1223:in `cond_func_40'", "org/jruby/RubyArray.java:1613:in `each'", "(eval):1220:in `cond_func_40'", "(eval):596:in `filter_func'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:244:in `filterworker'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:237:in `filterworker'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:178:in `start_filters'"], :level=>:error}

    Encoding::CompatibilityError: incompatible encodings: UTF-8 and ASCII-8BIT
      gsub at org/jruby/RubyString.java:3064
      unescape at /opt/logstash/vendor/jruby/lib/ruby/1.9/uri/common.rb:331
      unescape at /opt/logstash/vendor/jruby/lib/ruby/1.9/uri/common.rb:649
      urldecode at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-urldecode-2.0.2/lib/logstash/filters/urldecode.rb:50
      filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-urldecode-2.0.2/lib/logstash/filters/urldecode.rb:39
      multi_filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/filters/base.rb:151
      each at org/jruby/RubyArray.java:1613
      multi_filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/filters/base.rb:148
      cond_func_40 at (eval):1223
      each at org/jruby/RubyArray.java:1613
      cond_func_40 at (eval):1220
      filter_func at (eval):596
      filterworker at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:244
      filterworker at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:237
      start_filters at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:178

How can I fix this issue? Is there any way to show the broken event (DEBUG for this filter)?

guyboertje commented 8 years ago

If your input already has a codec, then add

  charset => "ASCII-8BIT"

otherwise

codec => plain {
  charset => "ASCII-8BIT"
}

This will force the incoming data to be re-encoded to UTF-8.
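As a rough illustration of what such a charset setting implies, here is a minimal plain-Ruby sketch (not Logstash's actual code) of re-encoding bytes to UTF-8. The sample bytes and the guess that they are ISO-8859-1 are assumptions made only for this example.

```ruby
# Illustrative sample: "Zürich" as ISO-8859-1 bytes, which is not valid UTF-8.
bytes = "Z\xFCrich".b

# Treat the bytes as binary and transcode, replacing anything unmappable.
as_binary = bytes.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

# Declare the source charset (assumed here to be ISO-8859-1) and transcode.
as_latin1 = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts as_binary.valid_encoding?
puts as_latin1
```

Both results are valid UTF-8, but only the second keeps the original character, so naming the real source charset (when it is known) may preserve more than treating the data as binary.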

michalterbert commented 8 years ago

I'm using: codec => json { charset => "ASCII-8BIT" }

and I still have this issue :/ I'm reading events from Kafka and using this filter.

guyboertje commented 8 years ago

@michalterbert

guyboertje commented 8 years ago

For debugging, you can run Logstash with --debug.

michalterbert commented 8 years ago

@guyboertje: one of the fields from the JSON.

    Encoding::CompatibilityError: incompatible encodings: ASCII-8BIT and UTF-8
      gsub at org/jruby/RubyString.java:3064
      unescape at /opt/app/logstash-2.3.4/vendor/jruby/lib/ruby/1.9/uri/common.rb:331
      unescape at /opt/app/logstash-2.3.4/vendor/jruby/lib/ruby/1.9/uri/common.rb:649
      urldecode at /opt/app/logstash-2.3.4/vendor/bundle/jruby/1.9/gems/logstash-filter-urldecode-2.0.4/lib/logstash/filters/urldecode.rb:50
      filter at /opt/app/logstash-2.3.4/vendor/bundle/jruby/1.9/gems/logstash-filter-urldecode-2.0.4/lib/logstash/filters/urldecode.rb:39
      multi_filter at /opt/app/logstash-2.3.4/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/filters/base.rb:151
      each at org/jruby/RubyArray.java:1613
      multi_filter at /opt/app/logstash-2.3.4/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/filters/base.rb:148
      initialize at (eval):3070
      each at org/jruby/RubyArray.java:1613
      initialize at (eval):3067
      call at org/jruby/RubyProc.java:281
      filter_func at (eval):967
      filter_batch at /opt/app/logstash-2.3.4/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:267
      each at org/jruby/RubyArray.java:1613
      inject at org/jruby/RubyEnumerable.java:852
      filter_batch at /opt/app/logstash-2.3.4/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:265
      worker_loop at /opt/app/logstash-2.3.4/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:223
      start_workers at /opt/app/logstash-2.3.4/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:201

In the Kafka input I have decorate_events => false.

Error log from Logstash:

    {:timestamp=>"2016-08-19T11:38:31.758000+0200", :message=>"Received an event that has a different character encoding than you configured.", :text=>"csrftoken=BI2dsfdsfsaas3K3Gh0aY+05522131Ou5eDGMxLWT0KNmk=&backcheck=true&screen=opdradsagdfsorgfd&result=VERZENDEN&adresbegunstigde=dsarwwwqqc+15+&postcodebegunstigde=&landbegunstigde=CH&bankcodebegunstigde=&naambankbegunstigde=UBS+Switzerland+AG&adresbankbegunstigde=8098&plaatsbankbegunstigde=Z\\xFCrich&mededeling1=Zurueckzahlung+wegen+&mededeling2=Doppelzahlung++Papyrasse&faxadviesnummer=&faxadviesaanhef=", :expected_charset=>"UTF-8", :level=>:warn}

michalterbert commented 8 years ago

Any update?

guyboertje commented 8 years ago

Not really. You are not giving me much info besides the error message and backtrace. What is your config? What does the message look like in Kafka?

michalterbert commented 8 years ago

Hello, in Kafka we have pure JSON events. My Logstash config:

    input {
      kafka {
        zk_connect => "xxxxx:2181"
        topic_id => "nginx"
        group_id => "nginx"
        codec => json { charset => "ASCII-8BIT" }
        consumer_threads => 1
        queue_size => 20
        rebalance_backoff_ms => 4000
      }
    }

    filter {
      if [requestContentType] =~ /(?i)urlencoded/ {
        urldecode { field => "requestContent" charset => "UTF-8" }
      }
    }

    output {
      elasticsearch {
        index => "topic-%{+YYYY.MM.dd}"
        hosts => [ "host1:9200" ]
        workers => 4
        flush_size => 5000
        idle_flush_time => 10
      }
    }

guyboertje commented 7 years ago

Based on another recent issue, this is caused when an unescaped non-ASCII character is seen in the URL/URI. For example:

    %2Fsome_endpoint%3Fquery%3Dblah+blah+blåh

should be

    %2Fsome_endpoint%3Fquery%3Dblah+blah+bl%C3%A5h

Use a regex conditional to detect this, e.g.

    if [uri] !~ /[^\x00-\x7F]/ {
      urldecode { field => "uri" charset => "UTF-8" }
    }

Alternatively, use a mutate gsub to replace the non-7-bit-ASCII characters with a placeholder.
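The escaping shown in the example above can be sketched in plain Ruby, outside Logstash. This is a minimal sketch, not the plugin's code: `escape_non_ascii` and `safe_urldecode` are hypothetical helper names, and it decodes with `URI.decode_www_form_component` (which also turns `+` into a space) rather than the `URI.unescape` call seen in the backtrace.

```ruby
require "uri"

# Hypothetical helper: percent-encode every raw byte outside 7-bit ASCII,
# leaving existing %XX escapes and "+" untouched.
def escape_non_ascii(value)
  # Work on a binary copy so each non-ASCII byte is matched individually.
  value.b.gsub(/[^\x00-\x7F]/n) { |byte| format("%%%02X", byte.ord) }
end

# Hypothetical helper: decode a form-encoded value after pre-escaping it,
# so the decoder only ever sees ASCII input.
def safe_urldecode(value)
  URI.decode_www_form_component(escape_non_ascii(value), Encoding::UTF_8)
end

raw = "%2Fsome_endpoint%3Fquery%3Dblah+blah+bl\u00E5h"
puts escape_non_ascii(raw)  # %2Fsome_endpoint%3Fquery%3Dblah+blah+bl%C3%A5h
puts safe_urldecode(raw)    # /some_endpoint?query=blah blah blåh
```

If the raw bytes were not UTF-8 to begin with (for example a single `\xFC` byte), the decoded string may not be valid UTF-8, so checking `valid_encoding?` on the result is probably worthwhile.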