logstash-plugins / logstash-input-http

Apache License 2.0

Error in handling charsets different from UTF-8 #132

Open andsel opened 4 years ago

andsel commented 4 years ago

output { stdout { codec => json {charset => "UTF-8"} } }

- Sample Data:
Python script to use as a client to send the encoded data:
```python
import requests

# Local endpoint the client posts to
API_ENDPOINT = "http://127.0.0.1:9006"
message = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'

# Send the body encoded as CP1254 (Windows Turkish) rather than UTF-8
r = requests.post(url=API_ENDPOINT, data=message.encode('cp1254'))
```

This does not seem to be a problem in the codec, because I tried the following pipeline (same codec, different input):

input {
    file {
        path => "/tmp/cp1254_encoded.txt"
        mode => "read"
        sincedb_path => "/dev/null"
        file_completed_log_path => "/tmp/file_actions.log"
        file_completed_action => "log"
        codec => plain {
            charset => "CP1254"
        }
    }
}   

output {
    stdout {
        codec => json {charset => "UTF-8"}
    }
}

With the attached file cp1254_encoded.txt as input data, the console output is what's expected (TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç).
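To make the charset dependency concrete, here is a minimal sketch that uses only Python's built-in codecs. It is not taken from the plugin and says nothing about how the plugin decodes the request body; it only shows that the sample bytes round-trip when decoded as CP1254 and are not valid UTF-8. The variable names are illustrative.

```python
# Sample string from the report, encoded the same way the client script does.
message = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'
payload = message.encode('cp1254')

# Decoding with the charset the bytes were written in restores the text.
decoded_ok = payload.decode('cp1254')

# The same bytes are not valid UTF-8, so a strict UTF-8 decode raises.
try:
    payload.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# A lenient decode substitutes U+FFFD for the sequences it cannot interpret.
decoded_lossy = payload.decode('utf-8', errors='replace')

print(decoded_ok == message)
print(utf8_failed)
print(decoded_lossy)
```

Any component that treats the payload as UTF-8 would therefore either reject it or lose the Turkish characters, which is why the declared charset has to reach the decoding step.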

NB: to reproduce the text file, copy and paste the string above into a text editor and save it with CP1254 encoding.
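As an alternative to a text editor, the file could also be generated with a short script. This is a sketch under stated assumptions: the helper name `write_cp1254_sample` is hypothetical, and the trailing newline is an assumption made so that the file ends with a complete line.

```python
from pathlib import Path

MESSAGE = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'


def write_cp1254_sample(path):
    """Write the sample string to `path` as CP1254 and return the bytes written."""
    # Assumption: terminate the line with a newline; encode explicitly and
    # write raw bytes so no platform-specific text translation is applied.
    data = (MESSAGE + '\n').encode('cp1254')
    Path(path).write_bytes(data)
    return data


# Example call, using the path from the pipeline in this report:
# write_cp1254_sample('/tmp/cp1254_encoded.txt')
```

Since CP1254 is a single-byte encoding, the resulting file holds exactly one byte per character of the string plus the newline.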

GokcerBelgusen commented 3 years ago

Hi guys, any progress on this issue?

andsel commented 3 years ago

Hi @GokcerBelgusen, no news on this yet, but I'll keep it on my radar.