jordansissel / eventmachine-tail

Ruby EventMachine file tailing and friends. 'gem install eventmachine-tail' to install.

How should we deal with invalid characters on 1.9? #13

Open eric opened 13 years ago

eric commented 13 years ago

We ran into issues on 1.9 with a file that is supposed to be UTF-8 having invalid characters in it.

A fix was suggested for remote_syslog that should clearly go directly into eventmachine-tail, but I haven't been able to figure out exactly how I would want to fix it.

Here is the discussion that we've had so far: https://github.com/papertrail/remote_syslog/pull/13

Any thoughts would be welcome for the best way to solve this.

jordansissel commented 13 years ago

Since em-tail doesn't display or do any calculations on characters, really, I don't think it should care what encoding the data has that it is reading - so if it's breaking on some input, I think it's a bug in em-tail.

Can you publish a sample file with some bad data? Otherwise I'll try to reproduce and hack on a fix.

eric commented 13 years ago

I was just playing around with this: https://gist.github.com/1169737

mblair commented 13 years ago

I've hit this too. I've tried the iconv workaround in the remote_syslog pull request, as well as something like:

data = data.encode!( 'UTF-8', invalid: :replace, undef: :replace )

And I'm still getting the following error:

/usr/lib/ruby/gems/1.9.1/gems/eventmachine-0.12.10/lib/em/buftok.rb:66:in `split': invalid byte sequence in UTF-8 (ArgumentError)

Any ideas?

vihai commented 12 years ago

Any news about this issue?

jordansissel commented 12 years ago

Probably should just read into a buffer that is set explicitly to binary mode, and let the consumer of the data care about the encoding.

I'll get to fixing this eventually if nobody else does.
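
The binary-buffer idea above could look roughly like the following. This is a minimal sketch, not code from eventmachine-tail: the `BinaryLineBuffer` class and its `extract` method are hypothetical names, and the consumer-side `scrub` call assumes Ruby 2.1 or later.

```ruby
# Hypothetical helper (not part of eventmachine-tail): buffers raw bytes and
# splits them into lines without ever interpreting them as characters.
class BinaryLineBuffer
  def initialize
    # String.new with no arguments is already ASCII-8BIT (binary);
    # force_encoding just makes the intent explicit.
    @buffer = String.new.force_encoding(Encoding::BINARY)
  end

  # Appends a chunk of file data and returns every complete line seen so far.
  # A trailing partial line stays in the buffer until its newline arrives.
  def extract(chunk)
    @buffer << chunk.dup.force_encoding(Encoding::BINARY)
    lines = []
    while (index = @buffer.index("\n"))
      lines << @buffer.slice!(0..index).chomp("\n")
    end
    lines
  end
end

buffer = BinaryLineBuffer.new
buffer.extract("ok\nbad \xFF li")   # one complete line; "bad \xFF li" stays buffered
lines = buffer.extract("ne\n")      # completes the second line

# The consumer decides how to interpret the bytes, e.g. as UTF-8 with
# invalid sequences replaced (String#scrub, Ruby 2.1+).
text = lines.first.dup.force_encoding(Encoding::UTF_8).scrub("?")
```

Because the buffer only ever searches for the newline byte, invalid UTF-8 in the file cannot raise while splitting; any encoding error is deferred to whoever consumes the lines.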

rb2k commented 10 years ago

Resurrecting this after a few years :) We just ran into this as well.

In our case, we launched an app with start-stop-daemon without setting the LC_ALL variable to a UTF-8 locale.

--> Ruby falls back to POSIX/ASCII and blows up as soon as it has to touch a UTF-8 char.
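
One way to sidestep the locale dependence described above is to name the encoding explicitly when reading, rather than relying on what Ruby derives from the environment. This is a sketch under that assumption; `read_as_utf8` is a hypothetical helper, not something from this thread or from eventmachine-tail.

```ruby
require "tempfile"

# Ruby derives this from the locale (LC_ALL, LC_CTYPE, LANG); under a
# POSIX/C locale it is typically US-ASCII rather than UTF-8.
locale_encoding = Encoding.default_external

# Hypothetical helper: tag the file's contents as UTF-8 no matter what the
# process locale says.
def read_as_utf8(path)
  File.read(path, encoding: "UTF-8")
end

Tempfile.create("sample") do |file|
  file.binmode
  file.write("caf\xC3\xA9\n".b)   # the UTF-8 bytes for "café"
  file.flush
  text = read_as_utf8(file.path)
  puts "locale default: #{locale_encoding}, string read as: #{text.encoding}"
end
```

This only controls how the bytes are tagged; a file with genuinely invalid UTF-8 would still need the consumer to validate or replace the bad sequences.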

jordansissel commented 10 years ago

This project has been replaced by the filewatch library. Last I knew, EventMachine was abandoned as a project (the most recent release was 1.5 years ago), so I recommend not using em-tail.

Sorry for the bugs, but this project is probably not worth resurrecting.

Recommend you check out the filewatch library instead, maybe?

On Thursday, September 4, 2014, Marc Seeger notifications@github.com wrote:

Resurrecting this after a few years :) We just ran into this as well


rb2k commented 10 years ago

Sure, probably a good choice :)

Although I don't see an integrated way of actually tailing a file, rather than just being notified that something changed? But maybe it's just too early in the morning ;)