nytimes / Fech

Deprecated. Please see https://github.com/dwillis/Fech for a maintained fork.
http://nytimes.github.io/Fech/

Added support for ISO encoding to avoid an error when parsing some files #47

Closed narutosanjiv closed 11 years ago

narutosanjiv commented 11 years ago

When parsing an FEC file (780387.fec), we encountered the error "ArgumentError: invalid byte sequence in UTF-8". We found that the FEC file also contains some non-ASCII characters, so I added ISO encoding handling inside the gem (lib/fech/csv.rb), around line 70.
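One way to narrow down which rows trigger this kind of error is to check each line with `String#valid_encoding?`. The following is only a minimal sketch: `find_invalid_utf8_lines` is a hypothetical helper that is not part of Fech, and the sample rows are made up for illustration rather than taken from the filing.

```ruby
# Returns the 1-based numbers of the lines that are not valid UTF-8.
# `find_invalid_utf8_lines` is a hypothetical helper, not part of Fech.
def find_invalid_utf8_lines(lines)
  bad = []
  lines.each_with_index do |line, index|
    # Relabel a copy as UTF-8 (no bytes change), then ask Ruby to validate it.
    as_utf8 = line.dup.force_encoding('UTF-8')
    bad << index + 1 unless as_utf8.valid_encoding?
  end
  bad
end

# Made-up sample rows. The second one ends with the lone byte 0xE9
# ("e" with an acute accent in ISO-8859-1), which is not valid UTF-8.
sample = [
  "HDR,FEC,8.0\n",
  [0x53, 0x41, 0x2C, 0x43, 0x61, 0x66, 0xE9, 0x0A].pack('C*'),
  "SA,Smith\n"
]

p find_invalid_utf8_lines(sample)  # prints [2]
```

For a real filing, the lines could come from something like `File.binread(path).lines`, so that Ruby does not attempt to interpret the bytes before the check runs.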

dwillis commented 11 years ago

Thanks for the pull request; I haven't been able to replicate the error with that filing with or without CSVDoctor. Can you provide a little more detail about how you ran into this, such as platform, ruby version, etc? Thanks.

narutosanjiv commented 11 years ago

Hi @dwillis,

Below are the Ruby version, Rails version, and other configuration details that may help you:

1) MRI Ruby 1.9.3-p194
2) Rails 3.2.8
3) Ubuntu 12.04

Steps to follow:

1) Open irb.

2) Enter the code below:

    fech = Fech::Filing.new(771694)
    fech.download
    fech.rows_like(/sa/)

It gave me the following error:

       ArgumentError: invalid byte sequence in UTF-8
from /home/sanjiv/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/csv.rb:1855:in `sub!'
from /home/sanjiv/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/csv.rb:1855:in `block in shift'
from /home/sanjiv/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/csv.rb:1849:in `loop'
from /home/sanjiv/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/csv.rb:1849:in `shift'
from /home/sanjiv/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/csv.rb:1791:in `each'
from /home/sanjiv/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
from /home/sanjiv/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/csv.rb:1354:in `open'
from /home/sanjiv/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
from /home/sanjiv/.rvm/gems/ruby-1.9.3-p194@qwikcom/gems/fech-1.1/lib/fech/csv.rb:27:in `parse_row'
from /home/sanjiv/.rvm/gems/ruby-1.9.3-p194@qwikcom/gems/fech-1.1/lib/fech/filing.rb:285:in `each_row'
from /home/sanjiv/.rvm/gems/ruby-1.9.3-p194@qwikcom/gems/fech-1.1/lib/fech/filing.rb:69:in `rows_like'
from (irb):5
from /home/sanjiv/.rvm/rubies/ruby-1.9.3-p194/bin/irb:16:in `<main>'

3) I found that the current gem only parses UTF-8 files, and some filings come in encodings other than UTF-8.

4) I then opened the gem file "/home/sanjiv/.rvm/gems/ruby-1.9.3-p194@qwikcom/gems/fech-1.1/lib/fech/csv.rb" and edited the function "parse_row" to make it handle non-UTF-8 input: I changed "line" to "line.force_encoding('ISO-8859-1')".

5) Without the above change, even Fech::CsvDoctor gives me the error.
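The idea behind the change in step 4 can be sketched in isolation as follows. This is an illustration under assumptions, not the code that went into the gem: `to_utf8` is a hypothetical helper, and unlike the one-line `force_encoding` change it also transcodes the result to UTF-8 and leaves lines that are already valid UTF-8 untouched.

```ruby
# Hypothetical helper, not part of Fech: returns a UTF-8 copy of `line`.
# Bytes that are already valid UTF-8 are kept as they are; anything else is
# assumed to be in `fallback` (ISO-8859-1 here) and transcoded to UTF-8.
def to_utf8(line, fallback = 'ISO-8859-1')
  candidate = line.dup.force_encoding('UTF-8')
  return candidate if candidate.valid_encoding?

  # force_encoding only relabels the bytes; encode performs the conversion.
  line.dup.force_encoding(fallback).encode('UTF-8')
end

# "Caf" followed by the single byte 0xE9, as it would appear in an
# ISO-8859-1 file (made-up sample data).
raw = [0x43, 0x61, 0x66, 0xE9].pack('C*')

fixed = to_utf8(raw)
puts fixed.encoding         # prints UTF-8
puts fixed.valid_encoding?  # prints true
p fixed.bytes               # prints [67, 97, 102, 195, 169]
```

Every byte sequence is valid ISO-8859-1, so the fallback never raises, but it will produce wrong characters if a filing actually uses some other encoding such as Windows-1252.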

dwillis commented 11 years ago

Ok, thanks for this. I had not tested it under Rails 3.2, so perhaps that's where it happens. We'll check things out and move on this ASAP.

dwillis commented 11 years ago

We've fixed this issue; thanks for the report! https://github.com/NYTimes/Fech/pull/48