Closed: kaligrafy closed this issue 1 year ago.
Can you please upload an example CSV file somewhere and add a link to it here?
GTFS files from Montreal STM: http://www.stm.info/sites/default/files/gtfs/gtfs_stm.zip
Thanks! How did you generate these CSV files? Were they exported from some other program?
Please note that you can pass any open file handle instead of a filename!
This works nicely for me (opening the file with "r:bom|utf-8" automatically strips the Unicode BOM character):
```
> irb
require 'smarter_csv'
f = File.open('trips.txt', "r:bom|utf-8")
data = SmarterCSV.process(f)
f.close
data.size
=> 152414
data.first
=> {:route_id=>1, :service_id=>"13N_S", :trip_id=>"13N_13N_S_1_1_0.22917", :trip_headsign=>"Station Honoré-Beaugrand"}
```
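The effect of the "r:bom|utf-8" mode can also be checked with Ruby's standard library alone, without smarter_csv. A minimal sketch, assuming a temporary file with made-up contents stands in for the real trips.txt:

```ruby
require "tempfile"

# Hypothetical sample file: the three UTF-8 BOM bytes (EF BB BF)
# followed by a tiny CSV, mimicking trips.txt.
bom = "\xEF\xBB\xBF".b
tmp = Tempfile.new(["trips", ".txt"])
tmp.binmode
tmp.write(bom + "route_id,service_id\n1,13N_S\n")
tmp.close

# Plain UTF-8 read: the BOM survives as U+FEFF at the start of the header line.
plain_header = File.open(tmp.path, "r:utf-8", &:readline)

# "r:bom|utf-8" tells Ruby to consume a leading BOM if one is present.
bom_header = File.open(tmp.path, "r:bom|utf-8", &:readline)

puts plain_header.start_with?("\uFEFF") # => true
puts bom_header.start_with?("\uFEFF")   # => false
```

Because the BOM is removed at the IO layer, any parser reading from that handle never sees it.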
I think this is really a corner case, probably caused by the program that produced those CSV files. I don't think it needs to be fixed in the smarter_csv gem itself.
These files are public and generated by the agency itself. I did not create them.
In your example, can you check whether data.first[:route_id] works? On my system, the first key (:route_id) still includes the zero-width character, so data.first[:route_id] returns nil.
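A minimal sketch of why the lookup returns nil, assuming a hard-coded header string and a simplified key-symbolizing step (not smarter_csv's actual code): the BOM becomes part of the first symbol, so the parsed key and :route_id are different symbols even though they can look identical when printed.

```ruby
# Hypothetical header line read without BOM handling:
# the first field still starts with the invisible U+FEFF character.
header = "\uFEFFroute_id,service_id"

# Simplified stand-in for turning header fields into symbol keys.
keys = header.split(",").map(&:to_sym)
row = keys.zip([1, "13N_S"]).to_h

p row[:route_id]          # => nil
p row[:service_id]        # => "13N_S"
p keys.first.to_s.ord     # => 65279 (U+FEFF)
p keys.first == :route_id # => false
```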
Pierre-Léo Bourbonnais, Ing. Jr. Étudiant au doctorat et chargé de cours École Polytechnique de Montréal | Génie civil Étude Mobilité 514-340-4711 #4235 Local B-327 leo.bourbonnais@polymtl.ca
On 2014-02-06, at 19:37, Tilo <notifications@github.com> wrote:
Closed #27.
@tilo I realize this is four years old, and your workaround works for me, but this does not seem to be a corner case: I'm seeing it now with any CSV generated by Excel on Windows.
OK, thanks! Feel free to add the workaround if you think it could be useful.
As @tilo described, using his example:
Without the workaround:
```
data = SmarterCSV.process('trips.txt')
data.first
=> {:route_id=>1, :service_id=>"13N_S", :trip_id=>"13N_13N_S_1_1_0.22917", :trip_headsign=>"Station Honoré-Beaugrand"}
data.first[:route_id]
=> nil
data.first.keys.first.to_s.chars
=> ["\uFEFF", "r", "o", ...]
```
With the workaround:
```
data = nil
File.open('trips.txt', 'r:bom|utf-8') { |f| data = SmarterCSV.process(f) }
data.first
=> {:route_id=>1, :service_id=>"13N_S", :trip_id=>"13N_13N_S_1_1_0.22917", :trip_headsign=>"Station Honoré-Beaugrand"}
data.first[:route_id]
=> 1
data.first.keys.first.to_s.chars
=> ["r", "o", ...]
```
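If the input can't be reopened with "r:bom|utf-8" (for example, the rows were already parsed elsewhere), the keys can be repaired after the fact. A sketch under stated assumptions: strip_bom_from_keys is a hypothetical helper, not part of smarter_csv; it assumes symbol keys as in the output above and Ruby 2.5+ for delete_prefix and transform_keys.

```ruby
BOM_CHAR = "\uFEFF"

# Hypothetical helper: returns new row hashes in which a leading U+FEFF
# has been removed from every key. Keys are converted back to symbols.
def strip_bom_from_keys(rows)
  rows.map do |row|
    row.transform_keys { |key| key.to_s.delete_prefix(BOM_CHAR).to_sym }
  end
end

# Sample row as it would come out of a BOM-prefixed file.
rows = [{ :"\uFEFFroute_id" => 1, :service_id => "13N_S" }]
fixed = strip_bom_from_keys(rows)

p fixed.first[:route_id] # => 1
```

Fixing the file handle is still the cheaper option, since this rewrites every row hash.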
Shouldn't a smarter CSV library handle this automatically? I can't open any CSV that's been exported by Excel, because these are all the column headers:
On top of that, look at what it parses for the partner_id:
This is definitely not a corner case, considering that BOMs are the bane of working with CSVs and Excel is the most popular CSV program. Handling this within the gem seems like a no-brainer.
Looks like the previous fix for the BOM issue did not work, or there was a regression:
```
$ hexdump -C /tmp/bom-issue.csv
00000000  ef bb bf 73 6f 6d 65 5f  69 64 2c 74 79 70 65 2c  |...some_id,type,|
00000010  66 75 7a 7a 62 6f 78 65  73 0d 0a 34 32 37 36 36  |fuzzboxes..42766|
00000020  38 30 35 2c 7a 69 7a 7a  6c 65 73 2c 31 32 33 34  |805,zizzles,1234|
00000030  0d 0a 33 38 37 35 39 31  35 30 2c 71 75 69 7a 7a  |..38759150,quizz|
00000040  65 73 2c 35 36 37 38 0d  0a                       |es,5678..|
```
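The ef bb bf prefix in the dump is the UTF-8 encoding of U+FEFF, the byte order mark. A minimal sketch for checking a file for it, assuming a Tempfile rebuilt from the first rows of the dump stands in for /tmp/bom-issue.csv; utf8_bom? is a hypothetical helper name:

```ruby
require "tempfile"

UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true if the file starts with the three-byte UTF-8 BOM.
# Reads in binary mode so the bytes are compared as-is.
def utf8_bom?(path)
  File.open(path, "rb") { |f| f.read(3) == UTF8_BOM }
end

# Recreate the start of bom-issue.csv (BOM + CRLF line endings).
tmp = Tempfile.new(["bom-issue", ".csv"])
tmp.binmode
tmp.write(UTF8_BOM + "some_id,type,fuzzboxes\r\n42766805,zizzles,1234\r\n")
tmp.close

p utf8_bom?(tmp.path) # => true
```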
I was having weird problems when reading keys from a CSV file parsed with smarter_csv: the first key was never recognized. After a long time I thought the file might contain a special character, and boom! The CSV file was UTF-8 with a BOM, so its first character was the Unicode character ZERO WIDTH NO-BREAK SPACE (U+FEFF, decimal 65279). The first key therefore always had this invisible character at index [0], and nobody could see it.
Could smarter_csv ignore the BOM when reading a CSV file, i.e. delete this character if the first line starts with it? Thanks!
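One way a parser could handle this automatically is to drop a leading U+FEFF from the header line before splitting it into keys. A sketch only, assuming the IO yields UTF-8 strings; header_keys is a hypothetical helper, not smarter_csv's implementation, and its naive split on "," ignores quoted header fields:

```ruby
require "stringio"

# Hypothetical pre-processing step: read the header line, remove a leading
# U+FEFF if present, then turn the fields into symbol keys.
def header_keys(io)
  line = io.readline.chomp           # chomp also removes "\r\n"
  line = line.delete_prefix("\uFEFF") # no-op when there is no BOM
  line.split(",").map { |field| field.strip.to_sym }
end

io = StringIO.new("\uFEFFsome_id,type,fuzzboxes\r\n42766805,zizzles,1234\r\n")
p header_keys(io) # => [:some_id, :type, :fuzzboxes]
```

After header_keys returns, the IO is positioned at the first data row, so the rest of the file can be parsed as usual.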
You could do this: