ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License

Illegal quoting in line 1. (CSV::MalformedCSVError) but loads OK in LibreOffice/GoogleSheets #242

Closed bradkrane closed 2 years ago

bradkrane commented 2 years ago

I'm trying to track down an issue with a CSV file. If I load the raw file as I received it from a PayPal download, I get an "Illegal quoting in line 1" error. However, if I truncate the large file and load only the first 50 lines or so, I do not get this CSV::MalformedCSVError and the contents load as expected.

I've also been able to take that file, load it in LibreOffice, then save it as a CSV. That strips out many unnecessary quotes around fields, and the resulting CSV loads as expected without error. Another difference between the two files, other than the removed quotes, is that the original CSV is UTF-8-BOM while the copy is UTF-8 (so says Notepad++). I've tried encoding: 'UTF-8-BOM' but get the same error.
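As a side note on the encoding: 'UTF-8-BOM' attempt, a minimal sketch for checking whether Ruby recognizes a given encoding name. The helper name known_encoding? is hypothetical; Encoding.find is the standard call and raises ArgumentError for names it does not know. Ruby does not appear to list 'UTF-8-BOM' among its encoding names, which may explain why passing it changed nothing.

```ruby
# Hypothetical helper: returns true when Ruby recognizes +name+
# as an encoding name, false otherwise.
def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  # Encoding.find raises ArgumentError for unknown names.
  false
end

puts known_encoding?("UTF-8")      # → true
puts known_encoding?("UTF-8-BOM")  # → false
```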

I would like to figure out what the issue is but need some help tracking down the error. Is it a problem with the file or the lib? How can I find exactly which quote or other character in the original CSV file is causing the error?
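One way to look for an offending character at the very start of a file is to dump its leading bytes. The sketch below is only an illustration: the helper names utf8_bom? and leading_bytes are hypothetical, and the sample file is made-up data written to a temporary file. It checks for the three-byte UTF-8 byte order mark (EF BB BF), which sits in front of the first quote when present.

```ruby
require "tempfile"

# The UTF-8 byte order mark as raw bytes.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true when the file starts with a UTF-8 BOM.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM
end

# Hypothetical helper: the first +count+ bytes of the file as hex,
# which makes invisible characters before the first quote visible.
def leading_bytes(path, count = 8)
  File.binread(path, count).to_s.bytes.map { |b| format("%02X", b) }.join(" ")
end

# Demo on made-up data: a BOM followed by a quoted header field.
Tempfile.create(["sample", ".csv"]) do |f|
  f.binmode
  f.write(UTF8_BOM + %("Date","Name"\n).b)
  f.flush

  puts utf8_bom?(f.path)      # → true
  puts leading_bytes(f.path)
end
```

If the first three bytes are EF BB BF, the parser sees a character before the opening quote of the first field unless the file is opened in a BOM-aware way.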

Thanks for the help!

C:\Users\Brad Krane\Documents\src\csv-quote>ruby -v
ruby 3.1.1p18 (2022-02-18 revision 53f5fc4236) [x64-mingw-ucrt]
C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:955:in `parse_quotable_robust': Illegal quoting in line 1. (CSV::MalformedCSVError)
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:894:in `block in parse_quotable_loose'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:129:in `block in each_line'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:105:in `each_line'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:105:in `each_line'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:855:in `parse_quotable_loose'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:338:in `parse'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:2365:in `each'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:2365:in `each'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:2400:in `to_a'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:2400:in `read'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:1578:in `parse'
        from (irb):6:in `block in <top (required)>'
        from (irb):6:in `open'
        from (irb):6:in `<main>'
        from C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'

kou commented 2 years ago

Could you share a CSV file that reproduces this problem?

bradkrane commented 2 years ago

If I can do so privately somehow and trust you to destroy the information. Unfortunately, the CSV file is full of personally identifiable information: name, address, email, phone number, and some other info.

bradkrane commented 2 years ago

After some more testing, it could be the UTF-8-BOM. When I do head file.csv > trunc.csv (as opposed to Save As with NP++), the error is preserved. I'll see if I can replace any personal information and get you a copy that has the error.

bradkrane commented 2 years ago

Actually, I was able to get the failure with the first line alone using head -1 > trunc3.csv from the OG file. Please see the attached.

trunc3.csv

kou commented 2 years ago

Thanks. Could you also provide a Ruby script that reproduces this case?

bradkrane commented 2 years ago

Microsoft Windows [Version 10.0.19042.1586]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Brad Krane\Documents\src\csv-quote>ruby -v
ruby 3.1.1p18 (2022-02-18 revision 53f5fc4236) [x64-mingw-ucrt]

C:\Users\Brad Krane\Documents\src\csv-quote>irb -v
irb 1.4.1 (2021-12-25)

C:\Users\Brad Krane\Documents\src\csv-quote>irb
irb(main):001:0> require 'csv'
=> true
irb(main):002:0>
irb(main):003:0> File.open('trunc3.CSV')   { |f| CSV.parse(f,  ) }
C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:955:in `parse_quotable_robust': Illegal quoting in line 1. (CSV::MalformedCSVError)
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:894:in `block in parse_quotable_loose'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:53:in `block in each_line'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:50:in `each_line'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:50:in `each_line'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:855:in `parse_quotable_loose'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv/parser.rb:338:in `parse'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:2365:in `each'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:2365:in `each'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:2400:in `to_a'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:2400:in `read'
        from C:/Ruby31-x64/lib/ruby/3.1.0/csv.rb:1578:in `parse'
        from (irb):3:in `block in <top (required)>'
        from (irb):3:in `open'
        from (irb):3:in `<main>'
        from C:/Ruby31-x64/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
        from C:/Ruby31-x64/bin/irb:33:in `load'
        ... 1 levels...
irb(main):004:0>

kou commented 2 years ago

Thanks.

Could you try File.open('trunc3.CSV', encoding: "BOM|UTF-8") {|f| CSV.parse(f)}?
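The suggestion above can be sketched as a self-contained script. The helper name parse_csv_skipping_bom and the sample rows are made up for illustration; the "BOM|UTF-8" encoding string is the one from the suggestion and tells File.open to skip a leading UTF-8 BOM when reading. The plain UTF-8 read is wrapped in a rescue so the script reports what happens on your version rather than assuming it.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: parses +path+ as CSV, letting File.open
# consume a leading UTF-8 BOM via the "BOM|UTF-8" encoding prefix.
def parse_csv_skipping_bom(path)
  File.open(path, encoding: "BOM|UTF-8") { |f| CSV.parse(f) }
end

# Demo on made-up data: a BOM followed by fully quoted fields.
Tempfile.create(["sample", ".csv"]) do |f|
  f.binmode
  f.write("\xEF\xBB\xBF".b + %("Date","Name"\n"2022-04-17","Example"\n).b)
  f.flush

  # Without BOM handling, the BOM stays in front of the first quote.
  begin
    File.open(f.path, encoding: "UTF-8") { |io| CSV.parse(io) }
    puts "plain UTF-8 read parsed without error"
  rescue CSV::MalformedCSVError => e
    puts "plain UTF-8 read failed: #{e.message}"
  end

  # With BOM handling, the first header is a clean "Date".
  rows = parse_csv_skipping_bom(f.path)
  p rows.first  # → ["Date", "Name"]
end
```

The csv documentation also shows passing the same kind of string directly, as in CSV.read(path, encoding: "bom|utf-8"), which avoids the explicit File.open.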

bradkrane commented 2 years ago

Hi,

Thanks, it works as expected! I thought that I was probably making a mistake in the encoding. I tried encoding: 'UTF-8-BOM' and looked around to no avail. Thanks for the correct string, very much appreciated!

Cheers,

-- Brad