ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License

unknown encoding name - UTF-16:UTF-8 (ArgumentError) #241

Closed: rvalladares77 closed this issue 1 year ago

rvalladares77 commented 2 years ago

I just upgraded Ruby to 2.7.5 and am getting this error in one of my tests:

unknown encoding name - UTF-16:UTF-8 (ArgumentError)

The code that I am using looks like this:

CSV.parse(page.body, headers: true, encoding: 'UTF-16:UTF-8').size()

Settings:

kou commented 2 years ago

Could you show the version of old Ruby that works?

rvalladares77 commented 2 years ago

@kou

Sure. It works fine with the old Ruby version, 2.6.6p146.
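One possible workaround for the failing call, whichever csv version is loaded, is to decode the body to UTF-8 yourself and drop the `encoding:` option. This is a minimal sketch that assumes `page.body` holds UTF-16 text starting with a byte order mark (BOM); `parse_utf16_csv` is a hypothetical helper, not part of the csv API, and the sample data is made up.

```ruby
require "csv"

# Hypothetical helper: decode BOM-tagged UTF-16 into UTF-8 *before*
# handing it to CSV, so CSV never sees an "external:internal" spec.
def parse_utf16_csv(raw)
  # Relabel the bytes as UTF-16 (Ruby's BOM-aware dummy encoding)...
  utf16 = raw.dup.force_encoding(Encoding::UTF_16)
  # ...and let the UTF-16 transcoder read the BOM to pick the byte order.
  utf8 = utf16.encode(Encoding::UTF_8)
  CSV.parse(utf8, headers: true)
end

# Stand-in for page.body: big-endian UTF-16 with a leading BOM.
body = ("\uFEFF" + "ID,Title\n444,Legislation4\n").encode("UTF-16BE")
table = parse_utf16_csv(body)
puts table.size          # number of data rows
puts table[0]["Title"]
```

The trade-off is that the whole body is transcoded in memory up front, which is fine for a test fixture but worth noting for very large inputs.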

kou commented 2 years ago

Thanks. What is the encoding of page.body? UTF-16BE or UTF-16LE? Could you show a sample String that works with Ruby 2.6?
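One way to answer the BE/LE question without pasting text is to look at the first two bytes of the body. A sketch, assuming the data starts with a byte order mark; `utf16_byte_order` is a hypothetical helper. Without a BOM the byte order cannot be read off this way.

```ruby
# Hypothetical helper: guess the UTF-16 byte order from the BOM.
# Returns nil when the first two bytes are not a UTF-16 BOM.
def utf16_byte_order(data)
  # Inspect raw bytes, independent of the encoding label on the String.
  case data.b.bytes.first(2)
  when [0xFE, 0xFF] then Encoding::UTF_16BE
  when [0xFF, 0xFE] then Encoding::UTF_16LE
  end
end

big    = ("\uFEFF" + "ID,Title\n").encode("UTF-16BE")
little = ("\uFEFF" + "ID,Title\n").encode("UTF-16LE")
no_bom = "ID,Title\n".encode("UTF-16LE")

p utf16_byte_order(big)
p utf16_byte_order(little)
p utf16_byte_order(no_bom)   # nil: no BOM to go on
```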

rvalladares77 commented 2 years ago

It looks like this:

"ID,Title,Description,Mod Date,Jurisdiction,Leg. Ref.,Sectors,Legislation Types,Legislation Status,English,Alternate Language,URL Link,Pub Date,Effective Date\n444,\"Legislation4 in EN\nLegislation4 in FR\",\"protect the facility, please\nprotect the facility, please\",,CA,\"LegEN4\nLegFR4\",Mining,General,Published,no,no,\"https://www.tes.org/en/ca/laws/stat/sc-19-c-33/latest/sc-19-c-33.html\nhttps://www.test.org/en/ca/laws/stat/sc-19-c-33/latest/sc-19-c-33.html\",2017-08-04,2017-08-08\n"

kou commented 2 years ago

Thanks, but could you attach the content as a file instead of pasting it here? Pasting it here loses the encoding information.
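A sketch of saving the body byte-for-byte so it can be attached as a file. `dump_body` and the file name are hypothetical; in the test it would be called with `page.body` instead of the made-up sample.

```ruby
require "tmpdir"

# Hypothetical helper: write the String's raw bytes to disk unchanged.
# File.binwrite opens the file in binary mode, so no transcoding or
# newline conversion happens on the way out.
def dump_body(body, path)
  File.binwrite(path, body)
end

# Stand-in for page.body (made-up sample data in UTF-16LE).
body = "ID,Title\n444,Legislation4\n".encode("UTF-16LE")
path = File.join(Dir.tmpdir, "csv_issue_body.csv")
dump_body(body, path)

# Reading it back in binary mode returns the same bytes.
written = File.binread(path)
puts written.bytesize
```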

rapito commented 1 year ago

I also have this issue, and it seems to be caused by these changes to the initialize method of CSV in 3.0.0/csv: [image]

Previously it looked like this in 2.6.0/csv: [image]

What exactly is the benefit of calling @io.set_encoding there? Isn't that the purpose of tracking @encoding separately and calling @encoding = determine_encoding(encoding, internal_encoding) right after it anyway?
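The error message hints at what goes wrong: `'UTF-16:UTF-8'` and `'ISO-8859-1:UTF-8'` are "external:internal" pairs, which IO methods such as `File.open` and `IO#set_encoding` accept, but they are not the name of any single `Encoding`. The sketch below only illustrates that distinction with core Ruby; that csv 3.0's `@io.set_encoding` call is the exact code path raising the error is an assumption, not something confirmed in this thread. `split_encoding_spec` is a hypothetical helper.

```ruby
# An "external:internal" pair is not a single encoding name, so looking
# it up directly raises ArgumentError.
begin
  Encoding.find("ISO-8859-1:UTF-8")
rescue ArgumentError => e
  puts e.message   # unknown encoding name - ISO-8859-1:UTF-8
end

# Hypothetical helper: split the pair and resolve each half separately.
# The internal half is optional, as in a plain "UTF-8".
def split_encoding_spec(spec)
  external, internal = spec.split(":", 2)
  [Encoding.find(external), internal && Encoding.find(internal)]
end

p split_encoding_spec("ISO-8859-1:UTF-8")
p split_encoding_spec("UTF-8")
```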

This example code no longer works after upgrading from 2.6.8 to 2.7.0 and then to 3.0.5:

CSV.parse(some_csv_string, headers: true, encoding: 'ISO-8859-1:UTF-8') do |row|
  # magic
end

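Until upgrading is an option, the same transcode-first idea should work for ISO-8859-1. A minimal sketch, assuming `some_csv_string` holds raw Latin-1 bytes; `each_latin1_row` is a hypothetical helper and the sample data is made up.

```ruby
require "csv"

# Hypothetical helper: transcode Latin-1 to UTF-8 up front, then parse
# without an `encoding:` option and yield each CSV::Row to the block.
def each_latin1_row(latin1_csv, &block)
  utf8 = latin1_csv.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  CSV.parse(utf8, headers: true, &block)
end

some_csv_string = "name,city\nJos\xE9,Montr\xE9al\n".b   # raw Latin-1 bytes
rows = []
each_latin1_row(some_csv_string) do |row|
  rows << row.to_h   # the original "# magic" would go here
end
p rows
```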
kou commented 1 year ago

Could you retry with the latest version? https://rubygems.org/gems/csv

rapito commented 1 year ago

Isn't this bundled with Ruby itself, though? Can I just override it in a Gemfile?

-- EDIT -- Alright, yeah, that worked. Thanks!
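csv ships with Ruby as a gem, so a plain `require "csv"` picks up the copy bundled with the interpreter; adding `gem "csv"` to the Gemfile and running `bundle install` lets Bundler load the newer release from RubyGems instead, which is what resolved the issue here. A quick sketch for confirming which version is actually loaded (run it under `bundle exec` to see the Gemfile's version take effect):

```ruby
require "csv"

# CSV::VERSION reports the version of the csv library that was loaded.
version = CSV::VERSION
puts "csv #{version}"
```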