ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License

CSV parse does not honor field_size_limit option unless and until a comma occurs in the data, and field_size_limit is off by one #238

Closed Capncavedan closed 2 years ago

Capncavedan commented 2 years ago

When using CSV.parse or CSV.foreach and specifying option field_size_limit: 2_000, we do not consistently see an exception raised when a field contains over 2,000 characters.

I was finally able to narrow it down: the limit is enforced only once a comma has occurred within a data field.

I then also found what could be considered an off-by-one error with respect to "field_size_limit": you need to set a value 1 higher than the maximum field length you want to allow.

This occurs on Ruby 2.7.5 and 3.1.1.

This is a simple Ruby script that demonstrates both issues:

require "csv"

the_alphabet = ("a".."z").to_a.join
the_alphabet.size # => 26

# this does not honor field_size_limit; it should raise an exception but does not
CSV.parse("\"I am a working man\",\"#{the_alphabet}\"", field_size_limit: 20)

# this raises the proper exception
CSV.parse("\"I am a workin, man\",\"#{the_alphabet}\"", field_size_limit: 20)

# this raises a "Field size exceeded" error, even though field size equals field size limit
CSV.parse("\"I am a workin, man\",\"#{the_alphabet}\"", field_size_limit: the_alphabet.size)

# this works as expected, no exception raised
CSV.parse("\"I am a workin, man\",\"#{the_alphabet}\"", field_size_limit: the_alphabet.size+1)

Capncavedan commented 2 years ago

From https://bugs.ruby-lang.org/issues/18638

kou commented 2 years ago

I've fixed the former case as a bug, but I haven't changed the behavior of field_size_limit: for the latter, because that would break backward compatibility.

I've deprecated field_size_limit: and introduced max_field_size: instead. field_size_limit: 10 is equivalent to max_field_size: 9.
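
For readers comparing the two options, here is a minimal sketch of the relationship described above. It assumes a csv release that already includes max_field_size:, and the rejects? helper is hypothetical (it is not part of the csv gem); it only reports whether parsing raised CSV::MalformedCSVError.

```ruby
require "csv"

# Hypothetical helper (not part of the csv gem): returns true when parsing
# `data` with the given options raises CSV::MalformedCSVError.
def rejects?(data, **options)
  CSV.parse(data, **options)
  false
rescue CSV::MalformedCSVError
  true
end

the_alphabet = ("a".."z").to_a.join # 26 characters
row = "\"I am a workin, man\",\"#{the_alphabet}\""

# max_field_size: n accepts fields of up to n characters.
new_exact_fit = rejects?(row, max_field_size: the_alphabet.size)
new_one_short = rejects?(row, max_field_size: the_alphabet.size - 1)

# field_size_limit: n behaves like max_field_size: n - 1.
old_exact_fit = rejects?(row, field_size_limit: the_alphabet.size)
old_plus_one  = rejects?(row, field_size_limit: the_alphabet.size + 1)

puts [new_exact_fit, new_one_short, old_exact_fit, old_plus_one].inspect
```

On such a release, the 26-character field should be accepted with max_field_size: 26 and field_size_limit: 27, and rejected with max_field_size: 25 and field_size_limit: 26.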

Capncavedan commented 2 years ago

Thank you very much, @kou!