jeremyevans closed this 2 years ago
Thanks! But this is not a real fix for this problem.
There is a problem with keep_start/keep_drop across @scanner switches. I'll fix it later.
Anyway, we need to improve this case. In the meantime, we can specify the row separator explicitly:
diff --git a/lib/csv/parser.rb b/lib/csv/parser.rb
index 0d8a157..2d76316 100644
--- a/lib/csv/parser.rb
+++ b/lib/csv/parser.rb
@@ -85,9 +85,10 @@ class CSV
# If there is no more data (eos? = true), it returns "".
#
class InputsScanner
- def initialize(inputs, encoding, chunk_size: 8192)
+ def initialize(inputs, encoding, row_separator, chunk_size: 8192)
@inputs = inputs.dup
@encoding = encoding
+ @row_separator = row_separator
@chunk_size = chunk_size
@last_scanner = @inputs.empty?
@keeps = []
@@ -233,7 +234,7 @@ class CSV
@last_scanner = @inputs.empty?
true
else
- chunk = input.gets(nil, @chunk_size)
+ chunk = input.gets(@row_separator, @chunk_size)
if chunk
raise InvalidEncoding unless chunk.valid_encoding?
@scanner = StringScanner.new(chunk)
@@ -737,6 +738,7 @@ class CSV
chunk_size = ENV["CSV_PARSER_SCANNER_TEST_CHUNK_SIZE"] || "1"
InputsScanner.new(inputs,
@encoding,
+ @row_separator,
chunk_size: Integer(chunk_size, 10))
end
else
@@ -763,7 +765,7 @@ class CSV
StringIO.new(sample)
end
inputs << @input
- InputsScanner.new(inputs, @encoding)
+ InputsScanner.new(inputs, @encoding, @row_separator)
end
end
end
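The effect of the diff can be seen with plain StringIO#gets, which accepts both a separator and a byte limit. This is an illustrative sketch only: read_chunks is a hypothetical helper, not part of lib/csv/parser.rb, and it just mimics the gets call that InputsScanner makes, comparing the old nil separator with an explicit "\r\n".

```ruby
require "stringio"

# Hypothetical helper: reads an IO in size-limited chunks using the same
# gets(separator, limit) call that InputsScanner makes.
#   separator = nil    -> chunks are cut purely by size (old behavior)
#   separator = "\r\n" -> a chunk also ends right after a row separator
def read_chunks(io, separator, chunk_size)
  chunks = []
  while (chunk = io.gets(separator, chunk_size))
    chunks << chunk
  end
  chunks
end

data = "a,b\r\nc,d\r\n"
p read_chunks(StringIO.new(data), nil, 8)     # => ["a,b\r\nc,d", "\r\n"]
p read_chunks(StringIO.new(data), "\r\n", 8)  # => ["a,b\r\n", "c,d\r\n"]
```

With an explicit separator, each chunk stops right after "\r\n" as long as the row fits in the chunk. The limit still wins when a row is longer than the chunk size: read_chunks(StringIO.new(data), "\r\n", 4) returns ["a,b\r", "\nc,d", "\r\n"], so "\r\n" can still be split across chunks, which is presumably part of why this change is not a real fix.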
I pushed the row separator fix. It fixes the problem with the reported data, but it's not a real fix. We should fix it properly later: #230
When a chunk ends with \r and the row separator is \r\n, read one more character.
This is a suboptimal fix, as it only fixes handling of row separators that are two characters long and start with \r. A better fix would handle all multi-character row separators. However, as \r\n is one of the most common row separators, I think it's useful to merge this until a more generic solution is developed.
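A minimal sketch of the "read one more character" idea, using StringIO. read_chunk and read_all_chunks are hypothetical helpers, not the code from the actual patch. Like the fix described above, the sketch only special-cases "\r\n", so other multi-character separators can still be split across chunks.

```ruby
require "stringio"

# Hypothetical helper: read a size-limited chunk; if it ends with "\r"
# while the row separator is "\r\n", read one more character so the
# separator is not split across two chunks.
def read_chunk(io, row_separator, chunk_size)
  chunk = io.gets(nil, chunk_size)
  return nil if chunk.nil?
  if row_separator == "\r\n" && chunk.end_with?("\r")
    extra = io.getc # nil at end of input
    chunk << extra if extra
  end
  chunk
end

# Hypothetical driver: collect every chunk until end of input.
def read_all_chunks(io, row_separator, chunk_size)
  chunks = []
  while (chunk = read_chunk(io, row_separator, chunk_size))
    chunks << chunk
  end
  chunks
end

p read_all_chunks(StringIO.new("a,b\r\nc,d\r\n"), "\r\n", 4)
# => ["a,b\r\n", "c,d\r\n"]
```

Here gets(nil, 4) first returns "a,b\r"; the extra getc pulls in the "\n", so each row arrives with its separator intact.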
Fixes [Bug #18245]