Description of change
This PR increases the limit on the width of a field in the CSV files we read in the tap.
Here is a simple proof-of-concept session that reproduces the failure:
$ cat foo.csv
id,value
1,abcdefghij
$ python3
Python 3.8.10 (default, Jun 1 2021, 15:06:54)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
# Set our limit to 10
>>> csv.field_size_limit(10)
131072
# Check that it gets set
>>> csv.field_size_limit()
10
# Read and print the file
>>> with open('foo.csv') as csv_file:
...     csv_reader = csv.reader(csv_file, delimiter=',')
...     for row in csv_reader:
...         print(row)
...
['id', 'value']
['1', 'abcdefghij']
# Set the limit to 9, one character short of the biggest field in our file
>>> csv.field_size_limit(9)
10
# Check that the limit was set
>>> csv.field_size_limit()
9
# Try to read again and watch it fail with the same error as the tap
>>> with open('foo.csv') as csv_file:
...     csv_reader = csv.reader(csv_file, delimiter=',')
...     for row in csv_reader:
...         print(row)
...
['id', 'value']
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
_csv.Error: field larger than field limit (9)
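The fix itself can be sketched as follows. The exact limit the tap uses is an assumption here; sys.maxsize with an overflow fallback is a common pattern, since csv.field_size_limit() requires a value that fits in a C long on some platforms:

```python
import csv
import sys

# Raise the field size limit from the default of 131072 bytes.
# csv.field_size_limit(sys.maxsize) can raise OverflowError on
# platforms where sys.maxsize exceeds a C long, so back off
# until an acceptable value is found.
limit = sys.maxsize
while True:
    try:
        csv.field_size_limit(limit)
        break
    except OverflowError:
        limit = limit // 10
```

After this runs, the limit is far above the 131072-byte default, so the field that previously triggered the error is read normally.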
Manual QA steps
I ran the tap.
Risks
Raising the limit means oversized fields are read into memory instead of being rejected up front, so any future _csv.Error: field larger than field limit condition will likely surface as an OOM error instead.
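To illustrate that trade-off, here is a self-contained sketch (not tap code): with a small limit the oversized field fails fast, while with a raised limit the same field is accepted and held in memory, which is where an OOM would occur for a pathologically wide field.

```python
import csv
import io

# With a small limit, an oversized field fails fast with _csv.Error.
csv.field_size_limit(9)
data = io.StringIO("id,value\n1," + "x" * 20 + "\n")
try:
    list(csv.reader(data))
except csv.Error as exc:
    print(exc)  # field larger than field limit (9)

# With a raised limit, the same field is read fully into memory;
# for very large fields this is where an OOM could occur instead.
csv.field_size_limit(10_000_000)
data.seek(0)
rows = list(csv.reader(data))
print(len(rows[1][1]))  # 20
```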
Rollback steps