singer-io / tap-s3-csv

GNU Affero General Public License v3.0

Bump csv field width #47

Closed: luandy64 closed this pull request 2 years ago

luandy64 commented 2 years ago

Description of change

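This PR raises the limit on the width (size) of a single field in the CSV files the tap reads. Python's `csv` module enforces this limit through `csv.field_size_limit()` and raises `_csv.Error` for any field that exceeds it.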
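A minimal sketch of one way a tap could lift the limit, assuming the goal is "as large as the platform allows". The PR description does not say which value the tap actually sets, and `raise_csv_field_size_limit` is a hypothetical helper name. The fallback loop exists because `csv.field_size_limit()` stores the limit in a C `long`, so `sys.maxsize` raises `OverflowError` on platforms where a C `long` is 32 bits (for example, 64-bit Windows).

```python
import csv
import sys


def raise_csv_field_size_limit():
    """Hypothetical helper: set csv's per-field size limit as high as possible.

    Assumption: the tap wants the largest limit the platform accepts; the
    actual value used by the PR is not stated. Returns the limit that was set.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)  # raises OverflowError if limit > C long max
            return limit
        except OverflowError:
            limit //= 10  # retry with a smaller value until it fits


new_limit = raise_csv_field_size_limit()
print(new_limit)
```

Because the limit is process-wide module state, calling this once at tap start-up affects every `csv.reader` created afterwards.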

Here is a simple proof-of-concept session:

$ cat foo.csv
id,value
1,abcdefghij
$ python3
Python 3.8.10 (default, Jun  1 2021, 15:06:54)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv

# Set our limit to 10
>>> csv.field_size_limit(10)
131072

# Check that it gets set
>>> csv.field_size_limit()
10

# Read and print the file
>>> with open('foo.csv') as csv_file:
...   csv_reader = csv.reader(csv_file, delimiter=',')
...   for row in csv_reader:
...     print(row)
...
['id', 'value']
['1', 'abcdefghij']

# Set the limit to 9, one character short of the biggest field in our file
>>> csv.field_size_limit(9)
10

# Check that the limit was set
>>> csv.field_size_limit()
9

# Try to read again and watch it fail with the same error as the tap
>>> with open('foo.csv') as csv_file:
...   csv_reader = csv.reader(csv_file, delimiter=',')
...   for row in csv_reader:
...     print(row)
...
['id', 'value']
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
_csv.Error: field larger than field limit (9)
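The same experiment can be reproduced as a standalone script without writing `foo.csv` to disk. This is a sketch, not code from the tap: it reads from an in-memory `io.StringIO` instead of a file and uses a hypothetical `read_rows` helper. Because `csv.field_size_limit()` returns the previous limit (131072 by default, as the session shows), the helper can restore it afterwards.

```python
import csv
import io

CSV_TEXT = "id,value\n1,abcdefghij\n"  # same contents as foo.csv above


def read_rows(text, limit):
    """Hypothetical helper: parse `text` under `limit`, then restore the old limit."""
    previous = csv.field_size_limit(limit)  # returns the limit that was in effect
    try:
        return list(csv.reader(io.StringIO(text), delimiter=","))
    finally:
        csv.field_size_limit(previous)  # restored even if parsing raised


# A limit of 10 admits the 10-character field "abcdefghij".
print(read_rows(CSV_TEXT, 10))

# A limit of 9 is one character short, so parsing fails as in the session.
try:
    read_rows(CSV_TEXT, 9)
except csv.Error as exc:
    print("csv.Error:", exc)
```

Unlike the REPL loop, `list()` consumes the whole reader before returning, so the header row is not printed when the second row fails.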

Manual QA steps

Risks

Rollback steps