My first obstacle in getting the boston pipeline working again is the two lines at the end of the file that contain only whitespace. They cause a column error because those rows have no COUNT_TYPE value from the list of allowed values.
Simply having this column drop rows where COUNT_TYPE is None might sometimes work, but it isn't always appropriate: if a row has other values present but a COUNT_TYPE of None (or any other disallowed value), that should probably raise an error and stop the pipeline. The more careful approach is to drop rows that are entirely blank, and keep the strict validation of allowed values for this one column. A rough sketch of that approach follows.
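This is a minimal sketch of "drop fully blank rows, then validate strictly", assuming a pandas-based step; the allowed values and column handling here are placeholders, not the pipeline's real configuration.

```python
# Sketch: drop rows that are entirely blank, then validate COUNT_TYPE strictly.
# ALLOWED_COUNT_TYPES is a hypothetical placeholder set of allowed values.
import pandas as pd

ALLOWED_COUNT_TYPES = {"bikes", "peds", "vehicles"}  # placeholder values

def clean_and_validate(df: pd.DataFrame) -> pd.DataFrame:
    # Drop rows where every field is missing or whitespace-only.
    blank = df.apply(
        lambda row: all(pd.isna(v) or str(v).strip() == "" for v in row),
        axis=1,
    )
    df = df[~blank]

    # Keep the strict validation: any remaining row with a disallowed
    # COUNT_TYPE (including None) stops the pipeline.
    bad = ~df["COUNT_TYPE"].isin(ALLOWED_COUNT_TYPES)
    if bad.any():
        raise ValueError(
            f"Disallowed COUNT_TYPE values: {df.loc[bad, 'COUNT_TYPE'].unique()}"
        )
    return df
```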
While doing this, we should consider whether a row that is all empty values (i.e. a row of just commas) should also be dropped, or whether that's a different case. I think the library could default to dropping both.
Dropping fully blank rows works now, but a row of just commas does not yet. There's a test for it (test_csv.py::test_empty_line_only_commas), but it's skipped for now.
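As an illustration (not the library's actual test) of why the two cases differ at the parsing layer: a whitespace-only line parses to a single whitespace field, while a line of only commas parses to one empty string per column. The sample data below is made up.

```python
import csv
import io

# Hypothetical sample: one real row, one whitespace-only line, one commas-only line.
sample = "COUNT_TYPE,LOCATION\nbikes,Main St\n   \n,\n"
rows = list(csv.reader(io.StringIO(sample)))
print(rows)
# [['COUNT_TYPE', 'LOCATION'], ['bikes', 'Main St'], ['   '], ['', '']]
```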