lisad / phaser

The missing layer for complex data batch integration pipelines
MIT License

Boston pipeline failing - field value of string containing comma gets parsed as 2 fields #160

Closed lisad closed 1 month ago

lisad commented 2 months ago

I haven't been able to figure out what's going on here yet. In the boston pipeline, we save "select-bike-counts_output.csv" as the output of the first phase. That file looks fine, but loading it back in for the next step of the pipeline fails because line 3442, with the description field "Southern New England Trunkline Trail, Grove Street Trailhead", is parsed as 69 columns instead of 68. That makes the recordizing fail because Records expects to find the phaser row num in column 68, but everything has been shifted over by one.

I can tell that the clevercsv.DictReader class is returning an OrderedDict with 69 columns for this row, so this may just be an error in the layer below our own CSV handling. One way or another, we should figure this out.

To repro: run the boston pipeline with data that includes that field. It fails in the 2nd phase, unable to find the row num.
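A quick way to narrow this kind of problem down is to count the fields in each row and flag any row whose width differs from the header. The sketch below uses only the standard library `csv` module, not phaser or clevercsv; the function name `find_ragged_rows` and the sample column names are made up for illustration.

```python
import csv
import io


def find_ragged_rows(text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    # newline="" leaves line endings untouched so the csv module can handle them itself.
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    expected = len(header)
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]


# Hypothetical three-column sample: the second data row is missing its quotes,
# so the comma inside the description splits it into two fields.
sample = (
    "description,count,row_num\r\n"
    '"Trail, Grove Street Trailhead",12,1\r\n'
    "Trail, Grove Street Trailhead,12,2\r\n"
)
ragged = find_ragged_rows(sample)
```

If the saved file is quoted correctly, a check like this should come back empty, which would point at the reader rather than the writer.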

lisad commented 1 month ago

Fixed by not using clevercsv
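For reference, here is a minimal round-trip check for a description containing a comma. It assumes the standard library `csv` module on both the write and read side, which this issue does not confirm is what phaser switched to; the `count` and `row_num` columns and the `round_trip` helper are hypothetical.

```python
import csv
import io


def round_trip(rows, fieldnames):
    """Write rows to an in-memory CSV, then read them back with csv.DictReader."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas are quoted by default
    buffer.seek(0)
    return list(csv.DictReader(buffer))


fieldnames = ["description", "count", "row_num"]  # hypothetical columns
rows = [
    {
        "description": "Southern New England Trunkline Trail, Grove Street Trailhead",
        "count": "12",
        "row_num": "3442",
    }
]
result = round_trip(rows, fieldnames)
```

A test along these lines would catch a regression where the comma in the description shifts the remaining columns over by one.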