Closed steve-estes closed 1 year ago
This is easy to do if using agate
directly. Is that an option for you?
Otherwise, we would need a command-line argument to specify null values.
Indeed, "\N" is used in the movie database in the IMDB datasets: https://datasets.imdbws.com/
My current solution is to use sed as a pre-processor:
sed -i "s/\\\N//g" $filename
But as you mention, that doesn't allow for distinguishing between nulls and blanks. And it's an extra step, so certainly having csvkit recognize it would be better.
Thanks for such a fast turnaround - and so flexible a solution.
The handling / detection of nulls within csvkit (not sure if it's properly within Agate or not) includes a list of values that are interpreted as nulls. You can see them in the parameters list for --blank, which turns off the default behavior that
I would like this list to include "\N", which is something commonly done in old DB systems, and sometimes written out to CSV when the producers of the CSV wish to distinguish between empty strings and actual nulls. This could be as simple as a one-liner in the code, I don't know, but either way it'd be handy.