max-mapper / csv-spectrum

A variety of CSV files to serve as an acid test for CSV parsing libraries

deal with missing values #16

Open calvinmetcalf opened 9 years ago

calvinmetcalf commented 9 years ago

Both the ,, variety and the NULL variety, inspired by mafintosh/csv-parser#31

max-mapper commented 9 years ago

Hmm, while I can see the benefit of this in certain situations, it can be considered 'lossy' since an empty field and NULL could mean different things semantically. Kind of like how we added integer parsing support but then reverted it because I decided everything in CSV should be treated as a string https://github.com/maxogden/csv-spectrum/issues/15#issuecomment-146675298

Maybe by that logic then ,, should turn into "", and ,NULL, into "NULL"
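To make the "everything is a string" position concrete, here is a minimal sketch using Python's standard csv module rather than any of the parsers discussed in this thread. The helper name parse_as_strings and the sample data are made up for illustration.

```python
import csv
import io


def parse_as_strings(text):
    """Parse CSV text and keep every field as a plain string."""
    # newline="" leaves line endings untouched so csv.reader can handle them.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_as_strings("a,b,c\r\n1,,3\r\n1,NULL,3\r\n")

for row in rows:
    print(row)
```

Under this reading the empty field comes back as an empty string and the NULL field as the four-character string "NULL"; nothing is converted to a null value.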

calvinmetcalf commented 9 years ago

So normally I'd agree, but on the other hand this is the default way SQL Server generates CSVs, and somebody complaining 'why is my text field full of NULL strings' is how this came to my attention. I could just filter out all NULL values, but since SQL Server does go to the trouble of quoting some things, treating NULL and 'NULL' the same means we may be losing info.

max-mapper commented 9 years ago

Hmm yea that makes sense, you need the parser to do it. Maybe we can make this optional? E.g. an optional test here and an option in the parser
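A rough sketch of what such an opt-in option could look like, written in Python for illustration only. This is not the API of csv-parser or any other library: parse_line, _finish, and the null_token parameter are hypothetical names, and the sketch handles a single line with comma delimiters and double-quote escaping. The point it shows is that only the parser knows whether a field was quoted, so only the parser can tell NULL apart from "NULL".

```python
def _finish(buf, quoted, null_token):
    """Turn the collected characters of one field into its final value."""
    value = "".join(buf)
    # Only an unquoted field that exactly matches the token becomes None.
    if null_token is not None and not quoted and value == null_token:
        return None
    return value


def parse_line(line, null_token=None):
    """Split one CSV line into fields.

    null_token is a hypothetical opt-in option. With the default (None),
    every field is returned as a string.
    """
    fields = []
    buf = []
    quoted = False      # did the current field contain a quoted section?
    in_quotes = False   # are we currently inside quotes?
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    buf.append('"')   # doubled quote is an escaped quote
                    i += 1
                else:
                    in_quotes = False
            else:
                buf.append(ch)
        elif ch == '"':
            in_quotes = True
            quoted = True
        elif ch == ",":
            fields.append(_finish(buf, quoted, null_token))
            buf, quoted = [], False
        else:
            buf.append(ch)
        i += 1
    fields.append(_finish(buf, quoted, null_token))
    return fields


line = '1,NULL,"NULL",,3'
default_fields = parse_line(line)
null_fields = parse_line(line, null_token="NULL")
print(default_fields)
print(null_fields)
```

With the option off, both NULL fields stay strings, which keeps the default lossless. With null_token="NULL", only the unquoted field becomes None. Passing null_token="" would apply the same rule to the ,, case, leaving a quoted empty string as "".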

calvinmetcalf commented 9 years ago

sounds sensible