aragilar opened this issue 1 year ago.
Ah, the trick is to use `unsparsify`. It would be good if this were more widely mentioned (and possibly if it were an option when passing multiple files).
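For context, Miller's `unsparsify` verb retains all input records, takes the union of their field names, and fills any field a record lacks with an empty string (configurable with `--fill-with`). In Miller itself that is, for example, `mlr --csv unsparsify Year2002.csv Year2003.csv`. The Python below is only a minimal sketch of the same idea over in-memory records, not Miller's implementation. The function name `unsparsify` and the sample records are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def unsparsify(records: List[Record], fill_with: str = "") -> List[Record]:
    """Return copies of `records` that all share the union of their keys.

    Sketch only: keys are ordered by first appearance across the input, and
    any key a record lacks is filled with `fill_with`. Like the non-streaming
    mode of Miller's unsparsify, every record must be held in memory first.
    """
    union: Dict[str, None] = {}  # a dict used as an insertion-ordered set
    for rec in records:
        for key in rec:
            union.setdefault(key, None)
    return [{key: rec.get(key, fill_with) for key in union} for rec in records]


if __name__ == "__main__":
    # Hypothetical records: one from a file with fewer columns, one with more.
    rows = [
        {"id": "1", "sales": "10"},
        {"id": "2", "sales": "12", "region": "north"},
    ]
    for row in unsparsify(rows):
        print(row)
```

Once every record carries the same keys, a CSV writer only needs a single header line.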
Hi @aragilar, it's not a trick; it's a standard feature, and you can use it with multiple files.
Miller natively handles record heterogeneity, and its standard data model is not rectangular.
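"Not rectangular" here means the input is treated as a stream of records, each an ordered map with its own keys. When several CSV files are read, each file's header applies to that file's rows. A minimal Python model of that reading step, built on the standard library's `csv.DictReader`, is sketched below. It is an illustration, not Miller's reader, and the function name `read_csv_stream` and the file contents are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def read_csv_stream(sources: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row; each source is parsed with its own header."""
    for text in sources:
        # DictReader takes the field names from the first line of each source.
        yield from csv.DictReader(io.StringIO(text))


if __name__ == "__main__":
    year2002 = "id,sales\n1,10\n"  # hypothetical contents
    year2003 = "id,sales,region\n2,12,north\n"
    records: List[Dict[str, str]] = list(read_csv_stream([year2002, year2003]))
    for rec in records:
        print(list(rec.keys()))  # the key sets differ from record to record
```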
@aragilar I think you can close this.
I think the main issue is that it's something of a footgun: the order of the files changes how they are handled. Ideally there would be a warning that the headers are inconsistent, suggesting `unsparsify` to clean up the initial set of files. At the very least, a callout about this in the CSV sections of the docs (and presumably for similar formats that assume homogeneity) would be better than users flailing about and wondering whether Miller is working correctly.
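The warning proposed above might look roughly like the Python sketch below: compare each input file's header with the first file's and report mismatches, pointing the user at `unsparsify`. The helper `check_headers` is hypothetical and only illustrates the suggestion; it is not an existing Miller option. Column order counts in this comparison, since a reordered header would also change the output header.

```python
import csv
import io
import sys
from typing import Dict, List, Optional


def check_headers(files: Dict[str, str]) -> List[str]:
    """Return warnings for files whose header differs from the first file's.

    `files` maps a file name to its CSV text (hypothetical in-memory inputs).
    """
    warnings: List[str] = []
    first_name: Optional[str] = None
    first_header: Optional[List[str]] = None
    for name, text in files.items():
        header = next(csv.reader(io.StringIO(text)), [])  # first line only
        if first_header is None:
            first_name, first_header = name, header
        elif header != first_header:  # order-sensitive list comparison
            warnings.append(
                f"warning: header of {name} ({','.join(header)}) differs from "
                f"{first_name} ({','.join(first_header)}); consider unsparsify"
            )
    return warnings


if __name__ == "__main__":
    inputs = {
        "Year2002.csv": "id,sales\n1,10\n",
        "Year2003.csv": "id,sales,region\n2,12,north\n",
    }
    for message in check_headers(inputs):
        print(message, file=sys.stderr)
```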
I suggested closing it because you had found out how to do it.
OK, but since you are describing a feature request, I'm tagging @johnkerl.
It seems Miller isn't able to concatenate CSV files with a varying number of columns. This Stack Overflow question describes the problem best: https://stackoverflow.com/questions/68090301/merging-multiple-csvs-with-different-columns
What appears to happen is that if Year2002.csv comes first, the headers of the later files end up in the output as if you had run `cat`, rather than the rows from Year2002.csv getting blank columns.
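One way to picture the reported output is a CSV writer that, instead of padding rows, starts a new header line whenever a record's keys differ from the previous record's. The Python below is a simplified model of that behavior, written only for illustration. It is not Miller's writer, Miller's exact output may vary by version and by output format, and the function name and sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional


def write_csv_with_header_blocks(records: List[Dict[str, str]]) -> str:
    """Serialize records, emitting a fresh header line whenever the keys change."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    previous_keys: Optional[List[str]] = None
    for rec in records:
        keys = list(rec.keys())
        if keys != previous_keys:
            writer.writerow(keys)  # new header mid-stream, like concatenated files
            previous_keys = keys
        writer.writerow(rec.values())
    return out.getvalue()


if __name__ == "__main__":
    records = [
        {"id": "1", "sales": "10"},  # hypothetical row from Year2002.csv
        {"id": "2", "sales": "12", "region": "north"},  # from a later file
    ]
    print(write_csv_with_header_blocks(records), end="")
```

With these records the output contains a second header line (`id,sales,region`) in the middle of the data. If every record is first given the same keys, as `unsparsify` does, only one header line is written.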