johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://miller.readthedocs.io
Other
8.99k stars 217 forks source link

Fill empty csv fields with NA (or arbitrary string) #503

Closed BEFH closed 3 years ago

BEFH commented 3 years ago

A lot of csv files look like the following:

col1,col2,col3
,9,0
8,,
9,,3

Empty fields are just left empty, which is problematic when switching to a whitespace delimited file.

What I want to do is to replace empty fields with NA so that a TSV output would look like this:

col1\tcol2\tcol3
NA\t9\t0
8\tNA\tNA
9\tNA\t3

I've tried to do this with --fill-down, but ended up resorting to the following, extremely opaque recursive sed command:

sed 's/^,/NA,/; :a;s/,,/,NA,/g;ta' | sed 's/,$/,NA/'

Is there any pre-existing way to do this on all fields, or would it be easy to add a verb like --replace-missing?

johnkerl commented 3 years ago

I was really sure there already was such a verb -- there definitely should be :D

johnkerl commented 3 years ago

In the meanwhile, here is a DSL workaround:

$ cat input.csv
col1,col2,col3
,9,0
8,,
9,,3
$ cat unempty.mlr
for (k,v in $*) {
    if (v == "") {
        $[k] = "NA"
    }
}
$ mlr --icsv --otsv put -f unempty.mlr input.csv
col1\tcol2\tcol3
NA\t9\t0
8\tNA\tNA
9\tNA\t3
johnkerl commented 3 years ago

https://github.com/johnkerl/miller/pull/504

johnkerl commented 3 years ago

@BEFH closing given the above -- please let me know if this needs to be re-opened & we can do that! :D