johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://miller.readthedocs.io

Regex support for `reorder` verb #1325

Open osevill opened 1 year ago

osevill commented 1 year ago

Would it be possible to allow regex matching when reordering the column headers of a CSV file? The documentation describes `reorder` as requiring specific field names, e.g., "i" and "b" in `mlr --opprint reorder -f i,b data/small`.

My use case is that I don't necessarily know the exact field names, but I know that some will start with the prefix XXX and others with YYY. I would like to reorder so that any fields starting with YYY (zero or more) come first, followed by any fields starting with XXX (zero or more).
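To make the request concrete, here is a minimal sketch of the intended ordering, written in Python rather than Miller. It is not Miller's implementation; the function name `reorder_by_prefix` and the sample header are hypothetical and only illustrate the desired result.

```python
def reorder_by_prefix(fields, prefixes):
    """Move fields matching each prefix to the front, in prefix order.

    Fields that match no prefix keep their original relative order
    and are placed after the matched fields.
    """
    front = []
    for prefix in prefixes:
        for name in fields:
            if name.startswith(prefix) and name not in front:
                front.append(name)
    rest = [name for name in fields if name not in front]
    return front + rest


# Hypothetical header: the exact field names are not known in advance.
header = ["id", "XXX_1", "YYY_a", "XXX_2", "other"]
print(reorder_by_prefix(header, ["YYY", "XXX"]))
# ['YYY_a', 'XXX_1', 'XXX_2', 'id', 'other']
```

If no field matches a prefix, that prefix simply contributes nothing, which covers the "zero fields" case.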

Thanks!

indera commented 1 year ago

One option, if you know the column positions, is to rename the columns and then sort by the names you choose.

See https://miller.readthedocs.io/en/latest/csv-with-and-without-headers/

cat unknown_col.csv
abc, xxx_like, yyy_unlike
10, 1, z
11, 2, y
12, 3, x

Processing

tail -n +2 unknown_col.csv | mlr --csv --implicit-csv-header label a,xxx,yyy then sort -f yyy,xxx

a,xxx,yyy
12, 3, x
11, 2, y
10, 1, z

osevill commented 8 months ago

Thanks for adding this in v6.11!

Test file: reorder_regex_test_2.csv

I'm testing regex support for the reorder verb, and noticing unexpected behavior.

For the attached file, why does this give the expected results: `mlr --c2p reorder -f 'aaa_aaa','ccc_aaa','bbb_aaa' ./reorder_regex_test_2.csv`

but this (changing `-f` to `-r`) doesn't: `mlr --c2p reorder -r 'aaa_aaa','ccc_aaa','bbb_aaa' ./reorder_regex_test_2.csv`

With the second command, the column order of the results is 'bbb_aaa' 'aaa_aaa' 'ccc_aaa'.

I tried this first and also got unexpected results: `mlr --c2p reorder -r '^aaa.*$','^ccc.*$','^bbb.*$' ./reorder_regex_test_2.csv`

with the results in a similar column order: '^bbb.*$','^aaa.*$','^ccc.*$'

Providing just one regex seems to work fine, however: `mlr --c2p reorder -r '^aaa.*$' ./reorder_regex_test_2.csv`

Am I using incorrect syntax to combine the regex fields? Apologies if I'm missing something obvious.
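
For reference, the ordering expected from a list of regexes can be sketched as follows. This is a Python illustration of the expectation described above, not a description of how Miller processes `-r`. The function name `reorder_by_regex` and the header are assumptions, since the contents of the attached file are not shown here.

```python
import re


def reorder_by_regex(fields, patterns):
    """Place fields matching each regex first, in the order the regexes are given.

    A field is claimed by the first pattern that matches it; unmatched
    fields follow in their original relative order.
    """
    front = []
    for pattern in patterns:
        compiled = re.compile(pattern)
        for name in fields:
            if compiled.search(name) and name not in front:
                front.append(name)
    return front + [name for name in fields if name not in front]


# Hypothetical header standing in for the attached test file.
header = ["bbb_aaa", "aaa_aaa", "ccc_aaa", "zzz"]
print(reorder_by_regex(header, ["^aaa.*$", "^ccc.*$", "^bbb.*$"]))
# ['aaa_aaa', 'ccc_aaa', 'bbb_aaa', 'zzz']
```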