shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License

New command to fix TSV/CSV #226

Closed: shenwei356 closed this issue 1 year ago

shenwei356 commented 1 year ago

https://gist.github.com/shenwei356/87c997eeb63a589190cc37f3a820266f

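# edit from Lee Katz (https://github.com/lskatz)
# Pad every row of each TSV file with empty cells up to the file's maximum column count.
for i in *.tsv; do
    # First pass: find the maximum number of tab-separated fields in any row.
    tabs=$(perl -F'\t' -ane '$n=scalar(@F); $N=$n if $n>$N; END{print $N}' "$i")
    # Second pass: append empty fields to short rows, then replace the original file.
    perl -F'\t' -lane 'while(@F < '"$tabs"'){push(@F,"");} print join("\t", @F);' "$i" > tmp.tsv && mv -v tmp.tsv "$i"
done
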
shenwei356 commented 1 year ago

fix

Usage

fix CSV/TSV with different numbers of columns in rows

How to:
  1. First -n/--buf-rows rows are read to check the maximum number of columns.
     The default value 0 means all rows will be read.
  2. Buffered and remaining rows with fewer columns are appended with empty
     cells before output.
  3. An error will be reported if the number of columns of any remaining row
     is larger than the maximum number of columns.

Usage:
  csvtk fix [flags]

Flags:
  -n, --buf-rows int   the number of rows to determine the maximum number of columns. 0 for all rows.
  -h, --help           help for fix
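
The three steps under "How to" can be sketched in Python as follows. This is only an illustration of the behavior described in the help text, not csvtk's actual Go implementation. The function name `fix_rows` and its `buf_rows` parameter are hypothetical, chosen to mirror -n/--buf-rows.

```python
import csv
import io
from itertools import islice


def fix_rows(rows, buf_rows=0):
    """Yield rows padded with empty cells to a common width.

    Hypothetical helper mirroring the documented steps; it is not csvtk's code.
    buf_rows=0 means every row is buffered before anything is output.
    """
    rows = iter(rows)
    # Step 1: buffer the first buf_rows rows (all rows if 0) and take
    # the maximum number of columns among them.
    buffered = list(rows) if buf_rows <= 0 else list(islice(rows, buf_rows))
    ncols = max((len(r) for r in buffered), default=0)

    # Step 2: pad buffered rows with empty cells.
    for r in buffered:
        yield r + [""] * (ncols - len(r))

    # Steps 2 and 3: pad the remaining rows, and report an error for any
    # row that is wider than the maximum found in the buffer.
    for rowno, r in enumerate(rows, start=len(buffered) + 1):
        if len(r) > ncols:
            raise ValueError(
                f"row {rowno} has {len(r)} columns, more than the maximum {ncols}"
            )
        yield r + [""] * (ncols - len(r))


# Same content as testdata/unequal_ncols.csv in the examples.
data = '''id,first_name,last_name
11,"Rob","Pike"
2,Ken,Thompson
4,"Robert","Griesemer","gri"
1,"Robert","Thompson","abc"
NA,"Robert"
'''

fixed = list(fix_rows(csv.reader(io.StringIO(data))))
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(fixed)
print(out.getvalue(), end="")
```

With the default `buf_rows=0` the whole input is held in memory, which is the price of knowing the true maximum up front. A small positive value streams the rest of the input, but in this sketch it raises an error as soon as a later row turns out to be wider than anything seen in the buffer.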

Examples

$ cat testdata/unequal_ncols.csv
id,first_name,last_name
11,"Rob","Pike"
2,Ken,Thompson
4,"Robert","Griesemer","gri"
1,"Robert","Thompson","abc"
NA,"Robert"

$ cat testdata/unequal_ncols.csv | csvtk pretty
[ERRO] record on line 4: wrong number of fields

$ cat testdata/unequal_ncols.csv | csvtk fix | csvtk pretty -S grid
[INFO] the maximum number of columns in all 6 rows: 4
+----+------------+-----------+-----+
| id | first_name | last_name |     |
+====+============+===========+=====+
| 11 | Rob        | Pike      |     |
+----+------------+-----------+-----+
| 2  | Ken        | Thompson  |     |
+----+------------+-----------+-----+
| 4  | Robert     | Griesemer | gri |
+----+------------+-----------+-----+
| 1  | Robert     | Thompson  | abc |
+----+------------+-----------+-----+
| NA | Robert     |           |     |
+----+------------+-----------+-----+
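
The "wrong number of fields" error in the second example points at line 4, the first record whose field count differs from the header. A rough way to locate such a record before running csvtk fix is sketched below. The helper name `first_ragged_line` is hypothetical, and the sketch assumes the first record sets the expected width, which matches the error shown but is not taken from csvtk's source.

```python
import csv
import io


def first_ragged_line(text):
    """Return (line_number, n_fields, expected) for the first record whose
    field count differs from the first record, or None if all records match.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(text))
    expected = None
    for record in reader:
        if expected is None:
            # Assumption: the first record defines the expected width.
            expected = len(record)
        elif len(record) != expected:
            # reader.line_num is the number of source lines read so far.
            return reader.line_num, len(record), expected
    return None


# Same content as testdata/unequal_ncols.csv in the examples.
data = '''id,first_name,last_name
11,"Rob","Pike"
2,Ken,Thompson
4,"Robert","Griesemer","gri"
1,"Robert","Thompson","abc"
NA,"Robert"
'''

print(first_ragged_line(data))
```

Only the first mismatch is reported, so a file can still contain further ragged rows after the one returned here.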