shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License

Handling empty field when joining tables #208

Closed MostafaYA closed 1 year ago

MostafaYA commented 1 year ago

Describe your issue

echo -ne "1\t2\t3\t4\na\tb\n" > file1
echo -ne "1\t2\t3\t4\na\tb\tc\n" > file2
echo -ne "1\t2\t3\t4\na\tb\tc\td\n" > file3
csvtk join -t file1 file2 file3
[ERRO] record on line 2: wrong number of fields

desired output

1       2       3       4
a       b       -       -
a       b       c       -
a       b       c       d

Thank you

shenwei356 commented 1 year ago

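The output of echo -ne "1\t2\t3\t4\na\tb\n" is not a valid TSV file: the header has four fields but the data row has only two.

If the files are valid (every row has the same number of fields), use csvtk concat, which stacks rows, rather than csvtk join.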
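One way to see why the file is rejected is to count the tab-separated fields on each line. This is only a small sketch using awk; it recreates file1 from the report with printf (used here in place of echo -ne for portability).

```shell
# Recreate file1 from the report: a 4-field header and a 2-field data row.
printf '1\t2\t3\t4\na\tb\n' > file1

# Print the number of tab-separated fields found on each line.
awk -F'\t' '{ print "line " NR ": " NF " fields" }' file1
# line 1: 4 fields
# line 2: 2 fields
```

Any line whose count differs from the header's is what triggers the "wrong number of fields" error.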

shenwei356 commented 1 year ago

You can fix the TSV file with code from @lskatz :

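for i in *.tsv; do
    # Largest number of tab-separated fields found on any line of the file
    tabs=$(perl -F'\t' -lane 'print scalar(@F)' "$i" | sort -nr | head -n 1)
    # Pad shorter rows with empty fields up to that width, then replace the file
    perl -F'\t' -lane 'while (@F < '"$tabs"') { push(@F, ""); } print join("\t", @F);' "$i" > tmp.tsv && mv -v tmp.tsv "$i"
done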
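If Perl is not available, the same padding idea can be sketched with awk. This is not code from the thread: pad_tsv is a hypothetical helper name and demo.tsv is a made-up file. The helper reads the file twice, first to find the widest row and then to pad shorter rows with empty fields.

```shell
# pad_tsv (hypothetical helper): pad every row of a TSV file to the width
# of its widest row and write the result to stdout.
pad_tsv() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { if (NF > max) max = NF; next }           # pass 1: find the widest row
        { for (i = NF + 1; i <= max; i++) $i = ""; print }   # pass 2: append empty fields
    ' "$1" "$1"
}

# Demo on a file shaped like file1 from the report.
printf '1\t2\t3\t4\na\tb\n' > demo.tsv
pad_tsv demo.tsv > demo.fixed.tsv
```

Unlike the loop above, this sketch writes to a separate file and leaves the input untouched.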
shenwei356 commented 1 year ago

The new csvtk fix command in v0.26.0 can also fix this.

$ echo -ne "1\t2\t3\t4\na\tb\n" | csvtk fix  -t | csvtk pretty -Ht -S bold
[INFO] the maximum number of columns in all 2 rows: 4
┏━━━┳━━━┳━━━┳━━━┓
┃ 1 ┃ 2 ┃ 3 ┃ 4 ┃
┣━━━╋━━━╋━━━╋━━━┫
┃ a ┃ b ┃   ┃   ┃
┗━━━┻━━━┻━━━┻━━━┛
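For the stacked output requested in the original report, with "-" in place of missing fields, a plain awk sketch is shown here as one possibility. It does not use csvtk, so it makes no claims about csvtk options. stack_padded is a hypothetical helper name, and the sketch assumes that all files share the same header line and that the header is the widest row.

```shell
# Recreate the three files from the report.
printf '1\t2\t3\t4\na\tb\n'       > file1
printf '1\t2\t3\t4\na\tb\tc\n'    > file2
printf '1\t2\t3\t4\na\tb\tc\td\n' > file3

# stack_padded (hypothetical helper): keep the first header, stack all data
# rows, and fill missing trailing fields with "-".
# Assumption: every file has the same header, and the header is the widest row.
stack_padded() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 && NR > 1 { next }        # skip headers of the 2nd, 3rd, ... file
        FNR == 1 { width = NF }            # take the width from the first header
        { for (i = NF + 1; i <= width; i++) $i = "-"; print }
    ' "$@"
}

stack_padded file1 file2 file3
```

On the three files above this prints the header once, followed by the rows a, b, -, - then a, b, c, - then a, b, c, d, separated by tabs.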