shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License
992 stars, 84 forks

Add concat option "--del-header". #258

Closed: derekmahar closed 9 months ago

derekmahar commented 9 months ago

Please consider adding option --del-header (or similar) to csvtk concat. Consider the scenario of using xargs to concatenate a large number of CSV files:

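# Concatenate the first file with its header and the remaining files without
# headers, then print the header, the first 10 rows, and row 10000.
find . -type f -name "*.csv" |
  sort |
  head -n 10000 |
  (read -r first
   csvtk concat "$first"
   xargs csvtk del-header) |
  (read -r first
   echo "$first"
   let count=1
   while read -r line
   do
     if ((count <= 10 || count == 10000))
     then
       echo "$line"
     fi
     let count=count+1
   done)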

Output:

Column
1
2
3
4
5
6
7
8
9
10
10000

If csvtk concat had option --del-header, we could replace xargs csvtk del-header with xargs csvtk concat --del-header:

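# Same pipeline, with the proposed (at the time hypothetical) option
# "csvtk concat --del-header" in place of "csvtk del-header".
find . -type f -name "*.csv" |
  sort |
  head -n 10000 |
  (read -r first
   csvtk concat "$first"
   xargs csvtk concat --del-header) |
  (read -r first
   echo "$first"
   let count=1
   while read -r line
   do
     if ((count <= 10 || count == 10000))
     then
       echo "$line"
     fi
     let count=count+1
   done)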

Desired output:

Column
1
2
3
4
5
6
7
8
9
10
10000
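
The reason each xargs batch must drop its header can be sketched without csvtk. This is only an illustration under stated assumptions: the temporary files are made up, `xargs -n 1` forces one invocation per file to stand in for the batching xargs does by itself when the argument list is too long, `cat` stands in for a concatenation that keeps every batch's header, and `tail -n +2` stands in for a header-deleting command.

```shell
# Create three tiny single-column CSV files in a temporary directory (made-up data).
tmp=$(mktemp -d)
for i in 1 2 3; do
  printf 'Column\n%s\n' "$i" > "$tmp/$i.csv"
done

# One invocation per file: every batch prints its own header line.
with_headers=$(find "$tmp" -type f -name '*.csv' | sort | xargs -n 1 cat)

# One invocation per file, each dropping its first line: no repeated headers.
without_headers=$(find "$tmp" -type f -name '*.csv' | sort | xargs -n 1 tail -n +2)

printf '%s\n' "$with_headers"
echo '---'
printf '%s\n' "$without_headers"

rm -rf "$tmp"
```

The first listing repeats `Column` before every value, while the second contains only the data rows, which is why the pipeline above prints the header once from the first file and strips it everywhere else.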

shenwei356 commented 9 months ago

Actually, I do not fully understand the purpose of these commands. Can we just use:

find ./ -name "*.csv" \
    | csvtk concat --infile-list - \
    | csvtk del-header -o result.csv

derekmahar commented 9 months ago

Yes, but I think csvtk concat better describes the intent of the operation. Command csvtk concat --del-header would simply be a synonym of csvtk del-header. Another idea might be to implement csvtk --del-header which would apply to every subcommand.

derekmahar commented 9 months ago

Reading your example again, I realised that csvtk concat --infile-list=- doesn't require xargs and csvtk del-header at all. My scenario overlooked option --infile-list because I was comparing csvtk concat to other tools like xsv and qsv that don't (yet) have an option similar to --infile-list.

shenwei356 commented 9 months ago

The option --infile-list is very useful in cases where the input file list is long.
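
As a minimal sketch of that point, the file list can be written to a file once and handed to csvtk, so no file names are passed as command-line arguments. The temporary directory, the sample files, and the name `files.txt` are made up for illustration; the csvtk step runs only if csvtk is installed.

```shell
# Create three tiny CSV files in a temporary directory (made-up data).
tmp=$(mktemp -d)
for i in 1 2 3; do
  printf 'Column\n%s\n' "$i" > "$tmp/$i.csv"
done

# Write the list of input files, one path per line.
find "$tmp" -type f -name '*.csv' | sort > "$tmp/files.txt"
n_files=$(wc -l < "$tmp/files.txt" | tr -d ' ')
echo "files listed: $n_files"

# Read the inputs from the list file instead of from arguments.
if command -v csvtk >/dev/null 2>&1; then
  csvtk concat --infile-list "$tmp/files.txt" | csvtk del-header
fi

rm -rf "$tmp"
```

Passing `-` instead of a path, as in the command quoted earlier in this thread, reads the same list from standard input.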

derekmahar commented 9 months ago

Yes, I agree. This is why I've asked the maintainers of mlr and qsv to add a similar option to each of those tools.

derekmahar commented 9 months ago

I'm closing this issue because it duplicates existing behaviour.

mbhall88 commented 7 months ago

I would just like to second that a global option to suppress the header would be great. del-header obviously does this, but a global option to not output the header would simplify many of my pipelines by removing one command from each.

shenwei356 commented 7 months ago

Sounds reasonable, and it might also be helpful for others. Please create a new issue in case this one gets overlooked.

shenwei356 commented 5 months ago

Gosh, it's added. It was a lot of work. @derekmahar @mbhall88

  • Add a new global flag -U, --delete-header to disable outputting the header row. Supported commands: concat, csv2tab/tab2csv, csv2xlsx/xlsx2csv, cut, filter, filter2, freq, fold/unfold, gather, fmtdate, grep, head, join, mutate, mutate2, replace, round, sample.

$ (echo a; seq 3) | csvtk head -n 3
a
1
2
3

$ (echo a; seq 3) | csvtk head -n 3 -U
1
2
3
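
Since concat is among the supported commands, the earlier concat-then-del-header pipeline can presumably be collapsed into a single command. The sketch below assumes a csvtk build that already includes -U and uses made-up sample files; the awk line is a plain-tools reference for the expected rows, not part of csvtk.

```shell
# Create three tiny CSV files in a temporary directory (made-up data).
tmp=$(mktemp -d)
for i in 1 2 3; do
  printf 'Column\n%s\n' "$i" > "$tmp/$i.csv"
done

# Reference result with awk: print every line except the first line of each file.
expected=$(find "$tmp" -type f -name '*.csv' | sort | xargs awk 'FNR > 1')
printf '%s\n' "$expected"

# With a csvtk that supports -U, concat alone should yield the same rows.
if command -v csvtk >/dev/null 2>&1; then
  actual=$(find "$tmp" -type f -name '*.csv' | sort | csvtk concat --infile-list - -U)
  if [ "$actual" = "$expected" ]; then
    echo "csvtk output matches the reference"
  else
    echo "csvtk output differs (possibly a version without -U)"
  fi
fi

rm -rf "$tmp"
```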

derekmahar commented 5 months ago

Thank you for implementing this feature!