shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License

[Feature request] customized suffixes or prefixes in join command #263

Closed: Jesson-mark closed this issue 8 months ago

Jesson-mark commented 8 months ago

Hi, Wei, thanks for your great work on csvtk! It is really great software, and I use it every day!

I am wondering whether a new option could be added to supply customized suffixes or prefixes when using join to merge two files. I know --prefix-filename and --prefix-trim-ext can add each filename as a prefix to each column name, but in some cases my filenames are long strings, which are not suitable as prefixes for the new columns (the resulting column names are too long to read or tell apart). In such cases, I only want to add two distinct short strings (labels) to the merged columns.

It would be great if an option for supplying customized suffixes or prefixes were added.

Thanks again for your great work!

shenwei356 commented 8 months ago

Please show some simple examples.

Jesson-mark commented 8 months ago

Sorry for my ambiguous explanation above. Below are some simple examples:

Suppose there are two files named phones.csv and region.csv (the same examples used for the join command):

$ cat phones.csv 
username,phone
gri,11111
rob,12345
ken,22222
shenwei,999999

$ cat region.csv 
name,region
ken,nowhere
gri,somewhere
shenwei,another
Thompson,there

When joining them with the --prefix-filename and --prefix-trim-ext options, the result is:

$ csvtk join -f 1 phones.csv region.csv --prefix-filename --prefix-trim-ext
username,phones-phone,region-region
gri,11111,somewhere
ken,22222,nowhere
shenwei,999999,another
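To illustrate why long filenames become a problem, here is a minimal Go sketch of how the prefixed column names in the output above appear to be built: the file's base name, with its extension optionally trimmed, joined to the column name with "-". The helper `prefixedName` is hypothetical and inferred only from this example output; it is not csvtk's actual code, and using the base name rather than the full path is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// prefixedName is a hypothetical helper mimicking the column names seen with
// --prefix-filename (and --prefix-trim-ext when trimExt is true).
// Assumption: only the base name is used and the separator is "-".
func prefixedName(file, col string, trimExt bool) string {
	base := filepath.Base(file)
	if trimExt {
		// Drop only the last extension, e.g. ".csv".
		base = strings.TrimSuffix(base, filepath.Ext(base))
	}
	return base + "-" + col
}

func main() {
	fmt.Println(prefixedName("phones.csv", "phone", true))   // phones-phone
	fmt.Println(prefixedName("region.csv", "region", true))  // region-region
	fmt.Println(prefixedName("region.csv", "region", false)) // region.csv-region

	// Hypothetical long filename: the whole base name ends up in every header.
	fmt.Println(prefixedName("/data/sample_batch_2023_run7.filtered.csv", "region", true))
	// sample_batch_2023_run7.filtered-region
}
```

With a long filename, the prefix carries the entire base name into every column header, which is exactly what makes the merged columns hard to read.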

If there were a new option, e.g. --suffix "label1,label2", where label1 is added to the columns of file1 and label2 is added to the columns of file2, the result would become:

$ csvtk join -f 1 phones.csv region.csv --suffix "A,B"
username,phone-A,region-B
gri,11111,somewhere
ken,22222,nowhere
shenwei,999999,another

Now A and B are appended to the new column names, which is more readable than the previous output and fully customizable without renaming the files.

The option (--suffix) is just like the suffix parameter of the left_join function in the dtplyr package, but csvtk is more convenient than dtplyr, since using the latter requires writing a small script.

shenwei356 commented 8 months ago

Added:

$ csvtk join -f 1 phones.csv region.csv --suffix "A,B"  | csvtk pretty 
username   phone-A   region-B 
--------   -------   ---------
gri        11111     somewhere
ken        22222     nowhere  
shenwei    999999    another

Jesson-mark commented 8 months ago

Thank you for your prompt reply and for updating csvtk so quickly! I really appreciate it!