shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License

New feature: csvtk join to prefix only duplicated column names #246

Closed: stas-malavin closed this issue 11 months ago

stas-malavin commented 11 months ago

Describe your issue

I would like an option to prefix only duplicated column names. The second and third examples below show the desired effect of a hypothetical --prefix-duplicates flag.

$ csvtk join --prefix-filename \
    <(echo -e 'name,attr,tool\nStas,stubborn,tube\nShenwei,smart,computer') \
    <(echo -e 'name,attr,food\nStas,microbiologist,sandwich\nShenwei,computer scientist,noodles') | \
    csvtk pretty
name      63-attr    63-tool    62-attr              62-food 
-------   --------   --------   ------------------   --------
Stas      stubborn   tube       microbiologist       sandwich
Shenwei   smart      computer   computer scientist   noodles 

$ csvtk join --prefix-filename --prefix-duplicates \
    <(echo -e 'name,attr,tool\nStas,stubborn,tube\nShenwei,smart,computer') \
    <(echo -e 'name,attr,food\nStas,microbiologist,sandwich\nShenwei,computer scientist,noodles') | \
    csvtk pretty
name      63-attr    tool       62-attr              food 
-------   --------   --------   ------------------   --------
Stas      stubborn   tube       microbiologist       sandwich
Shenwei   smart      computer   computer scientist   noodles 

$ csvtk join --prefix-duplicates \
    <(echo -e 'name,attr,tool\nStas,stubborn,tube\nShenwei,smart,computer') \
    <(echo -e 'name,attr,food\nStas,microbiologist,sandwich\nShenwei,computer scientist,noodles') | \
    csvtk pretty
name      attr       tool       attr1                food 
-------   --------   --------   ------------------   --------
Stas      stubborn   tube       microbiologist       sandwich
Shenwei   smart      computer   computer scientist   noodles 

This could also be the default behavior. Thank you!

shenwei356 commented 11 months ago

Added a new flag -P/--prefix-duplicates, but it needs to be used along with --prefix-filename.

$ csvtk join --prefix-filename --prefix-duplicates \
    <(echo -e 'name,attr,tool\nStas,stubborn,tube\nShenwei,smart,computer') \
    <(echo -e 'name,attr,food\nStas,microbiologist,sandwich\nShenwei,computer scientist,noodles') | \
    csvtk pretty
name      attr       tool       62-attr              food    
-------   --------   --------   ------------------   --------
Stas      stubborn   tube       microbiologist       sandwich
Shenwei   smart      computer   computer scientist   noodles 
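
Going by the output above, the first file's attr column is left untouched and only the colliding column from the later file gets the filename prefix (62- here, from the process-substitution file descriptor). A rough sketch of that renaming rule in shell, assuming plain comma-separated headers with no quoted commas. join_headers is a hypothetical helper written for illustration; it is not part of csvtk, and csvtk's actual implementation may differ:

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of csvtk): build the header of a two-file join,
# prefixing only those columns of the second file that collide with the first.
# Arguments: key column, prefix for the second file, left header, right header.
join_headers() {
    local key=$1 prefix=$2 left=$3 right=$4
    awk -v key="$key" -v prefix="$prefix" -v left="$left" -v right="$right" 'BEGIN {
        n = split(left, l, ",")
        m = split(right, r, ",")
        out = left
        # Remember every column name already used by the first file.
        for (i = 1; i <= n; i++) seen[l[i]] = 1
        for (j = 1; j <= m; j++) {
            if (r[j] == key) continue   # the key column appears only once
            name = (r[j] in seen) ? prefix r[j] : r[j]
            out = out "," name
        }
        print out
    }'
}

# Same headers as in the example above.
join_headers name 62- 'name,attr,tool' 'name,attr,food'
# prints: name,attr,tool,62-attr,food
```

With regular files instead of process substitution, the prefix would come from the file name rather than a descriptor number, so the prefixed column would be easier to read.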

stas-malavin commented 11 months ago

You're blasting-fast, thank you! :)

stas-malavin commented 11 months ago

It works, confirmed.