shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License

[Feature request] join option to add filename as prefix #202

Closed tetedange13 closed 1 year ago

tetedange13 commented 1 year ago

Describe your issue

I often use csvtk join to produce a matrix, using it on several files with :

I would love an option that adds each filename as a prefix to every column not used for joining. For now I achieve this by creating, for each file, a corresponding .tmp file in which the column of interest is renamed to include the filename, and then running csvtk join *.tmp.

Reproducible example

Input files

==> phones1.csv <==
username,number
gri,11111
rob,12345
ken,22222
shenwei,99999

==> phones2.csv <==
username,number
gri,22222
rob,56789
ken,33333
shenwei,77777

==> phones3.csv <==
username,number
gri,33333
rob,98765
ken,44444
shenwei,88888

Expected result

csvtk join --new-option phones{1,2,3}.csv | csvtk pretty

username   phones1.csv-number   phones2.csv-number   phones3.csv-number
--------   ------------------   ------------------   ------------------
gri        11111                22222                33333
rob        12345                56789                98765
ken        22222                33333                44444
shenwei    99999                77777                88888

The file extension could be kept or removed, whichever you prefer. I guess it is simpler to keep it and let users append | sed '1s/\.csv//g' if they do not want it.
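To make the requested behavior concrete, here is a minimal Python sketch of a join that prefixes every non-key column with its filename. It is not how csvtk implements anything; `join_with_prefix` and `PHONES` are hypothetical names, the files are held in memory as strings, and the sketch assumes an inner join with keys that are unique within each file.

```python
import csv
import io

# The three input files from the example above, held in memory.
PHONES = {
    "phones1.csv": "username,number\ngri,11111\nrob,12345\nken,22222\nshenwei,99999\n",
    "phones2.csv": "username,number\ngri,22222\nrob,56789\nken,33333\nshenwei,77777\n",
    "phones3.csv": "username,number\ngri,33333\nrob,98765\nken,44444\nshenwei,88888\n",
}


def join_with_prefix(files, key="username"):
    """Inner-join CSV texts on `key`, prefixing other columns with '<filename>-'.

    `files` maps filename -> CSV text. Assumes keys are unique within a file.
    Returns a list of rows, the first being the header.
    """
    header = [key]
    rows = {}    # key value -> joined row so far
    order = []   # key values in the order of the first file
    for i, (name, text) in enumerate(files.items()):
        reader = csv.DictReader(io.StringIO(text))
        others = [c for c in reader.fieldnames if c != key]
        header += [f"{name}-{c}" for c in others]
        seen = set()
        for rec in reader:
            k = rec[key]
            if i == 0:
                rows[k] = [k]
                order.append(k)
            if k in rows and k not in seen:
                rows[k] += [rec[c] for c in others]
                seen.add(k)
        # Inner join: drop keys that this file does not contain.
        for k in list(rows):
            if k not in seen:
                del rows[k]
    return [header] + [rows[k] for k in order if k in rows]


if __name__ == "__main__":
    for row in join_with_prefix(PHONES):
        print(",".join(row))
```

Keeping the extension in the prefix, as suggested above, leaves the decision to strip it to a later step.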

csvtk is a wonderful tool, thanks for developing it!

Have a nice day, Felix.

shenwei356 commented 1 year ago

Thanks for the PR. I've just added a little change: do not add prefixes for the key columns, as you requested at the beginning.

$ go build && ./csvtk join  phones{1,2,3}.csv -A  | csvtk pretty 
username   phones1.csv-phone   phones2.csv-phone   phones3.csv-phone
--------   -----------------   -----------------   -----------------
gri        11111               11111               11111
rob        12345               12345               12345
ken        22222               22222               22222
shenwei    999999              999999              999999

$ go build && ./csvtk join  phones{1,2,3}.csv -A -f 2 | csvtk pretty 
phones1.csv-username   phone    phones2.csv-username   phones3.csv-username
--------------------   ------   --------------------   --------------------
gri                    11111    gri                    gri
rob                    12345    rob                    rob
ken                    22222    ken                    ken
shenwei                999999   shenwei                shenwei

avilella commented 1 year ago

This is a very useful new feature, thanks for your efforts.

Could we add an '--interleaved' flag that interleaves the fields of the input files in the output? For example:

file1 field names: a b c d

file2 field names: a b c d

file3 field names: a b c e

Using the --interleaved flag, the output would be the following fields in the order below:

./csvtk join file{1,2,3}.csv -A -f 1 --interleaved | csvtk headers

Output:

a
file1.csv-b
file2.csv-b
file3.csv-b
file1.csv-c
file2.csv-c
file3.csv-c
file1.csv-d
file2.csv-d
file3.csv-e
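The ordering above can be sketched in a few lines of Python. This is only an illustration of the proposal, not an existing csvtk feature: `interleaved_headers` is a hypothetical helper, and it assumes every file has the same number of non-key columns.

```python
def interleaved_headers(headers_by_file, key):
    """Return the key followed by the other columns, interleaved by position.

    `headers_by_file` maps filename -> list of column names. Assumes all
    files have the same number of non-key columns (zip stops at the shortest).
    """
    names = list(headers_by_file)
    others = [[c for c in headers_by_file[n] if c != key] for n in names]
    out = [key]
    # Take the 1st non-key column of every file, then the 2nd, and so on.
    for cols in zip(*others):
        out += [f"{n}-{c}" for n, c in zip(names, cols)]
    return out


if __name__ == "__main__":
    headers = {
        "file1.csv": ["a", "b", "c", "d"],
        "file2.csv": ["a", "b", "c", "d"],
        "file3.csv": ["a", "b", "c", "e"],
    }
    print("\n".join(interleaved_headers(headers, "a")))
```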

Thanks for your consideration,

On Tue, Oct 4, 2022 at 11:17 AM Wei Shen wrote:

> Thanks for the PR. I've just added a little change: do not add prefixes for the key columns.

shenwei356 commented 1 year ago

That would be a mess if these files have different numbers of columns.
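One possible reading of this concern, sketched in Python: if columns are interleaved by position and the files have different numbers of columns, columns with the same name no longer end up next to each other. `interleave_ragged` and the file layouts below are hypothetical and only illustrate the problem.

```python
from itertools import zip_longest


def interleave_ragged(headers_by_file, key):
    """Interleave non-key columns by position, skipping files that ran out."""
    names = list(headers_by_file)
    others = [[c for c in headers_by_file[n] if c != key] for n in names]
    out = [key]
    # zip_longest pads shorter files with None; those slots are skipped.
    for cols in zip_longest(*others):
        out += [f"{n}-{c}" for n, c in zip(names, cols) if c is not None]
    return out


if __name__ == "__main__":
    headers = {
        "file1.csv": ["a", "b", "c", "d"],
        "file2.csv": ["a", "c"],
        "file3.csv": ["a", "b", "d"],
    }
    # file2.csv-c lands between the two "b" columns, and the two "d"
    # columns are separated by file1.csv-c.
    print(interleave_ragged(headers, "a"))
```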