shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License
992 stars 84 forks

new command "scatter" #265

VladimirAlexiev closed 6 months ago

VladimirAlexiev commented 6 months ago

Can you add a command that's the opposite of "gather"?

I have a file like this:

module    t                  c
-------   ----------------   -
address   Class              1
address   DatatypeProperty   2
address   ObjectProperty     3
agent     DataProperty       4

I want to convert it to:

module    Class   DatatypeProperty    ObjectProperty
-------   -----   ----------------    --------------
address   1       2                   3
agent                                 4

I don't know what an appropriate name would be, maybe "scatter"? It could be invoked like this:

csvtk scatter --key t --value c -f Class,DatatypeProperty,ObjectProperty

where -f is an OPTIONAL list of key values to be used for sorting the output columns.
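
To make the requested behaviour concrete, here is a minimal Python sketch of the long-to-wide pivot described above. It is not csvtk's implementation: the function name `spread` and its `order` parameter (standing in for the proposed -f list) are assumptions made for illustration only.

```python
def spread(rows, key, value, order=None):
    """Pivot long rows to wide: one new column per distinct value of `key`."""
    order = list(order or [])
    # Columns that identify an output row (everything except key and value).
    id_cols = [c for c in rows[0] if c not in (key, value)]
    # Listed keys come first; unlisted keys follow in first-seen order.
    seen = dict.fromkeys(r[key] for r in rows)
    new_cols = order + [k for k in seen if k not in order]
    wide = {}
    for r in rows:
        ident = tuple(r[c] for c in id_cols)
        if ident not in wide:
            wide[ident] = dict(zip(id_cols, ident))
            wide[ident].update((c, "") for c in new_cols)  # empty cell by default
        wide[ident][r[key]] = r[value]
    return id_cols + new_cols, list(wide.values())


# The sample rows exactly as posted in this issue.
long_rows = [
    {"module": "address", "t": "Class", "c": "1"},
    {"module": "address", "t": "DatatypeProperty", "c": "2"},
    {"module": "address", "t": "ObjectProperty", "c": "3"},
    {"module": "agent", "t": "DataProperty", "c": "4"},
]
header, wide_rows = spread(
    long_rows, key="t", value="c",
    order=["Class", "DatatypeProperty", "ObjectProperty"],
)
print(header)
for row in wide_rows:
    print([row[c] for c in header])
```

In this sketch, a key that is not in the `order` list (here `DataProperty`) still gets its own column, placed after the listed ones.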


I don't have a useful case for multiple key or value columns, but I guess that is possible. E.g., from

gender education number percent
male   basic
female highschool
...

to something like

male_basic_number male_highschool_number female_basic_number female_highschool_number

Multiple --key options make sense only when the key columns have very few distinct values.
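
A rough Python sketch of the multi-key idea, to show how the column names would be built and why the column count grows quickly. The function `spread_multi`, the `_` separator, and all cell values below are hypothetical; the issue gives no actual numbers.

```python
def spread_multi(rows, keys, values, sep="_"):
    """Collapse long rows into one wide record.

    Each output column is named <key values joined by sep><sep><value column>.
    """
    wide = {}
    for r in rows:
        prefix = sep.join(r[k] for k in keys)
        for v in values:
            wide[f"{prefix}{sep}{v}"] = r[v]
    return wide


# Hypothetical data: the numbers are made up for illustration.
rows = [
    {"gender": "male", "education": "basic", "number": "10", "percent": "25"},
    {"gender": "male", "education": "highschool", "number": "30", "percent": "75"},
    {"gender": "female", "education": "basic", "number": "20", "percent": "33"},
    {"gender": "female", "education": "highschool", "number": "40", "percent": "67"},
]
wide = spread_multi(rows, keys=["gender", "education"], values=["number", "percent"])
print(list(wide))
```

The number of output columns is the number of key combinations times the number of value columns (here 2 x 2 x 2 = 8), which is why this only stays readable for key columns with few distinct values.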

shenwei356 commented 6 months ago

There's a 'spread': https://bioinf.shenwei.me/csvtk/usage/#spread

VladimirAlexiev commented 6 months ago

Uh-oh, I had 0.24. Thanks! After upgrading, I get exactly what I need:

csvtk space2tab test.txt|csvtk spread -t -k t -v c|csvtk pretty -t
module    Class   DatatypeProperty   ObjectProperty
-------   -----   ----------------   --------------
address   1       2                  3
agent             4

VladimirAlexiev commented 5 months ago

Hi @shenwei356. Does it make sense to add scatter as a synonym of spread? scatter is a better match for gather: it even rhymes :-)

shenwei356 commented 5 months ago

Not really :). scatter sounds like the scatter plot.

gather and spread are from the R package tidyr. They perform opposite operations.

$ csvtk -h
Commands for Data Transformation:
  fold            fold multiple values of a field into cells of groups
  gather          gather columns into key-value pairs, like tidyr::gather/pivot_longer
  sep             separate column into multiple columns
  spread          spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider
  transpose       transpose CSV data
  unfold          unfold multiple values in cells of a field
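
As an illustration of the opposite direction, the following Python sketch gathers the wide table from earlier in this thread back into key-value rows. It is not csvtk's code; the function name `gather`, its parameters, and the choice to skip empty cells are assumptions of this sketch.

```python
def gather(rows, key, value, fields, keep_empty=False):
    """Turn the columns in `fields` into (key, value) pairs, one row per cell."""
    long_rows = []
    for r in rows:
        # Columns not being gathered are carried over unchanged.
        ident = {c: v for c, v in r.items() if c not in fields}
        for f in fields:
            if r[f] == "" and not keep_empty:
                continue  # assumption of this sketch: drop empty cells
            long_rows.append({**ident, key: f, value: r[f]})
    return long_rows


# The wide table shown in the spread output earlier in this thread.
wide_rows = [
    {"module": "address", "Class": "1", "DatatypeProperty": "2", "ObjectProperty": "3"},
    {"module": "agent", "Class": "", "DatatypeProperty": "4", "ObjectProperty": ""},
]
long_rows = gather(
    wide_rows, key="t", value="c",
    fields=["Class", "DatatypeProperty", "ObjectProperty"],
)
for row in long_rows:
    print(row)
```

Each non-empty cell becomes one row with the columns module, t and c, which is the long layout that spread takes as input.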

VladimirAlexiev commented 5 months ago

@shenwei356 Yes: gather != spread = scatter.

"spread" and "scatter" mean the same (in this context), and "scatter" rhymes better with "gather". I don't know tidyr, that's why I guessed there should be "scatter" as the opposite of "gather".

shenwei356 commented 5 months ago

> Does it make sense to add scatter as a synonym of spread?

OK. But spread remains the main name, for consistency with tidyr, a popular R package widely used for table manipulation.

VladimirAlexiev commented 5 months ago

sure!

shenwei356 commented 5 months ago

done.