shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License

Calculate the frequency of values in a single column #253

Closed by MostafaYA 11 months ago

MostafaYA commented 11 months ago

Describe your issue

example1.txt example2.txt

Hi, I am attaching two examples. My aim is to calculate the frequency of each value (Country in the examples). For example1, I use the following to summarize the frequencies. This relies mostly on column 1.

cat example1.txt | 
 csvtk summary -t -g Country -f Sample:count  | 
 csvtk mutate2 -t -n country_freq -e ' $Country + " (" + ${Sample:count} + ")" '  | 
 csvtk pretty  -t 
Country          Sample:count   country_freq         
--------------   ------------   ---------------------
Argentina        4              Argentina (4)        
Australia        86             Australia (86)       
Austria          2              Austria (2)          
Belgium          34             Belgium (34)         
Brazil           1              Brazil (1)           
Bulgaria         2              Bulgaria (2)         
Canada           427            Canada (427)         
Chile            5              Chile (5)            
China            42             China (42)           
Costa Rica       35             Costa Rica (35)      
Czech Republic   1              Czech Republic (1)   
Denmark          1              Denmark (1)          
France           20             France (20)          
Germany          139            Germany (139)        
Greece           2              Greece (2)           
Hong Kong        13             Hong Kong (13)       
Hungary          2              Hungary (2)          
Iceland          1              Iceland (1)          
Iran             3              Iran (3)             
Ireland          29             Ireland (29)         
Italy            32             Italy (32)           
Japan            22             Japan (22)           
Jersey           1              Jersey (1)           
Kuwait           3              Kuwait (3)           
Mexico           1              Mexico (1)           
NA               94             NA (94)              
Netherlands      85             Netherlands (85)     
New Zealand      1              New Zealand (1)      
Norway           1              Norway (1)           
Poland           15             Poland (15)          
Portugal         1              Portugal (1)         
Romania          8              Romania (8)          
Singapore        8              Singapore (8)        
Slovakia         1              Slovakia (1)         
Slovenia         2              Slovenia (2)         
South Korea      21             South Korea (21)     
Spain            7              Spain (7)            
Sweden           12             Sweden (12)          
Switzerland      3              Switzerland (3)      
Taiwan           7              Taiwan (7)           
United Kingdom   3512           United Kingdom (3512)
United States    923            United States (923)  

In example2, I have only one column, so the above code does not work. Any suggestions?

Thank you

shenwei356 commented 11 months ago

$ csvtk freq example2.txt -k | csvtk pretty
Country          frequency
--------------   ---------
Argentina        4        
Australia        86       
Austria          2        
Belgium          34

MostafaYA commented 11 months ago

Sorry for the oversight. I didn't notice this cmd at all.
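
For readers without csvtk at hand, the counting that `csvtk freq -k` reports above can be roughly reproduced with plain awk and sort. This is only a sketch of the idea, not how csvtk implements it. The sample data and the function name `count_freq` are made up for illustration, and the input is assumed to be a single tab-free column with a header line, like example2.txt.

```shell
#!/bin/sh
# Hypothetical stand-in for example2.txt: one header line, then one value per row.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
Country
Canada
Costa Rica
Canada
Argentina
Canada
Costa Rica
EOF

# count_freq FILE: print "value<TAB>count" for each distinct value in the
# first column, skipping the header line, sorted by value.
count_freq() {
    awk -F '\t' 'NR > 1 { n[$1]++ }
                 END { for (v in n) printf "%s\t%d\n", v, n[v] }' "$1" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

count_freq "$sample"
```

Splitting on tabs rather than whitespace keeps multi-word values such as "Costa Rica" intact, and the explicit sort plays the role of the `-k` (sort by key) flag in the csvtk command.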