shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License

Feature request: cross-tabulation #216

Closed Gavin-Holt closed 1 year ago

Gavin-Holt commented 1 year ago

Hi,

I am old enough to remember the introduction of crosstabs (also known as contingency tables) in Excel. At the time, they were beyond most spreadsheets and difficult to produce in desktop databases.

Would it be possible to compute two-variable cross-tabulations in csvtk? For example:

csvtk.exe xtab --rows hair_color --cols sex --calc percent --totals_on --color_off --stats chi_sq

The tables alone would be very helpful, just to present data.

Adding stats and colouration of outliers would be sensational.
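
To make the request concrete, here is a rough sketch of what a two-variable cross-tabulation with totals, percentages and a chi-squared statistic involves. It is written in Python rather than as a csvtk command, because `xtab` and its flags above are a proposal, not an existing subcommand. The function names and the sample records are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample data, one dict per CSV record.
SAMPLE = [
    {"hair_color": "brown", "sex": "F"},
    {"hair_color": "brown", "sex": "M"},
    {"hair_color": "brown", "sex": "M"},
    {"hair_color": "blond", "sex": "F"},
    {"hair_color": "blond", "sex": "F"},
    {"hair_color": "blond", "sex": "M"},
]


def crosstab(records, row_key, col_key):
    """Count how often each (row value, column value) pair occurs."""
    counts = Counter((rec[row_key], rec[col_key]) for rec in records)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    table = [[counts.get((r, c), 0) for c in cols] for r in rows]
    return rows, cols, table


def with_totals(table):
    """Append a totals column to each row, then a totals row."""
    extended = [row + [sum(row)] for row in table]
    extended.append([sum(col) for col in zip(*extended)])
    return extended


def percent_of_total(table):
    """Express each cell as a percentage of the grand total."""
    total = sum(sum(row) for row in table)
    return [[100.0 * cell / total for cell in row] for row in table]


def chi_square(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    rows, cols, table = crosstab(SAMPLE, "hair_color", "sex")
    print(cols)
    for label, row in zip(rows + ["Total"], with_totals(table)):
        print(label, row)
    print("chi-squared:", round(chi_square(table), 4))
```

The statistic alone is not a p-value. Turning it into one needs the chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom, which is left out here to keep the sketch dependency-free.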

Many thanks for csvtk, I love having alternatives to Excel/SQL.

Kind Regards Gavin Holt

Gavin-Holt commented 1 year ago

Hi

As an alternative, I have discovered that Datamash works nicely if you sort the input by the two grouping fields first (datamash's automatic sorting option, -s, doesn't work on Windows):

REM Fragment of my windows batch file
REM     Pretty output right justified for currency.
cat.exe _Analysis%infile%.csv ^
 | csvtk.exe cut -f 12,13,9 ^
 | csvtk.exe sort -k 1,2 ^
 | datamash.exe  --field-separator=, --header-in --format="$%%.2f" crosstab 1,2 sum 3 ^
 | csvtk.exe pretty -r ^
 >> _Analysis%infile%.txt

REM Convert $ to £ for the UK
sed.exe -i s/\$/\xA3/g _Analysis%infile%.txt
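
For anyone without datamash to hand, the same kind of summed cross-tabulation can be sketched in a few lines of Python. This is only an illustration of the technique (group by two fields, sum a third, format as currency), not a reproduction of datamash's exact output. The field names `region`, `category` and `amount`, the sample rows, and the empty-string filler for missing cells are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input, standing in for the three columns selected by csvtk cut.
SAMPLE_CSV = """region,category,amount
north,books,10.50
north,books,4.50
north,toys,3.00
south,toys,7.25
"""


def sum_crosstab(records, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) combination."""
    sums = defaultdict(float)
    for rec in records:
        sums[(rec[row_field], rec[col_field])] += float(rec[value_field])
    row_labels = sorted({r for r, _ in sums})
    col_labels = sorted({c for _, c in sums})
    return row_labels, col_labels, dict(sums)


def render(row_labels, col_labels, sums, symbol="£", filler=""):
    """Render the table as CSV text, with sums formatted as currency."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + col_labels)
    for r in row_labels:
        cells = [
            f"{symbol}{sums[(r, c)]:.2f}" if (r, c) in sums else filler
            for c in col_labels
        ]
        writer.writerow([r] + cells)
    return out.getvalue()


if __name__ == "__main__":
    records = csv.DictReader(io.StringIO(SAMPLE_CSV))
    print(render(*sum_crosstab(records, "region", "category", "amount")), end="")
```

Because the dictionary lookup does the grouping, the input does not need to be sorted first, and writing the currency symbol directly avoids the separate sed step. The result is still plain CSV, so it could be piped on to csvtk pretty -r as in the batch file above.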

Kind Regards Gavin Holt