pmelsted / KmerStream

Streaming algorithm for computing kmer statistics for massive genomics datasets

Output TSV optionally #6

Closed (sjackman closed this 8 years ago)

sjackman commented 8 years ago

TSV output would be much better for importing into R and Python, e.g.

Q  k    F0            f1            F1
0  32   15686875710   12361159570   188878769625
0  64   145132117825  134005267461  161086206625
0  96   120458305249  111221687401  133300308154
0  128  21617343636   18233322390   105522711154
0  160  19369615797   15914061304   77772705236
0  192  46485745358   43797611019   50099054358
0  224  21147873853   20060404949   22586521835
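
As a rough illustration of the import step, here is a minimal Python sketch that reads a headed TSV like the one above using only the standard library. The function name `read_kmerstream_tsv` is hypothetical, and the two rows are copied from the example, assuming the real file separates fields with tabs.

```python
import csv
import io

# Two rows copied from the example above, assumed to be tab-separated on disk.
SAMPLE_TSV = (
    "Q\tk\tF0\tf1\tF1\n"
    "0\t32\t15686875710\t12361159570\t188878769625\n"
    "0\t64\t145132117825\t134005267461\t161086206625\n"
)


def read_kmerstream_tsv(handle):
    """Parse a headed TSV into a list of dicts with integer values.

    Hypothetical helper, not part of KmerStream.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [{name: int(value) for name, value in row.items()} for row in reader]


rows = read_kmerstream_tsv(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["k"], row["F0"], row["f1"], row["F1"])
```

With a real file, the same helper would take `open(path, newline="")` in place of the `io.StringIO` object.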

sjackman commented 8 years ago

It would be quite useful to add additional columns for F0-f1, G, ek, λ
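
For the simplest of these columns, F0-f1, a downstream script could derive the value from the existing fields. The sketch below is only an illustration with a hypothetical helper name; G, ek, and λ are left out because the thread does not give their formulas.

```python
def add_f0_minus_f1(row):
    """Return a copy of a row with an extra "F0-f1" column.

    Hypothetical helper; the row is a dict with integer F0 and f1 values.
    """
    out = dict(row)
    out["F0-f1"] = row["F0"] - row["f1"]
    return out


# The k=32 row from the example table in this thread.
row_k32 = {"Q": 0, "k": 32, "F0": 15686875710, "f1": 12361159570, "F1": 188878769625}
extended = add_f0_minus_f1(row_k32)
print(extended["F0-f1"])
```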

pmelsted commented 8 years ago

Will do. The first line will be prefixed by '#'.

The remaining fields will be computed by a Python script and can be added on.

sjackman commented 8 years ago

Thanks, Pall! I prefer no #. The TSV tools that I use expect a header line.
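
To illustrate the concern, here is a small sketch of what a header-aware reader does with a '#'-prefixed first line. It uses Python's `csv.DictReader` as a stand-in for such tools, which is an assumption; other TSV tools may treat the line differently, for example by skipping it as a comment.

```python
import csv
import io

# The same header and first data row, with and without a leading '#'.
plain = io.StringIO("Q\tk\tF0\tf1\tF1\n0\t32\t15686875710\t12361159570\t188878769625\n")
hashed = io.StringIO("#Q\tk\tF0\tf1\tF1\n0\t32\t15686875710\t12361159570\t188878769625\n")

plain_names = csv.DictReader(plain, delimiter="\t").fieldnames
hashed_names = csv.DictReader(hashed, delimiter="\t").fieldnames

# The '#' becomes part of the first column name, so lookups by "Q" no longer work.
print(plain_names)
print(hashed_names)
```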

pmelsted commented 8 years ago

Fixed in 38c20a5

sjackman commented 8 years ago

Just because I had it lying around, here's a script to convert the previous kmerstream output format to TSV.

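#!/bin/bash
# pipefail is a bash option, so run under bash rather than plain sh.
set -eu -o pipefail
# Print the TSV header, then strip the "name = " labels and join every five values into one row.
# gsed is GNU sed; the \n in the replacement relies on it.
(printf "Q\tk\tF0\tf1\tF1\n"; \
    gsed 's/[^ ]* = //g;s/, /\n/' "$@" | paste -d'\t' - - - - -) \
    | estimate.py /dev/stdin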