zjshi / gt-pro

MIT License

gtp_parse.py output columns #35

Closed nick-youngblut closed 3 years ago

nick-youngblut commented 4 years ago

There's no header generated for the gtp_parse.py output table, and I don't see a description of the output table columns in the README or wiki. What are the columns? It would be helpful to include a description of the columns in the gtp_parse.py help docs and/or the gt-pro repo docs.

wwood commented 3 years ago

+1

zhaoc1 commented 3 years ago

Hi @nick-youngblut, apologies for the late reply. While we work on improving the gt-pro documentation, I would like to explain the 8 output columns of gtp_parse.py:

104644  2362409 NODE_941_length_29304_cov_26.5595   22202   G   A   20  0
104644  2362412 NODE_941_length_29304_cov_26.5595   22205   C   T   20  0
104644  2383325 NODE_990_length_27497_cov_21.326    13814   G   T   0   1
104644  2394405 NODE_990_length_27497_cov_21.326    24894   C   T   1   0

col 1: the unique 6-digit species id
col 2: 0-based global genomic position across contigs
col 3: contig_id for the representative genome in the gt-pro k-mer database
col 4: 0-based genomic position within current contig_id
col 5: gt-pro db major allele for current genomic position from current contig (note this is not necessarily the REF allele)
col 6: gt-pro db minor allele
col 7: read counts carrying major alleles
col 8: read counts carrying minor alleles
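
For anyone loading this table in Python, here is a minimal sketch (not part of gt-pro) that attaches names to the eight columns described above. The field names, the `SnpCount` type, and the `parse_gtp_lines` helper are hypothetical, chosen only for illustration. It assumes the output is whitespace-delimited (tabs or spaces) with no header row, as in the sample rows.

```python
from typing import Iterable, Iterator, NamedTuple


class SnpCount(NamedTuple):
    """One row of gtp_parse.py output (field names are illustrative, not official)."""
    species_id: str    # col 1: 6-digit species id (kept as a string)
    global_pos: int    # col 2: 0-based global position across contigs
    contig_id: str     # col 3: contig id in the representative genome
    contig_pos: int    # col 4: 0-based position within the contig
    major_allele: str  # col 5: db major allele (not necessarily REF)
    minor_allele: str  # col 6: db minor allele
    major_count: int   # col 7: reads carrying the major allele
    minor_count: int   # col 8: reads carrying the minor allele


def parse_gtp_lines(lines: Iterable[str]) -> Iterator[SnpCount]:
    """Yield one SnpCount per non-empty line; raise on an unexpected column count."""
    for line in lines:
        fields = line.split()  # assumption: no field contains whitespace
        if not fields:
            continue  # skip blank lines
        if len(fields) != 8:
            raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
        yield SnpCount(
            fields[0], int(fields[1]), fields[2], int(fields[3]),
            fields[4], fields[5], int(fields[6]), int(fields[7]),
        )


# The four sample rows from this comment, written with tab separators.
sample = [
    "104644\t2362409\tNODE_941_length_29304_cov_26.5595\t22202\tG\tA\t20\t0",
    "104644\t2362412\tNODE_941_length_29304_cov_26.5595\t22205\tC\tT\t20\t0",
    "104644\t2383325\tNODE_990_length_27497_cov_21.326\t13814\tG\tT\t0\t1",
    "104644\t2394405\tNODE_990_length_27497_cov_21.326\t24894\tC\tT\t1\t0",
]
records = list(parse_gtp_lines(sample))
```

To read a real output file, the same helper can be given an open file handle, since iterating over a file yields its lines.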

Please let me know if you have more questions. Thank you.

Chunyu

nick-youngblut commented 3 years ago

I was just looking through the docs and couldn't find a description of the output columns. I almost forgot that the description was in this issue. It would be helpful to include this info in the README or tutorial.

zjshi commented 3 years ago

Sorry for the late reply. Yes, we agree! We have now added the column descriptions to the main docs.