merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-gen-variability-profile should have gene_length column #508

Closed ekiefl closed 7 years ago

ekiefl commented 7 years ago

Currently, the text file output of anvi-gen-variability-profile provides some gene information (corresponding_gene_call, in_partial_gene_call, in_complete_gene_call, base_pos_in_codon, and codon_order_in_gene). It would be convenient to also have a gene_length column for calculating SNV density metrics. For now, the workaround is some data wrangling on the -gene_calls.txt file output by anvi-summarize. If the SNV does not fall within a partial or complete gene call, the gene_length value should be -1 or 0.
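The workaround mentioned above can be sketched roughly as follows. This is only an illustration, not anvi'o code: it assumes both files are tab-delimited, that the gene calls file has `gene_callers_id`, `start`, and `stop` columns, that gene length is `stop - start`, and that SNVs outside any gene call carry `-1` in `corresponding_gene_call`. The helper names and the toy data are made up for the example.

```python
import csv
import io

def gene_lengths(gene_calls_tsv):
    """Map gene caller id -> gene length from a tab-delimited gene calls table."""
    reader = csv.DictReader(gene_calls_tsv, delimiter="\t")
    # Assumption: length is stop - start; adjust if your coordinates differ.
    return {row["gene_callers_id"]: int(row["stop"]) - int(row["start"]) for row in reader}

def add_gene_length(variability_tsv, lengths):
    """Yield variability rows with a gene_length field (-1 when no gene call matches)."""
    for row in csv.DictReader(variability_tsv, delimiter="\t"):
        row["gene_length"] = lengths.get(row["corresponding_gene_call"], -1)
        yield row

# Toy stand-ins for the -gene_calls.txt file and the variability profile.
gene_calls = io.StringIO(
    "gene_callers_id\tcontig\tstart\tstop\n"
    "7\tc1\t100\t400\n"
    "8\tc1\t500\t650\n"
)
variability = io.StringIO(
    "sample_id\tcorresponding_gene_call\tpos\n"
    "S1\t7\t120\n"
    "S1\t-1\t450\n"
    "S2\t8\t510\n"
)

lengths = gene_lengths(gene_calls)
rows = list(add_gene_length(variability, lengths))
```

With real files, the two `io.StringIO` objects would be replaced by opened file handles.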

meren commented 7 years ago

The reason we didn't have it in the output table all this time is the extent of redundancy it would introduce: each gene with reported variation appears at least as many times as there are samples in this output, and the length of a gene does not change. Reporting SNV density per gene would have been less redundant, since it would change from sample to sample, but then the value would appear as many times as there are SNVs reported for a given gene.

Do you want the gene length and SNV density in the output despite these? Or do you have a better suggestion? I also hate that recovering SNV densities requires some data wrangling, which is much more error-prone than just reporting them.

ekiefl commented 7 years ago

Gene length would be very useful to have, regardless of its redundancy. I don't know if I agree with reporting SNV density unless the user is being mindful, since it will depend on the flags used in anvi-gen-variability-profile. For example, if run with --min-coverage-in-each-sample, the interpretation of SNV density changes drastically. Gene length is a more objective value.
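To make the dependence concrete: once gene length is available, a per-gene, per-sample SNV density is just the number of reported SNVs divided by the gene length. The sketch below is hypothetical and assumes each row of the table carries `sample_id`, `corresponding_gene_call`, and `gene_length`; the function name is invented for the example. Because the numerator is whatever set of SNVs survived filtering, running anvi-gen-variability-profile with a flag such as --min-coverage-in-each-sample changes the result even though gene_length stays the same.

```python
from collections import Counter

def snv_density(rows):
    """Return {(sample_id, gene): SNVs per nucleotide} for SNVs inside a gene call."""
    counts = Counter()
    lengths = {}
    for row in rows:
        length = int(row["gene_length"])
        if length <= 0:  # -1 or 0 marks an SNV outside any gene call
            continue
        key = (row["sample_id"], row["corresponding_gene_call"])
        counts[key] += 1
        lengths[key] = length
    return {key: counts[key] / lengths[key] for key in counts}

# Toy rows: three SNVs in gene 7 for sample S1, one for S2, one outside any gene.
rows = [
    {"sample_id": "S1", "corresponding_gene_call": "7", "gene_length": "300"},
    {"sample_id": "S1", "corresponding_gene_call": "7", "gene_length": "300"},
    {"sample_id": "S1", "corresponding_gene_call": "7", "gene_length": "300"},
    {"sample_id": "S2", "corresponding_gene_call": "7", "gene_length": "300"},
    {"sample_id": "S1", "corresponding_gene_call": "-1", "gene_length": "-1"},
]
density = snv_density(rows)
```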

meren commented 7 years ago

It would be great if you can test this from an up-to-date master, and confirm that it is not making stuff up :)

ekiefl commented 7 years ago

On Sun, Apr 30, 2017 at 4:02 PM, A. Murat Eren notifications@github.com wrote:

It would be great if you can test this from an up-to-date master, and confirm that it is not making stuff up :)

meren commented 7 years ago

This response is empty, Evan. Responding to GitHub issues through e-mail often does not work as intended. I suggest sticking with the web interface.

ekiefl commented 7 years ago

http://imgur.com/DrguXV8 (GitHub would not upload the screenshot of my terminal, so I hosted it on imgur)

Tested and all is well. Except for the added gene_length column, the generated SNV table is identical to the one generated by the previous version. Also, the gene_length column is identical to the gene_length column I created with my workaround.
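A check along these lines can be scripted. The snippet below is a sketch of one way to do it, not the commands that were actually run: it assumes tab-delimited tables with rows in the same order, and the function names and toy data are invented for the example.

```python
import csv
import io

def load(tsv):
    """Read a tab-delimited table into a list of row dictionaries."""
    return list(csv.DictReader(tsv, delimiter="\t"))

def identical_except(old_rows, new_rows, extra="gene_length"):
    """True if new_rows equal old_rows once the extra column is dropped."""
    stripped = [{k: v for k, v in row.items() if k != extra} for row in new_rows]
    return stripped == old_rows

# Toy stand-ins for the tables from the previous and the updated version.
old = load(io.StringIO(
    "sample_id\tcorresponding_gene_call\n"
    "S1\t7\n"
    "S2\t8\n"
))
new = load(io.StringIO(
    "sample_id\tcorresponding_gene_call\tgene_length\n"
    "S1\t7\t300\n"
    "S2\t8\t150\n"
))
workaround_lengths = ["300", "150"]  # gene lengths computed independently

same_table = identical_except(old, new)
same_lengths = [row["gene_length"] for row in new] == workaround_lengths
```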

meren commented 7 years ago

Perfect! Thank you for looking into that.