matsen / pplacer

Phylogenetic placement and downstream analysis
http://matsen.fredhutch.org/pplacer/
GNU General Public License v3.0

subcommand to generate jplace file from OTU table #215

Closed matsen closed 12 years ago

matsen commented 12 years ago

QIIME's basic data structure is an OTU table. This data structure holds the results from a collection of samples, in terms of OTU structure with taxonomic information and numbers of reads per OTU.

The best way to get started is probably to work through the QIIME tutorial up to step 7, at which point we should have a tree and an OTU table. We would like to turn these into two files: first, a .jplace file containing a collection of fake placements, and second, a CSV file mapping read names to specimens. Together, these two files should give an appropriate split placefile. Note that what I just called a specimen is called a "split placefile name" in the pplacer docs and a "sample ID" in the QIIME documentation.

There is something a little funny here, which is that the read names are actually lost in the generation of the OTU table. So we will have to make some up. I suggest <otu number>__<sample ID>.

For each non-empty combination of OTU number and sample ID, a placement should be generated on the pendant edge labeled by that OTU, with zero pendant branch length, zero distal length, and the fake "read name" described above. All of these placements will go in a single .jplace file. The mapping from "read names" to sample IDs will then go in a CSV file.
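A rough Python sketch of the conversion described above, just to pin down the data flow. It is not the guppy implementation. The function names, the in-memory `otu_counts` layout, and the `leaf_edge` map from OTU to pendant edge number are all hypothetical. It also assumes that the read count is carried as the multiplicity in the jplace "nm" entry and that listing only three fields is acceptable; the issue does not specify either point, and a real parser may expect further fields such as likelihoods.

```python
import csv
import io
import json


def fake_read_name(otu_id, sample_id):
    # Naming scheme suggested in the issue: <otu number>__<sample ID>
    return f"{otu_id}__{sample_id}"


def of_otu_table(otu_counts, leaf_edge, tree):
    """Build a jplace-style dict plus (read name, sample ID) rows.

    otu_counts: {otu_id: {sample_id: read count}}   (hypothetical layout)
    leaf_edge:  {otu_id: edge number of that OTU's pendant edge}  (hypothetical)
    tree:       Newick string carrying {edge number} annotations
    """
    placements = []
    rows = []
    for otu_id, per_sample in otu_counts.items():
        for sample_id, count in per_sample.items():
            if count <= 0:
                continue  # skip empty OTU/sample combinations
            name = fake_read_name(otu_id, sample_id)
            placements.append({
                # edge number, zero distal length, zero pendant length
                "p": [[leaf_edge[otu_id], 0.0, 0.0]],
                # assumption: the read count is stored as the multiplicity
                "nm": [[name, count]],
            })
            rows.append((name, sample_id))
    jplace = {
        "version": 3,
        "tree": tree,
        "fields": ["edge_num", "distal_length", "pendant_length"],
        "placements": placements,
        "metadata": {},
    }
    return jplace, rows


def rows_to_csv(rows):
    # Serialize the read name -> sample ID mapping as CSV text.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Toy input: two OTUs, two samples, one empty combination.
example_tree = "((otu1:0.1{0},otu2:0.2{1}):0.05{2},otu3:0.3{3}){4};"
example_counts = {"otu1": {"S1": 3, "S2": 0}, "otu2": {"S2": 5}}
example_edges = {"otu1": 0, "otu2": 1, "otu3": 3}

example_jplace, example_rows = of_otu_table(
    example_counts, example_edges, example_tree
)
example_json = json.dumps(example_jplace, indent=2)
example_csv = rows_to_csv(example_rows)
```

In the toy input, the otu1/S2 combination has zero reads and so produces no placement, while the other two combinations each yield one placement and one CSV row.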

@metasoarous, could you please go through enough of the tutorial to make a tree and an OTU table using the QIIME virtual box? Then pass these files off to @habnabit and reassign the issue to him.

matsen commented 12 years ago

I was originally thinking that this would be a script, but I now think it should be a guppy subcommand: guppy of_otu_table.

metasoarous commented 12 years ago

OTU table output of the QIIME tutorial is located at:

/home/matsengrp/working/csmall/otus.tgz