matsen / pplacer

Phylogenetic placement and downstream analysis
http://matsen.fredhutch.org/pplacer/
GNU General Public License v3.0

overlap graph (perhaps as part of `islands`) #214

matsen closed 12 years ago

matsen commented 12 years ago

We would like to spit out the weighted overlap graph (unweighted version described in mass_islands.ml).

That is, if the placement distributions of reads i and j overlap (i.e. both put nonzero mass on at least one common tree edge) then they get connected in the graph. The weight assigned to that graph edge is

\sum_e d(i)_e d(j)_e

where d(i)_e is the probability mass of read i on edge e (and the sum is over all edges, but we clearly only have to look at edges where both are nonzero).
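
A minimal sketch of that weight computation, in Python rather than the OCaml that guppy is written in. The input layout (a dict from read name to a dict of edge id to mass) and the name `overlap_weights` are assumptions for illustration, not pplacer's actual data structures. It indexes reads by tree edge first, so only pairs that actually share an edge are ever visited.

```python
from collections import defaultdict
from itertools import combinations


def overlap_weights(placements):
    """Return {(read_a, read_b): weight} for every pair of reads sharing an edge.

    placements: hypothetical layout, {read_name: {edge_id: mass}}.
    weight(i, j) = sum over edges e of d(i)_e * d(j)_e.
    """
    # Invert to edge -> [(read, mass)], keeping only nonzero masses.
    by_edge = defaultdict(list)
    for read, masses in placements.items():
        for edge, mass in masses.items():
            if mass > 0:
                by_edge[edge].append((read, mass))

    weights = defaultdict(float)
    for reads_on_edge in by_edge.values():
        # Sorting by read name gives each unordered pair one canonical key.
        for (a, ma), (b, mb) in combinations(sorted(reads_on_edge), 2):
            weights[(a, b)] += ma * mb
    return dict(weights)


# Toy data: read1 and read2 share edge 1; read3 overlaps nothing.
example = {
    "read1": {0: 0.5, 1: 0.5},
    "read2": {1: 0.6, 2: 0.4},
    "read3": {3: 1.0},
}
result = overlap_weights(example)
```

In the toy data the only connected pair is (read1, read2), with weight 0.5 * 0.6 from their single shared edge. Reads with no overlap, like read3, simply do not appear in the graph.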

The output should be in ABC format, i.e.

"The input is then a file or stream in which each line encodes an edge in terms of two labels (the 'A' and the 'B') and a numerical value (the 'C'), all separated by white space."

So something like

read1    read2    0.3
read3    read4    0.4
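
Writing that format could look roughly like the sketch below. The function name `write_abc` and the choice of a tab separator and six significant digits are assumptions; any whitespace separator satisfies the quoted description.

```python
import io


def write_abc(weights, out):
    """Write one 'labelA<TAB>labelB<TAB>weight' line per graph edge.

    weights: hypothetical layout, {(read_a, read_b): weight}.
    """
    for (a, b), w in sorted(weights.items()):
        # ABC fields are whitespace-separated, so labels must not contain any.
        if any(ch.isspace() for ch in a + b):
            raise ValueError(f"label contains whitespace: {a!r} / {b!r}")
        out.write(f"{a}\t{b}\t{w:.6g}\n")


buf = io.StringIO()
write_abc({("read1", "read2"): 0.3, ("read3", "read4"): 0.4}, buf)
abc_text = buf.getvalue()
```

The whitespace check matters because a read name containing a space would silently shift the columns and produce a malformed edge line.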

We will then want to feed these graphs to the mcl binary. Please check that mcl runs properly on the guppy output. I haven't installed it yet (though it's available via apt-get on Debian-like systems).
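
For that check, a small wrapper along these lines might do. It assumes mcl is on the PATH and uses mcl's `--abc` (label input) and `-o` (output file) options. The helper names and any file paths passed in are hypothetical. With `--abc`, mcl writes one cluster per line as whitespace-separated labels, which is what the parser below expects.

```python
import shutil
import subprocess


def mcl_command(abc_path, out_path):
    """Build the argv for running mcl on a label-format (ABC) graph."""
    return ["mcl", abc_path, "--abc", "-o", out_path]


def run_mcl(abc_path, out_path):
    """Run mcl and return the clusters as a list of lists of read names."""
    if shutil.which("mcl") is None:
        raise FileNotFoundError("mcl not found on PATH")
    # check=True raises CalledProcessError if mcl rejects the input.
    subprocess.run(mcl_command(abc_path, out_path), check=True)
    with open(out_path) as fh:
        return [line.split() for line in fh if line.strip()]
```

A nonzero exit status from mcl surfaces as an exception, so a test only needs to call `run_mcl` on a guppy-produced file and confirm that every read name in the input shows up in some cluster.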

I would be happy for this to be either its own subcommand (guppy ograph) or a flag to islands, though the flag route might get a bit complex if this thing is going to have flags of its own.

MCL site

matsen commented 12 years ago

@koadman, let me know if this doesn't seem right!