sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

add scaffolder metrics #117

Open sebhtml opened 11 years ago

sebhtml commented 11 years ago

That's a good point.

A good metric that Ray could produce to start with would be the number of pairs (including mates) with:

  1. both ends within a contig;
  2. one end on one contig and the other end on another contig;
  3. one end on one contig and the other not mapped;
  4. both ends not mapped.
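To make the four cases concrete, here is a minimal Python sketch of how such a tally could be computed. It is not Ray code: the function names, the category labels, and the input format (one tuple per pair holding the contig name each end maps to, or `None` when an end is not mapped) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical labels for the four cases listed above.
SAME_CONTIG = "both ends within a contig"
DIFFERENT_CONTIGS = "one end on one contig, the other on another contig"
ONE_UNMAPPED = "one end on a contig, the other not mapped"
BOTH_UNMAPPED = "both ends not mapped"

# A pair is described by the contig each end maps to (None = not mapped).
Pair = Tuple[Optional[str], Optional[str]]


def classify_pair(left: Optional[str], right: Optional[str]) -> str:
    """Return the case a pair falls into, given where its two ends map."""
    if left is None and right is None:
        return BOTH_UNMAPPED
    if left is None or right is None:
        return ONE_UNMAPPED
    if left == right:
        return SAME_CONTIG
    return DIFFERENT_CONTIGS


def pair_metrics(pairs: Iterable[Pair]) -> Dict[str, int]:
    """Count how many pairs fall into each of the four cases."""
    counts = {
        SAME_CONTIG: 0,
        DIFFERENT_CONTIGS: 0,
        ONE_UNMAPPED: 0,
        BOTH_UNMAPPED: 0,
    }
    for left, right in pairs:
        counts[classify_pair(left, right)] += 1
    return counts


if __name__ == "__main__":
    # Made-up example input, not real mapping data.
    example = [
        ("contig-1", "contig-1"),
        ("contig-1", "contig-2"),
        ("contig-3", None),
        (None, "contig-2"),
        (None, None),
    ]
    for label, count in pair_metrics(example).items():
        print(f"{label}: {count}")
```

Reporting these four counts per library would show at a glance whether the scaffolder has enough linking pairs (case 2) to work with.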

You suggest that a sizable part of the pairs (including mates) are in cases 1. and 4. when using a k-mer length of 61-91. That's likely.

I think that is probably the case, as mate pairs usually also include an adapter, which consumes precious space in the sequences.
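A rough back-of-the-envelope sketch of why this matters with long k-mers: a sequence of length L contains L - k + 1 k-mers, so every base lost to an adapter removes one k-mer. The function name and the read and adapter lengths below are made-up values for illustration, and the sketch assumes the adapter simply shortens the usable part of the read.

```python
def usable_kmers(read_length: int, adapter_length: int, k: int) -> int:
    """Number of k-mers left in a read once the adapter bases are excluded.

    Assumes the usable sequence is read_length - adapter_length bases long;
    a sequence of length L yields L - k + 1 k-mers (0 if L < k).
    """
    usable_length = read_length - adapter_length
    return max(0, usable_length - k + 1)


if __name__ == "__main__":
    # Hypothetical 100-base read carrying 30 bases of adapter.
    print(usable_kmers(100, 30, 61))  # 10
    print(usable_kmers(100, 30, 91))  # 0
    # Same read without adapter, for comparison.
    print(usable_kmers(100, 0, 91))   # 10
```

With these assumed numbers, an end that carries an adapter contributes no k-mers at all at k = 91, which is consistent with such ends showing up as not mapped.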

For the time being, I believe that "use another scaffolder" is your best bet.

Speaking of scaffolders, I will soon (hopefully) fix the speed issue for scaffolding of large genomes due to repeated k-mers [1].