sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Exceeding maxkmerlength flag should be stronger #234

Open plattsa opened 9 years ago

plattsa commented 9 years ago

Hi Seb,

I was recently reviewing some work from a lab where a student had diligently sought the best k-mer value for Ray assemblies across a large number of genomes. Despite exploring many K values, the N50 and other assembly statistics stayed nearly invariant across their k-mer range of 31 to 51.

Upon further investigation, it turned out that Ray had been compiled without setting an appropriate maximum k-mer length, so every assembly had actually used K=31. Each assembly had slightly different statistics (so the results did not look identical), and the line reporting the k-mer length actually used was buried in a large amount of output. As a result, nobody noticed that the requested and actual k-mer lengths differed.

I'd like to suggest that requesting a k-mer length beyond the range supported by the compiled binary should cause an execution error rather than just a note in the log.
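A minimal C++ sketch of the proposed behaviour, assuming the requested k and the build's largest supported k are both available as plain integers at startup. The function names `isSupportedKmerLength` and `checkedKmerLength` are hypothetical and not part of Ray's code base; the point is only to show "fail loudly" in place of "log and fall back".

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: true if k can be used by this build.
// maxSupported stands in for the largest k the binary supports, which Ray
// fixes at compile time (the maxkmerlength setting this issue is about).
bool isSupportedKmerLength(int k, int maxSupported) {
    return k >= 1 && k <= maxSupported;
}

// Hypothetical replacement for a "note in the log": refuse to run with an
// unsupported k instead of silently continuing with a smaller value.
int checkedKmerLength(int requested, int maxSupported) {
    if (!isSupportedKmerLength(requested, maxSupported)) {
        throw std::invalid_argument(
            "requested k-mer length " + std::to_string(requested) +
            " is not supported by this build (maximum " +
            std::to_string(maxSupported) +
            "); recompile with a larger maximum k-mer length");
    }
    return requested;
}
```

If the caller catches the exception, prints the message, and exits with a non-zero status, a batch script sweeping k from 31 to 51 would stop at the first unsupported value instead of quietly producing a series of K=31 assemblies.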

Adrian

sebhtml commented 9 years ago

I am not working actively on Ray anymore. If you submit a patch, I can review it and merge it.

Our new assembler project is called "spate". One of the nice things about spate is that there is no maximum k-mer length to configure at compile time.

See reasons here: http://permalink.gmane.org/gmane.science.biology.ray-genome-assembler/904

Spate is still in development though.

https://github.com/sebhtml/biosal