neufeld / pandaseq

PAired-eND Assembler for DNA sequences
GNU General Public License v3.0

About FML issues #37

Closed fangly closed 10 years ago

fangly commented 10 years ago

Dear Andre,

I am reading the pandaseq manual and just learned about FML errors. Now that I know to look for them, I notice that my dataset is affected. I would like to suggest two improvements: 1/ It takes effort to find out that there are FML problems. Could you add a warning when this occurs, so that users know that things are not optimal? 2/ When trying to increase k, what would be a good starting value? The manual does not seem to mention what the default value is.

Thank you,

Florent

apmasell commented 10 years ago
  1. A high number of FML errors does not necessarily indicate a problem, which is why there is no warning. FML errors spread across many sequences do not indicate a problem either. Many FML errors within a single sequence might indicate a problem. Many FML errors per sequence, across many sequences, might be a problem worth investigating.
  2. The default value is 2. I'll update the manual. Use pandaseq-diff to see if changing the k-mer table size improves the output. Unless the sequence is extremely repetitive, it usually doesn't matter.
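
The distinction in point 1 (errors per sequence versus number of sequences affected) can be checked by tallying FML entries per read in the log. Below is a minimal Python sketch, not part of pandaseq. It assumes a tab-separated log in which the field right after the `FML` code is the read identifier; that layout, the sample lines, the function names, and the threshold of 3 are all hypothetical, so adjust them to match your actual log.

```python
from collections import Counter


def fml_counts(log_lines, code="FML"):
    """Count FML entries per read ID.

    Assumption: each log line is tab-separated and the field that
    follows the error code is the read identifier.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if code in fields:
            idx = fields.index(code)
            if idx + 1 < len(fields):
                counts[fields[idx + 1]] += 1
    return counts


def summarize(counts, per_read_threshold=3):
    """Return (reads with any FML entry, reads at or above the threshold)."""
    affected = len(counts)
    heavy = sum(1 for n in counts.values() if n >= per_read_threshold)
    return affected, heavy


# Made-up sample lines in the assumed layout, for illustration only.
sample = [
    "ERR\tFML\tread1",
    "ERR\tFML\tread1",
    "ERR\tFML\tread1",
    "ERR\tFML\tread2",
    "INFO\tunrelated message",
]
counts = fml_counts(sample)
affected, heavy = summarize(counts)
```

If `heavy` is a small fraction of `affected`, the errors are mostly spread thinly across reads, which is the case described above as not indicating a problem.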
fangly commented 10 years ago

Got it. In practice, I had to go to k=36 to remove all FML problems (starting from ~8,000 at k=2). This did not significantly alter the number of merged pairs, so, at least in my case, a warning is unwarranted. Thanks for the explanations, Andre. Florent

apmasell commented 10 years ago

No problem.