The N50 is very short from soil sample

lulunisrna commented 4 years ago

Dear voutcn,

I have a problem with my MEGAHIT results. My N50 is very short, around 450-550 bp. My samples are soil from a plantation. I have read the same issue on this page, where you advised using --min-count 1, but it doesn't work for me. I also tried running the assembly with --kmin-1pass, but the N50 is still too short, around 500 bp. Now I'm trying --presets meta-large for this assembly, and I hope to get a good result. If the result is still bad, do you have any advice for fixing this problem? Thank you.
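For readers comparing runs, here is a minimal sketch of how N50 is usually computed from contig lengths: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function names `contig_lengths` and `n50` are hypothetical helpers for illustration, not part of MEGAHIT.

```python
def contig_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly
```

Assuming the contigs are in a FASTA file (the path is only an example), this could be used as `with open("final.contigs.fa") as fh: print(n50(contig_lengths(fh)))`. An N50 near 500 bp means that half of the assembled bases sit in contigs of roughly 500 bp or shorter.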

franciscozorrilla commented 4 years ago

Hi, I've also been having a hard time finding suitable parameters for my soil datasets (#254). Did you manage to improve your N50 somehow?

voutcn commented 4 years ago

Soil samples are hard to assemble for two reasons:

1. Very high biodiversity (too many microorganisms), with many of them sequenced at very low depth
2. Some dominant microorganisms can be sequenced at extremely high depth, which introduces a lot of sequencing errors

There is no solution to the first problem other than sequencing a lot more data. For the second problem, normalization may help. See https://github.com/voutcn/megahit/issues/239#issuecomment-534373589
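To illustrate what normalization means here, below is a rough sketch of median-based digital normalization: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a cutoff. This is an illustration of the general idea and not necessarily the procedure in the linked comment. The function names, `k`, and `cutoff` are assumptions chosen for the example. It uses exact in-memory counts and ignores reverse complements, so for real data a dedicated tool such as khmer or BBNorm is the practical choice.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence (empty if the read is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=20, cutoff=20):
    """Keep a read only while the median count of its k-mers is below `cutoff`.

    Simplified sketch: exact counts, forward strand only (assumption).
    """
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read too short to judge, skip it
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)
            kept.append(read)
    return kept
```

For example, `normalize(["ACGTTGCA"] * 10 + ["GGGGCCCCAAAA"], k=4, cutoff=3)` keeps three copies of the abundant read plus the single rare read. Redundant high-depth reads are discarded, while low-depth reads are left untouched.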

voutcn/megahit #259