I ran MEGAHIT on a 50x WGS data set (plant genome), and then on the same data set subsampled to 20x. Surprisingly, the 20x data set gave a slightly higher N50 and a larger total assembly size. BUSCO scores are very similar and very high for both assemblies. The same happened with several other similar data sets.
Any ideas or explanations for this?
I ran MEGAHIT like this:
megahit -1 reads_1.fq.gz -2 reads_2.fq.gz -r reads_merged.fq.gz,reads_SE.fq.gz -t 30 --min-contig-len 1
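Since --min-contig-len 1 keeps every contig, it may help to compare the two assemblies at the same length cutoff. Below is a minimal sketch of how N50 and total assembly size could be recomputed from a list of contig lengths. The function names, the min_len cutoff, and the example lengths are hypothetical and are not taken from my actual assemblies.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 if the list is empty).

    N50 is the length L of the contig at which the running sum of lengths,
    taken from longest to shortest, first reaches half the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def assembly_stats(lengths, min_len=1):
    """Summarise contigs of at least min_len bp (min_len is an illustrative cutoff)."""
    kept = [n for n in lengths if n >= min_len]
    return {"contigs": len(kept), "total_bp": sum(kept), "n50": n50(kept)}


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    example_lengths = [8, 8, 4, 3, 3, 2, 2, 2]
    print(assembly_stats(example_lengths))
    print(assembly_stats(example_lengths, min_len=3))
```

Running the same cutoff on both the 50x and 20x contig sets would show whether the difference persists once very short contigs are excluded.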