tseemann / snippy

:scissors: :zap: Rapid haploid variant calling and core genome alignment
GNU General Public License v2.0
real reads versus simulated reads #398

Open fengyuchengdu opened 4 years ago

fengyuchengdu commented 4 years ago

Hi Torsten,

Many bacterial genomes lack SRA data, which prevents us from performing read-based analyses. I think you mentioned somewhere that "best reads are contigs". Do you reckon simulated reads generated from the assembly (perfect reads, no errors, perfectly even coverage) should be used in Snippy even when we do have the real reads?

I've run some tests to see whether the two give significantly different results (simulated reads were generated from the Shovill-assembled genome with the same read length and coverage as the real reads), and the answer is "yes". Snippy detected more variants, including SNPs, with the simulated reads than with the real reads. That makes me wonder which result is closer to the truth.
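A comparison like the one described above can be made at the level of individual variants rather than total counts. The following is a minimal sketch, assuming each Snippy run produced a tab-separated variant table with `CHROM`, `POS`, `REF` and `ALT` columns; the helper names `load_variants` and `compare` and the toy tables are hypothetical, made up for illustration only.

```python
import csv
import io


def load_variants(handle):
    """Read a tab-separated variant table into a set of (CHROM, POS, REF, ALT).

    Assumes a header row containing CHROM, POS, REF and ALT columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["CHROM"], int(row["POS"]), row["REF"], row["ALT"])
        for row in reader
    }


def compare(real, simulated):
    """Split two variant sets into shared and run-specific calls."""
    return {
        "shared": real & simulated,
        "real_only": real - simulated,
        "simulated_only": simulated - real,
    }


# Toy tables standing in for the two runs (invented data, not real results).
REAL_TAB = (
    "CHROM\tPOS\tTYPE\tREF\tALT\n"
    "chr\t100\tsnp\tA\tG\n"
    "chr\t250\tsnp\tC\tT\n"
)
SIM_TAB = (
    "CHROM\tPOS\tTYPE\tREF\tALT\n"
    "chr\t100\tsnp\tA\tG\n"
    "chr\t250\tsnp\tC\tT\n"
    "chr\t900\tsnp\tG\tA\n"
)

real = load_variants(io.StringIO(REAL_TAB))
simulated = load_variants(io.StringIO(SIM_TAB))
result = compare(real, simulated)

for name, variants in result.items():
    print(name, len(variants))
```

With real data, the `io.StringIO` objects would be replaced by opened files from each run. Listing the positions in `simulated_only` shows where the extra calls fall, which is more informative than comparing totals.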

Thanks

Yu

tseemann commented 4 years ago

Shredding draft genomes can be a problem, yes. Did you do it yourself, or use snippy --ctgs?

fengyuchengdu commented 4 years ago

I tried both snippy --ctgs and the wgsim wrapper readSimulator.py (https://github.com/wanyuac/readSimulator) to generate perfect reads. The readSimulator.py command was:

readSimulator.py --input input.fasta --simulator wgsim --simulator_path /home/zong/anaconda3/envs/py36/bin/wgsim --depth 100 --outdir simulated_reads --readlen 150 --opts '-e 0 -d 350 -r 0 -R 0 -X 0 -h -S 0'

The number of variants detected was ordered as follows: snippy --ctgs <= real reads <= wgsim-shredded reads
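To make "shredding" contigs into perfect reads concrete, here is a minimal sketch of a fixed-step tiling scheme. It is an illustration only and is not a description of what snippy --ctgs or wgsim actually do internally; the function names `shred` and `depth_per_base` and the toy contig are hypothetical.

```python
def shred(sequence, read_len, step):
    """Yield (start, read) pairs of error-free reads tiled along a contig.

    Reads are exact substrings of the contig, so they carry no sequencing
    errors. Only full-length reads are produced.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    last_start = len(sequence) - read_len
    for start in range(0, last_start + 1, step):
        yield start, sequence[start:start + read_len]


def depth_per_base(seq_len, read_len, step):
    """Count how many tiled reads cover each position of the contig."""
    depth = [0] * seq_len
    for start in range(0, seq_len - read_len + 1, step):
        for pos in range(start, start + read_len):
            depth[pos] += 1
    return depth


# Toy contig (invented sequence), shredded into 4 bp reads every 2 bp.
contig = "ACGTACGTAC"
reads = [read for _, read in shred(contig, read_len=4, step=2)]
depth = depth_per_base(len(contig), read_len=4, step=2)

print(reads)
print(depth)
```

In this toy scheme the depth is even in the middle of the contig but drops toward both ends, so "perfectly even coverage" holds only away from contig boundaries. Whether that contributes to the differences reported in this thread is not established here.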