Open cntrier opened 5 years ago
@cntrier thank you for this comprehensive report! Will make it easy to replicate and hopefully isolate and fix.
@cntrier if you use the parameters given in the examples in the Readme, can you replicate the results shown in the Readme? This information is useful to me as a means of understanding if the bug has always been there, or is a recently introduced bug. Thanks!
Thank you for getting back to me.
I have now downloaded the demo data and ran the same parameters as in the README and encountered the same problem.
I filtered variants and generated a synthetic read model with the code from the README and I have the same output as the README with mitty describe-read-model function: When I generated reads using the synthetic read model with the parameters from the README, again the reads had a flat BQ of 9.
Here is the code:
FASTA=human_g1k_v37.fa.gz
SAMPLEVCF=1kg.20.22.vcf.gz
SAMPLENAME=HG00119
REGION_BED=region.bed
FILTVCF=HG00119-filt.vcf.gz
READMODEL=1kg-pcr-free.pkl
COVERAGE=30
READ_GEN_SEED=7
FASTQ_PREFIX=HG00119-reads
READ_CORRUPT_SEED=8
docker run --rm -v /Users/cassandra/Desktop/Ous_mitty:/VCF:rw ous_nbs/mitty /opt/conda/envs/mymitty/bin/mitty -v4 filter-variants \
/VCF/${SAMPLEVCF} \
${SAMPLENAME} \
/VCF/${REGION_BED} \
- \
2> vcf-filter.log | bgzip -c > ${FILTVCF}
tabix -p vcf ${FILTVCF}
MODELNAME=synthetic-model.pkl
docker run --rm -v /Users/cassandra/Desktop/Ous_mitty:/VCF:rw ous_nbs/mitty /opt/conda/envs/mymitty/bin/mitty create-read-model synth-illumina \
/VCF/${MODELNAME} \
--read-length 121 \
--mean-template-length 400 \
--std-template-length 20 \
--bq0 30 \
--k 200 \
--sigma 5 \
--comment 'Model created for the demo'
docker run --rm -v /Users/cassandra/Desktop/Ous_mitty:/VCF:rw ous_nbs/mitty /opt/conda/envs/mymitty/bin/mitty describe-read-model /VCF/${MODELNAME} /VCF/${MODELNAME}.png
docker run --rm -v /Users/cassandra/Desktop/Ous_mitty:/VCF:rw ous_nbs/mitty /opt/conda/envs/mymitty/bin/mitty -v4 generate-reads \
/VCF/${FASTA} \
/VCF/${FILTVCF} \
${SAMPLENAME} \
/VCF/${REGION_BED} \
/VCF/${MODELNAME} \
${COVERAGE} \
${READ_GEN_SEED} \
/VCF/${FASTQ_PREFIX}1.fq \
/VCF/${FASTQ_PREFIX}-lq.txt \
--fastq2 /VCF/${FASTQ_PREFIX}2.fq \
--threads 2
Thank you for your time!
@cntrier thank you so much for this report - this is super helpful! I will fix this bug.
Hi,
I've created a read model with the following script:
mitty create-read-model synth-illumina \ 100.pkl \ --read-length 100 \ --mean-template-length 250 \ --std-template-length 20 \ --bq0 30 \ --k 200 \ --sigma 5
And when I check the read model in Mitty with the following: mitty describe-read-model 100.pkl 100.png
It looks as expected: But when I generate reads using the model with the following code: k=HG00632 i=100
mitty -v4 generate-reads GRCh38.p12.fa \ ./final_vcfs/${k}_all.vcf.gz \ ${k} all_mergedsorted.bed \ ${i}.pkl \ 40 \ 7 \ ${k}${i}reads-test.1.fq \ ${k}${i}-lq.txt \ --fastq2 ${k}_${i}reads-test.2.fq \ 2> vcf-${i}${k}.log
The generated reads have a flat BQ of 9 when I check them with FastQC:
And when I run the god-aligner to create a bam file, I can see in IGV that the reads are a mess. I've tried running different individuals, different read lengths but get the same pattern.
Have I misunderstood something with the read model generation?
Thank you very much for any help you can provide on the matter.