astewart-twist closed 5 years ago
Greetings, I'm unable to replicate this:
python genReads.py -r chr1_subset_small.fa -R 101 -o test1
python genReads.py -r chr1_subset_small.fa -R 101 -o test2
python genReads.py -r chr1_subset_small.fa -R 101 -o test3
MD5 (test1_read1.fq) = 73b188f3a48bb0e320df6e0c97543b92
MD5 (test2_read1.fq) = c351c251be6a2b44c00236fe8c5cc822
MD5 (test3_read1.fq) = 20fdd92c144c3355058eebec87aa7806
(and manually looking at the .fq files I see the reads are different)
Could you provide me with the command lines you used to generate the datasets you observed to be identical? Were you referring to FASTQ, BAM, or VCF output?
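For anyone who wants to repeat the checksum comparison above without hashing by hand, here is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `identical_pairs` are made up for illustration and are not part of genReads.py; the file names in the usage comment are simply the outputs from the three commands above.

```python
import hashlib
from itertools import combinations


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical_pairs(paths):
    """Return every pair of paths whose contents have the same MD5 digest."""
    digests = {str(p): md5_of_file(p) for p in paths}
    return [(a, b) for a, b in combinations(digests, 2) if digests[a] == digests[b]]


# Hypothetical usage with the files generated above:
# identical_pairs(["test1_read1.fq", "test2_read1.fq", "test3_read1.fq"])
```

An empty list from `identical_pairs` means no two files are byte-identical. Note that this only compares raw bytes, so it says nothing about which parameters caused two outputs to match or differ.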
Thanks for the response @zstephens - I do get the same result as you with the above (equivalent) example. I believe I've tracked my observation down to a labeling error on my side: the data in question were generated with different random seeds, but, due to other parameters, without any errors.
By the way - I see CLI arguments for setting error rate modifiers, but is there a way to see the error rates/distributions of the default models?
I just generated 3 datasets with 3 different random seeds and they all pairwise diff to 0.