Closed mh1111 closed 7 years ago
Hi MH. Could you please provide the command you are using to run the simulator? Thanks.
It looks like there is a reference in your input bed regions that is not present in the reference sequence.
e.g. a line in the bed file could be: chr1 100000 200000, but your input reference fasta does not contain ">chr1".
I agree this is a rather inelegant way to fail. I'll add an error message to alert the user when this is the case, and exit gracefully.
Thank you both for your prompt replies. @morgantaschuk, here is the command I use and the error I get:
$ python genReads.py -r hg19.fa --bam --vcf -R 101 -t target.bed -o test
Using default sequencing error model.
Using default gc-bias model.
found index hg19.fa.fai
reading chr1... 39.507 (sec)
--------------------------------
sampling reads...
Traceback (most recent call last):
File "../genReads.py", line 662, in <module>
main()
File "../genReads.py", line 508, in main
tCov = not(bisect.bisect(inputRegions[refIndex[RI][0]],rInd)%2) or not(bisect.bisect(inputRegions[refIndex[RI][0]],rInd+GC_WINDOW_SIZE)%2)
KeyError: 'chr1'
and target.bed is:
chr2 198257026 198257185
chr2 198257695 198257912
chr2 198260779 198261052
chr2 198262708 198262840
chr2 198263184 198263305
chr2 198264778 198264890
chr2 198264975 198265158
chr2 198265438 198265660
chr2 198266123 198266249
chr2 198266465 198266612
chr2 198266708 198266854
chr2 198267279 198267550
chr2 198267672 198267759
chr2 198268308 198268488
chr2 198269799 198269901
chr2 198269998 198270196
chr2 198272721 198272843
chr2 198273092 198273305
chr2 198274493 198274731
chr2 198281464 198281635
chr2 198283232 198283312
chr2 198283655 198283675
chr2 198285101 198285266
chr2 198285752 198285857
chr2 198288531 198288698
chr2 198299695 198299723
chr3 187440245 187440389
chr3 187442728 187442866
chr3 187443286 187443417
chr3 187444518 187444686
@zstephens, I doubt that what you suggested could be the case here. I have been using this reference fasta for a while and double checked it again:
$ cat hg19.fa | grep -E "chr3|chr2"
>chr2
>chr3
Thanks again for your help folks. MH
It seems that simulator is trying to sample from chr1 while the target region BED does not include chr1, hence the error. When I modify BED file and it starts with a region in chr1, simulator works! It is running now, I will keep you posted if I encountered any problem.
Folks, would you please let me know if you are confident about the performance of your simulator for targeted regions? No matter what the regions in BED file are, simulator starts sampling from chr1!
Best, Mohammad
Greetings,
Unless specified otherwise, the simulator samples both from targeted and non-targeted regions, with a majority of the coverage (98%, I think?) derived from the provided regions. If you wish to have no reads sampled from outside the provided regions, try adding the parameter “-to 0”.
-Zach
On Apr 5, 2017, at 3:07 PM, mh1111 notifications@github.com<mailto:notifications@github.com> wrote:
Folks, would you please let me know if you are confident about the performance of your simulator for targeted regions? No matter what the regions in BED file are, simulator starts sampling from chr1!
Best, Mohammad
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/zstephens/neat-genreads/issues/22#issuecomment-291981427, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AIHT1VvsFAALQVNFHRt4R0iGp4qqdjZHks5rs_R4gaJpZM4MzdSh.
Thanks @zstephens. Now it makes sense why the simulator draws samples from all chromosomes. Just to clarify, so if we are targeting a couple of loci in only in chr2, 98% of reads will be sampled from these loci and the remaining 2% of reads will be sampled from all other chromosomes?
MH
My 2 cents as well: if you want to not sample certain chromosomes, then exclude them from the reference FASTA file. That way you can still sample the chromosomes containing targets with the usual 98%/2% (or whatever).
Thanks @lsmainzer. It does the job I want.
Best, MH
Hi folks,
First of all, many thanks for providing this simulator.
I am trying to generate targeted region reads and I get the following error:
Would be grateful if you could please help me to resolve this.
Best, MH