Closed aldosr closed 1 month ago
I'm not able to reproduce this error. What version of python are you running? Adding a try/except around that line will break the code downstream, is it depends on the blacklist to avoid collisions.
Other details that might be useful: Can you supply the full output of the command up to the traceback? It would be helpful to see how far it got before hitting this error. The fact that the error happened in one bed file and not the other could be significant. It might be helpful to see those bed files, or at least a few sample lines. It might also be worth seeing if there are any key differences in the fasta files that could be causing the code to trip up, such as differences in the way the chromosomes are named.
Hi Josh, Thanks for the detailed answers. I'm actually re-generating the tumor datasets using a cleaned BED file (I noticed an error in the BED file that could be the cause of the IndexError). As soon as the datasets are ready I will let you know if the problem persists. Thanks again.
Hi again Josh, I generated the datasets again using the cleaned BED file, but the error persists. This is the complete output:
Warning: Read length of error model (149) does not match -R value (150), rescaling model...
Using empirical fragment length distribution.
found index /path/to/hg19.fa.fai
Warning: Reference contains sequences not found in targeted regions BED file.
reading chrM...
0.011 (sec)
--------------------------------
sampling reads...
[]
Read sampling completed in 0 (sec)
reading chr1...
100.239 (sec)
--------------------------------
sampling reads...
[----------]
Read sampling completed in 7379 (sec)
reading chr2...
99.104 (sec)
--------------------------------
sampling reads...
[-------Traceback (most recent call last):
File "/path/to/NEAT/gen_reads.py", line 896, in <module>
main()
File "/path/to/NEAT/gen_reads.py", line 618, in main
all_inserted_variants = sequences.random_mutations()
File "/path/to/NEAT/source/SequenceContainer.py", line 545, in random_mutations
if self.black_list[p][k]:
IndexError: index 18001 is out of bounds for axis 0 with size 18001
Interestingly, another dataset, ran in parallel using a bash script and SLURM, computated until chr5:
reading chr5...
60.400 (sec)
--------------------------------
sampling reads...
[-Traceback (most recent call last):
File "/path/to/NEAT/gen_reads.py", line 896, in <module>
main()
File "/path/to/NEAT/gen_reads.py", line 618, in main
all_inserted_variants = sequences.random_mutations()
File "/path/to/NEAT/source/SequenceContainer.py", line 545, in random_mutations
if self.black_list[p][k]:
IndexError: index 18006 is out of bounds for axis 0 with size 18006
These are the first lines of the input BED:
chr1 11601 11725
chr1 11736 11860
chr1 20506 20638
chr1 29803 29957
chr1 36829 36964
chr1 46836 47053
chr1 95424 95603
chr1 96959 97080
Please let me know if you need the entire BED or other info.
I'm putting @lbeltrame in cc since it's working with me on this.
Check out the new version 4.2 and see if it works for you!
Hi, I'm trying to generate tumor reads using the following parameters:
but I'm having this error:
The mutational model was generated using an assembled vcf from a batch of tumor samples. The fragment length model and the gc bias model were generated from a tumor BAM, extracted from the tumor batch used for the mutational model. The sequencing model was generated from the same sample fastqs. BED file comprises a series of genes of interest.
The error didn't appear when the script was tested with a BED file of a single gene (TP53) and using hg38 as a reference. One solution can be to add a:
try: code except IndexError: continue
But I'd like your input about the error. Thanks a lot for the tool, anyway!