Open giobus75 opened 1 month ago
Currently working on this. Will post a fix/new version soon!
One thing I'm noticing is that it's stumbling on the newest genome assembly because it includes characters other than A, C, G, T, and N. We will have to update the code to handle these alternate characters in general, but we're currently unclear on the best way to do that. It might be worth trying to replace every non-ACGTN character with N to see if that resolves part of the problem. I think we still have a stray indexing error, though, that crops up sometimes.
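To see which extra characters a given reference actually contains, a quick tally can help. This is a minimal sketch, not part of NEAT: the function name `count_unexpected_bases` and the tiny in-memory demo record are made up for illustration, and it assumes a plain-text FASTA where header lines start with ">".

```python
from collections import Counter
import io


def count_unexpected_bases(handle, allowed="ACGTN"):
    """Tally characters outside `allowed` in the sequence lines of a FASTA stream."""
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            continue  # header line, not sequence
        counts.update(line.strip().upper())
    # Drop the expected bases so only the unexpected characters remain.
    for base in allowed:
        counts.pop(base, None)
    return counts


# Hypothetical demo record; for a real file, pass open("reference.fa") instead.
demo = io.StringIO(">chrDemo\nACGTNNRYacgt\nMMKACGT\n")
unexpected = count_unexpected_bases(demo)
print(dict(unexpected))
```

Because the tally is case-insensitive, soft-masked (lower-case) bases are counted together with their upper-case forms.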
Thank you for your fast response. I'll try your workaround of replacing non-ACGT characters with N.
Hi, I replaced the non-ACGTN characters with N, but it still returns an "Index out of range" error.
I used this code to replace chars:
fn = "../references/GRCh38_mod_with_N.fa"
out_fn = "../references/GRCh38_mod_with_N_replaced.fa"
with open(fn) as fd:
    buff = fd.readlines()
new_buff = []
for l in buff:
    # Header lines contain "chr" or "HLA"; only sequence lines are rewritten.
    if "chr" not in l and "HLA" not in l:
        # Upper-case first, then map the listed IUPAC ambiguity codes to N.
        l = l.upper().translate(str.maketrans("MRYWBSK", "N" * 7))
    new_buff.append(l)
with open(out_fn, "w") as fd:
    fd.writelines(new_buff)
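For anyone repeating this workaround, here is a hedged variant that streams the file instead of loading it all into memory, and that replaces anything outside A, C, G, T, N rather than a fixed list of codes. It is only a sketch: `mask_ambiguous_bases` and `NON_ACGTN` are names invented here, and it assumes header lines start with ">". Like the snippet above, it upper-cases the sequence, so soft-masking is lost.

```python
import io
import re

# Any character that is not one of the five expected symbols.
NON_ACGTN = re.compile(r"[^ACGTN]")


def mask_ambiguous_bases(src, dst):
    """Copy a FASTA stream, replacing non-ACGTN sequence characters with N.

    Header lines (starting with ">") are written through unchanged.
    """
    for line in src:
        if line.startswith(">"):
            dst.write(line)
            continue
        body = line.rstrip("\r\n")
        line_ending = line[len(body):]  # keep the original newline style
        dst.write(NON_ACGTN.sub("N", body.upper()) + line_ending)


# Hypothetical in-memory demo; for real files use open(fn) and open(out_fn, "w").
src = io.StringIO(">chrDemo HLA test\nacgtRYKM\nNNACGTWS\n")
dst = io.StringIO()
mask_ambiguous_bases(src, dst)
result = dst.getvalue()
print(result)
```

Checking for ">" rather than for "chr" or "HLA" avoids depending on how the contigs in a particular reference happen to be named.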
Then I ran the simulation with the modified reference file:
neat read-simulator -c neat_config.yml -o simulated_stuff
And I got:
2024-10-08 15:08:23,709:INFO:neat.read_simulator.utils.generate_variants:Added 144796 mutations to chr8
2024-10-08 15:08:23,709:INFO:neat.read_simulator.utils.generate_reads:Sampling reads...
2024-10-08 16:13:45,016:ERROR:neat:read-simulator failed, see the traceback below
Traceback (most recent call last):
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/cli/cli.py", line 131, in main
cmd(args)
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/cli/commands/read_simulator.py", line 47, in execute
read_simulator_runner(arguments.config, arguments.output)
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/runner.py", line 314, in read_simulator_runner
read1_fastq_paired, read1_fastq_single, read2_fastq_paired, read2_fastq_single = generate_reads(
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/utils/generate_reads.py", line 345, in generate_reads
read_1.finalize_read_and_write(
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 334, in finalize_read_and_write
self.errors, self.padding = err_model.get_sequencing_errors(
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/models/error_models.py", line 241, in get_sequencing_errors
snv_reference = reference_segment[index]
File "/opt/conda/envs/neat/lib/python3.10/site-packages/Bio/Seq.py", line 430, in __getitem__
return chr(self._data[index])
IndexError: index out of range
ERROR: read-simulator failed, showing the last error
Traceback (most recent call last):
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/cli/cli.py", line 131, in main
cmd(args)
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/cli/commands/read_simulator.py", line 47, in execute
read_simulator_runner(arguments.config, arguments.output)
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/runner.py", line 314, in read_simulator_runner
read1_fastq_paired, read1_fastq_single, read2_fastq_paired, read2_fastq_single = generate_reads(
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/utils/generate_reads.py", line 345, in generate_reads
read_1.finalize_read_and_write(
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 334, in finalize_read_and_write
self.errors, self.padding = err_model.get_sequencing_errors(
File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/models/error_models.py", line 241, in get_sequencing_errors
snv_reference = reference_segment[index]
File "/opt/conda/envs/neat/lib/python3.10/site-packages/Bio/Seq.py", line 430, in __getitem__
return chr(self._data[index])
IndexError: index out of range
All right. I will look into this!
Still working on this. It turned out to have a few subtleties that are tricky. We will try to post a fix this week.
Thanks so much! I really appreciate it!
All right, I'm not yet able to reproduce this error with the latest version. It's possible it was related to another bug I fixed in the sequencing error section, since that is the part throwing the error here, so it may already be fixed. If you want to try the latest version, please let us know the results.
My test ran specifically on chr8 from that same file, because running the whole genome was taking too long for me. So it's possible there's some other issue still.
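Cutting the reference down to one contig, as in the chr8 test above, can be done with a few lines of Python. This is a rough sketch under stated assumptions: `extract_contig` and its `max_bases` option are hypothetical helpers, not NEAT features, and the demo data is invented. Truncating a contig may or may not still trigger the error, since the failing position is not known.

```python
import io


def extract_contig(src, dst, name, max_bases=None):
    """Copy the FASTA record whose header ID equals `name` from src to dst.

    If max_bases is given, stop once at least that many bases were written
    (whole lines only). Returns the number of bases written.
    """
    copying = False
    written = 0
    for line in src:
        if line.startswith(">"):
            if copying:
                break  # reached the record after the one we wanted
            fields = line[1:].split()
            copying = bool(fields) and fields[0] == name
            if copying:
                dst.write(line)
            continue
        if copying:
            if max_bases is not None and written >= max_bases:
                break
            dst.write(line)
            written += len(line.strip())
    return written


# Hypothetical demo; for real files use open("in.fa") and open("out.fa", "w").
src = io.StringIO(
    ">chr1 first\nACGT\nACGT\n>chr8 second\nGGGG\nCCCC\nTTTT\n>chr9\nAAAA\n"
)
dst = io.StringIO()
n_written = extract_contig(src, dst, "chr8", max_bases=8)
subset = dst.getvalue()
print(n_written)
print(subset)
```

The resulting file can then be set as the `reference` in the configuration file for a shorter test run.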
Hi,
I ran another read simulation using the v4.2.6 tag, but unfortunately, I'm still encountering an "Index out of range" error. I've attached the log from the run in case it helps diagnose the issue.
The error occurs after more than two days of simulation, which makes troubleshooting quite time-consuming. Is there a faster way to check the code or perhaps an option that I might have overlooked?
Okay, we did have a small bug that we fixed in 4.2.7, but it was related to quality scores. This is failing before that step. I will try again to replicate this.
Hi, I tried 4.2.7 and, running the same simulation, the error occurred earlier (after less than 4 hours).
The log: 1730662388.9120936_NEAT.log
All right, I will dive into that this week, hopefully.
I too am having this same error with the 4.2.7 version:
2024-11-16 11:56:14,001:INFO:neat.common.logging:writing log to: {base_dir}/NEAT/1731779766.7208364_NEAT.log
2024-11-16 11:56:14,011:INFO:neat.read_simulator.runner:Using configuration file paired75.yml
2024-11-16 11:56:14,012:INFO:neat.read_simulator.runner:Saving output files to .
2024-11-16 11:56:14,015:INFO:neat.read_simulator.utils.options:Run Configuration...
2024-11-16 11:56:14,015:INFO:neat.read_simulator.utils.options:Input fasta: {base_dir}/NEAT-chimeric/reference_files/chr18_smallest.fa
2024-11-16 11:56:14,015:INFO:neat.read_simulator.utils.options:Producing the following files:
- {base_dir}/NEAT/paired_75_x10_r1.fastq.gz
- {base_dir}/NEAT/paired_75_x10_r2.fastq.gz
2024-11-16 11:56:14,015:INFO:neat.read_simulator.utils.options:Single threading - 1 thread.
2024-11-16 11:56:14,015:INFO:neat.read_simulator.utils.options:Using a read length of 75
2024-11-16 11:56:14,016:INFO:neat.read_simulator.utils.options:Generating fragments based on mean=300, stand. dev=100
2024-11-16 11:56:14,016:INFO:neat.read_simulator.utils.options:Running in paired-ended mode.
2024-11-16 11:56:14,016:INFO:neat.read_simulator.utils.options:Average coverage: 10
2024-11-16 11:56:14,016:INFO:neat.read_simulator.utils.options:Using default error model.
2024-11-16 11:56:14,016:INFO:neat.read_simulator.utils.options:User defined average sequencing error rate: 0.001.
2024-11-16 11:56:14,016:INFO:neat.read_simulator.utils.options:Ploidy value: 1
2024-11-16 11:56:14,016:INFO:neat.read_simulator.utils.options:Custom average mutation rate for the run: 0.01
2024-11-16 11:56:14,016:INFO:neat.read_simulator.utils.options:RNG seed value for run: 4617953855961099
2024-11-16 11:56:14,016:INFO:neat.read_simulator.runner:Reading Models...
2024-11-16 11:56:14,017:INFO:neat.read_simulator.runner:Reading {base_dir}/NEAT-chimeric/reference_files/chr18_smallest.fa.
2024-11-16 11:56:14,994:INFO:neat.read_simulator.runner:Beginning simulation.
2024-11-16 11:56:15,226:INFO:neat.read_simulator.runner:Generating variants for chr18
2024-11-16 12:53:06,964:INFO:neat.read_simulator.utils.generate_variants:Finished generating random mutations in 56.86 minutes
2024-11-16 12:53:06,991:INFO:neat.read_simulator.utils.generate_variants:Added 113797 mutations to chr18
2024-11-16 12:53:06,991:INFO:neat.read_simulator.utils.generate_reads:Sampling reads...
2024-11-16 12:56:24,853:ERROR:neat:read-simulator failed, see the traceback below
Traceback (most recent call last):
File "{base_dir}/NEAT/neat/cli/cli.py", line 131, in main
cmd(args)
File "{base_dir}/NEAT/neat/cli/commands/read_simulator.py", line 47, in execute
read_simulator_runner(arguments.config, arguments.output)
File "{base_dir}/NEAT/neat/read_simulator/runner.py", line 313, in read_simulator_runner
read1_fastq_paired, read1_fastq_single, read2_fastq_paired, read2_fastq_single = generate_reads(
File "{base_dir}/NEAT/neat/read_simulator/utils/generate_reads.py", line 383, in generate_reads
read_2.finalize_read_and_write(
File "{base_dir}/NEAT/neat/read_simulator/utils/read.py", line 334, in finalize_read_and_write
self.errors, self.padding = err_model.get_sequencing_errors(
File "{base_dir}/NEAT/neat/models/error_models.py", line 193, in get_sequencing_errors
if rng.random() < quality_score_error_rate[quality_scores[i]]:
IndexError: index 90 is out of bounds for axis 0 with size 90
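One possible reading of that last line is an off-by-one: a lookup table with 90 entries only has valid indices 0 to 89, so a quality score of 90 lands one past the end. The sketch below only illustrates that pattern with plain Python lists; the table contents, `phred_to_error_rate`, and `lookup_error_rate` are assumptions made up for the example and are not taken from NEAT's code. Clamping like this is a diagnostic aid and could hide the real cause, so it is not proposed as the fix.

```python
def phred_to_error_rate(q):
    """Standard Phred relation: error probability = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical table with 90 entries, so valid indices are 0..89.
error_rate_table = [phred_to_error_rate(q) for q in range(90)]


def lookup_error_rate(table, score):
    """Return the rate for `score`, clamped to the valid index range."""
    clamped = min(max(score, 0), len(table) - 1)
    return table[clamped]


# Indexing with a score equal to the table length fails, as in the traceback.
try:
    error_rate_table[90]
    raised = False
except IndexError:
    raised = True

# The clamped lookup falls back to the last valid entry instead.
safe_rate = lookup_error_rate(error_rate_table, 90)
print(raised, safe_rate)
```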
Sorry to hear. I will take a look as soon as I can.
Describe the bug
I'm trying to generate a simulated dataset using a few different references and a configuration file like the one described in the examples of the README, but both runs fail with different errors.
The first error is IndexError: index out of range; the second error (using the same configuration file but with a different reference) is KeyError: 'R_C'.
To Reproduce
Error 1:
Using a hg19 reference (I don't know where it was downloaded from)
Using this configuration file, neat-config.yaml:
reference: references/hg19/hg19.fa
read_len: 126
produce_bam: False
produce_vcf: True
paired_ended: True
fragment_mean: 300
fragment_st_dev: 30
Run the simulation with:
neat read-simulator -c neat_config.yml -o simulated_stuff
Error 2:
Download the reference:
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
Using this configuration file, neat-config.yaml:
reference: references/GRCh38_full_analysis_set_plus_decoy_hla.fa
read_len: 126
produce_bam: False
produce_vcf: True
paired_ended: True
fragment_mean: 300
fragment_st_dev: 30
Run the simulation with:
neat read-simulator -c neat_config.yml -o simulated_stuff
Got this error:
Expected behavior
Have a VCF output file with simulated data.
Desktop (please complete the following information):
Additional context
I ran the neat read-simulator within a Docker container. I entered the container, activated the neat env using conda, and ran the simulation.
The Docker image was created by using this Dockerfile: