Open giobus75 opened 1 week ago
Currently working on this. Will post a fix/new version soon!
One thing I'm noticing is that it's stumbling on the newest genome assembly because that assembly includes characters other than A, C, G, T, and N. We will have to update the code to handle these alternate characters in general, but we're currently unsure what the best way to do that is. It might be worth replacing every non-ACGT character with N to see if that resolves part of the problem. I think we still have a stray indexing error that crops up sometimes, though.
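A minimal sketch of that workaround, assuming a plain-text FASTA file in which header lines start with `>`. The names `mask_fasta_lines` and `mask_fasta_file` are made up for illustration and are not part of NEAT. It streams the file line by line, so a full human reference never has to sit in memory, and it leaves letter case untouched.

```python
import re
from typing import Iterable, Iterator

# Any non-whitespace character that is not A, C, G, T or N (either case).
NON_ACGTN = re.compile(r"[^ACGTNacgtn\s]")


def mask_fasta_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with non-ACGTN sequence characters replaced by N."""
    for line in lines:
        if line.startswith(">"):
            yield line  # header line: pass through unchanged
        else:
            yield NON_ACGTN.sub("N", line)


def mask_fasta_file(in_path: str, out_path: str) -> None:
    """Stream in_path to out_path, masking sequence lines on the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(mask_fasta_lines(src))


if __name__ == "__main__":
    demo = [">chr_demo R_C in a header stays as-is\n", "ACGTRYKM\n", "acgtnws-\n"]
    print("".join(mask_fasta_lines(demo)), end="")
```

Checking for `>` at the start of the line, rather than searching for a contig name inside it, keeps the header test independent of how the contigs happen to be named.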
Thank you for your fast response. I'll try your workaround of replacing non-ACGT characters with N.
Hi, I replaced the non-ACGTN characters with N, but it still returns an "index out of range" error.
I used this code to replace chars:
fn = "../references/GRCh38_mod_with_N.fa"
out_fn = "../references/GRCh38_mod_with_N_replaced.fa"

# Map ambiguity codes to N. Lines are upper-cased first, so only upper-case keys are needed.
to_n = str.maketrans({'M': 'N', 'R': 'N', 'Y': 'N', 'W': 'N', 'B': 'N', 'S': 'N', 'K': 'N'})

with open(fn) as fd:
    buff = fd.readlines()

new_buff = []
for l in buff:
    # Lines without "chr" or "HLA" are treated as sequence; header lines pass through unchanged.
    if "chr" not in l and "HLA" not in l:
        l = l.upper().translate(to_n)
    new_buff.append(l)

with open(out_fn, "w") as fd:
    fd.writelines(new_buff)
Then I ran the simulation with the modified reference file:
neat read-simulator -c neat_config.yml -o simulated_stuff
And I got:
2024-10-08 15:08:23,709:INFO:neat.read_simulator.utils.generate_variants:Added 144796 mutations to chr8
2024-10-08 15:08:23,709:INFO:neat.read_simulator.utils.generate_reads:Sampling reads...
2024-10-08 16:13:45,016:ERROR:neat:read-simulator failed, see the traceback below
Traceback (most recent call last):
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/cli/cli.py", line 131, in main
    cmd(args)
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/cli/commands/read_simulator.py", line 47, in execute
    read_simulator_runner(arguments.config, arguments.output)
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/runner.py", line 314, in read_simulator_runner
    read1_fastq_paired, read1_fastq_single, read2_fastq_paired, read2_fastq_single = generate_reads(
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/utils/generate_reads.py", line 345, in generate_reads
    read_1.finalize_read_and_write(
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 334, in finalize_read_and_write
    self.errors, self.padding = err_model.get_sequencing_errors(
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/models/error_models.py", line 241, in get_sequencing_errors
    snv_reference = reference_segment[index]
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/Bio/Seq.py", line 430, in __getitem__
    return chr(self._data[index])
IndexError: index out of range
All right. I will look into this!
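For readers following the traceback: its last frame evaluates `chr(self._data[index])`, and the message `index out of range` (with no type name in front) is what Python raises when a `bytes` object is indexed past its end. The snippet below only illustrates that failure mode with a plain `bytes` stand-in. It does not use NEAT or Biopython, `base_at` is a hypothetical helper, and it makes no claim about why the index ends up too large inside NEAT.

```python
def base_at(data: bytes, index: int) -> str:
    """Mimic the pattern in the last traceback frame: chr(data[index])."""
    return chr(data[index])


segment = b"ACGT"  # stand-in for a reference segment of length 4

print(base_at(segment, 3))  # last valid position

try:
    base_at(segment, len(segment))  # one past the end
except IndexError as exc:
    print(f"IndexError: {exc}")
```

So the error means the position chosen for a sequencing error was at or beyond the length of the reference segment it was looked up in.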
Describe the bug
I'm trying to generate a simulated dataset using two different references and a configuration file like the one described in the README examples, but both runs fail, each with a different error.
The first error is IndexError: index out of range; the second error (using the same configuration file but with a different reference) is KeyError: 'R_C'.
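The key 'R_C' contains R, which is the IUPAC ambiguity code for a purine (A or G). Whether that is what triggers the KeyError is an assumption, but it is cheap to check which characters outside A, C, G, T, N a reference actually contains. Below is a rough sketch for that check. The function name `count_unexpected_bases` is hypothetical, and the code assumes a plain-text FASTA file with `>` header lines.

```python
from collections import Counter
from typing import Dict, Iterable

ALLOWED = set("ACGTN")


def count_unexpected_bases(lines: Iterable[str]) -> Dict[str, Counter]:
    """Map each contig name to a Counter of characters outside A, C, G, T, N."""
    report: Dict[str, Counter] = {}
    contig = ""  # sequence found before any header is filed under ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            contig = parts[0] if parts else ""  # first token after '>'
            continue
        # Case-insensitive: soft-masked (lower-case) bases count as valid.
        extra = Counter(ch for ch in line.upper() if ch not in ALLOWED)
        if extra:
            report.setdefault(contig, Counter()).update(extra)
    return report


if __name__ == "__main__":
    demo = [">chr1 demo\n", "ACGTR\n", "nnYr\n", ">chr2\n", "ACGT\n"]
    for name, counts in count_unexpected_bases(demo).items():
        print(name, dict(counts))
```

To scan a real reference, pass an open file handle instead of the demo list, for example `count_unexpected_bases(open(path))` with `path` pointing at the FASTA file. Contigs that contain only A, C, G, T, N do not appear in the report.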
To Reproduce
Error 1:
Using a hg19 reference (I don't know where it was downloaded from)
Using this configuration file (neat-config.yaml):
reference: references/hg19/hg19.fa
read_len: 126
produce_bam: False
produce_vcf: True
paired_ended: True
fragment_mean: 300
fragment_st_dev: 30
Run the simulation with:
neat read-simulator -c neat_config.yml -o simulated_stuff
Error 2:
Download the reference:
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
Using this configuration file (neat-config.yaml):
reference: references/GRCh38_full_analysis_set_plus_decoy_hla.fa
read_len: 126
produce_bam: False
produce_vcf: True
paired_ended: True
fragment_mean: 300
fragment_st_dev: 30
Run the simulation with:
neat read-simulator -c neat_config.yml -o simulated_stuff
Got this error: KeyError: 'R_C'
Expected behavior
A VCF output file with simulated data.
Additional context
I ran the neat read-simulator within a Docker container. I entered the container, activated the neat env with conda, and ran the simulation.
The Docker image was created by using this Dockerfile: