ncsa / NEAT

NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data.

Unable to run NEAT examples in the README #89

Closed NuriaQueralt closed 3 months ago

NuriaQueralt commented 8 months ago

Describe the bug

Running the 'whole genome simulation' example in the README raised this error:

    raise ValueError("Bad mode %r" % mode)
    ValueError: Bad mode 'xt'

To Reproduce Steps to reproduce the behavior:

  1. Download the hg19.fa: `wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa`

  2. Create a neat_config.yml file with the following content:

     reference: hg19.fa
     read_len: 126
     produce_bam: True
     produce_vcf: True
     paired_ended: True
     fragment_mean: 300
     fragment_st_dev: 30

  3. Execute neat on the command line:

     neat read-simulator \
         -c neat_config.yml \
         -o /home/your path/simulated_reads

  4. See error:

     raise ValueError("Bad mode %r" % mode)
     ValueError: Bad mode 'xt'

Expected behavior: the synthetic data is written as three separate files (FASTQ, BAM and VCF), as stated in the README.


joshfactorial commented 8 months ago

Apologies, this is a bug we're currently working through.

NuriaQueralt commented 8 months ago

Thank you for the fast reply! Looking forward to rerunning it once this is fixed.

alabarga commented 8 months ago

The error can be 'skipped' temporarily by adding

overwrite_output: True

to the config.yml
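One plausible reading of the error and of why this workaround helps, which nobody in this thread confirms: the output file is requested in exclusive-creation text mode (`'xt'`), and the opener that receives it only recognizes read, write and append modes. The sketch below reproduces that behaviour with two hypothetical stand-ins, `choose_mode` and `strict_open`. Neither is NEAT's code, and the default of `overwrite_output` being false is an assumption.

```python
def choose_mode(overwrite_output: bool) -> str:
    """Hypothetical: map the config flag to a file mode.

    'x' is Python's exclusive-creation mode (fail if the file exists);
    'w' truncates and overwrites.
    """
    return "wt" if overwrite_output else "xt"


def strict_open(mode: str) -> str:
    """Hypothetical opener that dispatches on r/w/a only, so 'x' is rejected."""
    lowered = mode.lower()
    if "r" in lowered:
        return "reader"
    if "w" in lowered or "a" in lowered:
        return "writer"
    raise ValueError("Bad mode %r" % mode)


if __name__ == "__main__":
    # With overwrite_output: True the mode contains 'w' and is accepted.
    print(strict_open(choose_mode(True)))
    try:
        # Assumed default (no overwrite): 'xt' has no r, w or a.
        strict_open(choose_mode(False))
    except ValueError as exc:
        print(exc)
```

If this reading is right, the flag only changes which mode string is passed along, which would explain why the workaround sidesteps the error without fixing it.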

also there is a typo in the README for the Targeted region simulation

[contents of neat_config.yml]
reference: hg19.fa
read_len: 126
produce_bam: True
produce_vcf: True
paired_ended: True
fragment_mean: 300
fragment_st_dev: 30
targed_bed: hg19_exome.bed

should be

target_bed: hg19_exome.bed

however it looks like NEAT will generate data for all the reference, not only for the bed file, am I correct?
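A misspelled key such as `targed_bed` can go unnoticed if the config is read without complaint, and that would also explain reads being generated for the whole reference. As a minimal sketch, the config keys can be checked against a list of known names before running. `KNOWN_KEYS` here contains only the option names mentioned in this thread, not NEAT's full option list, and `check_config_keys` is a hypothetical helper.

```python
import difflib

# Only the option names that appear in this thread (an assumption, not
# NEAT's complete list of accepted keys).
KNOWN_KEYS = frozenset({
    "reference", "read_len", "produce_bam", "produce_vcf", "paired_ended",
    "fragment_mean", "fragment_st_dev", "target_bed", "off_target_scalar",
    "overwrite_output", "avg_seq_error",
})


def check_config_keys(config: dict) -> dict:
    """Map each unrecognized key to its closest known key, or None."""
    suggestions = {}
    for key in config:
        if key not in KNOWN_KEYS:
            matches = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
            suggestions[key] = matches[0] if matches else None
    return suggestions


if __name__ == "__main__":
    config = {
        "reference": "hg19.fa",
        "read_len": 126,
        "targed_bed": "hg19_exome.bed",  # the typo from the README
    }
    for bad, guess in check_config_keys(config).items():
        print(f"unknown key {bad!r}; did you mean {guess!r}?")
```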

alabarga commented 8 months ago

if I set the reference to a FASTA file for the region in the bed file, it will run but I will get another error

  self.errors = err_model.get_sequencing_errors(self.length, self.reference_segment,
  File ".venv/lib/python3.10/site-packages/neat/models/models.py", line 413, in get_sequencing_errors
    if self.rng.random() < self.quality_score_error_rate[quality_scores[i]]:
KeyError: <generator object bin_scores at 0x7f7941f53530>

joshfactorial commented 8 months ago

> however it looks like NEAT will generate data for all the reference, not only for the bed file, am I correct?

By default the variants will be concentrated in the bed file areas, but there will still be some in the background (as well as sequencing errors). You can use the parameter off_target_scalar to adjust this. If you want no variants outside the bed, then you can set this to 0.0.
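As an illustration, the targeted-region config from earlier in the thread could be written with `off_target_scalar` set to `0.0`. The helper `render_config` is hypothetical and simply emits flat `key: value` lines in the same style as the README examples; the values are the ones quoted above.

```python
def render_config(options: dict) -> str:
    """Render a flat dict as 'key: value' lines, one option per line."""
    return "".join(f"{key}: {value}\n" for key, value in options.items())


targeted_options = {
    "reference": "hg19.fa",
    "read_len": 126,
    "produce_bam": True,
    "produce_vcf": True,
    "paired_ended": True,
    "fragment_mean": 300,
    "fragment_st_dev": 30,
    "target_bed": "hg19_exome.bed",
    # 0.0 requests no variants outside the bed regions, per the comment above.
    "off_target_scalar": 0.0,
}

if __name__ == "__main__":
    with open("neat_config.yml", "w") as handle:
        handle.write(render_config(targeted_options))
```

As noted above, this setting affects where variants are placed; it does not by itself restrict which parts of the reference produce reads.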

> if I set the reference to a FASTA file for the region in the bed file, it will run but I will get another error

Yeah, same, that's why this bug fix is taking me a minute.

joshfactorial commented 8 months ago

In the meantime you can try version 3.2. Apologies for the broken release...

alabarga commented 8 months ago

ok, thanks, just for your info, I managed to skip the previous error by adding

avg_seq_error: 0

to the config.yml but then I get a different error,

2023-10-31 18:53:43,982:ERROR:neat:read-simulator failed, see the traceback below
Traceback (most recent call last):
  File "/home/alabarga/BSC/code/synthetic-genomes/.venv/lib/python3.10/site-packages/neat/cli/cli.py", line 133, in main
    cmd(args)
  File "/home/alabarga/BSC/code/synthetic-genomes/.venv/lib/python3.10/site-packages/neat/cli/commands/read_simulator.py", line 47, in execute
    read_simulator_runner(arguments.config, arguments.output)
  File "/home/alabarga/BSC/code/synthetic-genomes/.venv/lib/python3.10/site-packages/neat/read_simulator/runner.py", line 353, in read_simulator_runner
    generate_reads(local_reference,
  File "/home/alabarga/BSC/code/synthetic-genomes/.venv/lib/python3.10/site-packages/neat/read_simulator/utils/generate_reads.py", line 589, in generate_reads
    read1.finalize_read_and_write(error_model_1, fq1, options.produce_fastq)
  File "/home/alabarga/BSC/code/synthetic-genomes/.venv/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 278, in finalize_read_and_write
    self.quality_array = err_model.get_quality_scores(len(self.reference_segment))
  File "/home/alabarga/BSC/code/synthetic-genomes/.venv/lib/python3.10/site-packages/neat/models/models.py", line 498, in get_quality_scores
    self.rng.normal(self.quality_score_probabilities[i][0],
IndexError: index 151 is out of bounds for axis 0 with size 151

input_read_length length is 162 but quality_score_probabilities length is 151
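The mismatch above (a 162-base segment against a quality model with 151 positions) can be illustrated with a small, self-contained sketch. This is not NEAT's code or its eventual fix: `sample_quality_scores` and the model parameters are made up, and the guard only shows how a length check would turn the `IndexError` into an earlier, more descriptive failure.

```python
import random


def sample_quality_scores(read_length, score_params, rng):
    """Draw one quality score per read position.

    score_params holds one (mean, std_dev) pair per modelled position,
    so it cannot serve a read longer than len(score_params).
    """
    if read_length > len(score_params):
        raise ValueError(
            f"read length {read_length} exceeds the {len(score_params)} "
            "positions covered by the quality model"
        )
    return [round(rng.gauss(mean, sd)) for mean, sd in score_params[:read_length]]


if __name__ == "__main__":
    rng = random.Random(0)
    model = [(30.0, 2.0)] * 151  # made-up parameters for 151 positions

    print(len(sample_quality_scores(151, model, rng)))
    try:
        sample_quality_scores(162, model, rng)
    except ValueError as exc:
        print(exc)
```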

joshfactorial commented 8 months ago

Yeah, it's related to the previous error. I have a bug in how NEAT is calculating quality scores for deletions. I will post a fix as soon as I can.

-Josh



joshfactorial commented 7 months ago

I pushed a fix to the Develop branch. If you have time, can you test your code on that branch?

joshfactorial commented 3 months ago

This should now be fixed on the main branch and in the current release. Please check out the newest version and open a new ticket if you have any further issues.