ncsa / NEAT

NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data.

neat read-simulator takes very long #80

Closed Npaffen closed 6 months ago

Npaffen commented 1 year ago

Describe the bug

When I try to simulate reads with VCF input, the process seems to take forever. I ran the following:

neat read-simulator -c neat_config.yaml -o neat/

2023-07-13 17:00:37,523:INFO:neat.read_simulator.runner:Using configuration file neat_config.yaml
2023-07-13 17:00:37,523:INFO:neat.read_simulator.runner:Saving output files to data
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Run Configuration...
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Input fasta: chr22.fa
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Producing the following files:
- data/neat_r1.fastq.gz
- data/neat_r2.fastq.gz
- data/neat_golden.bam

2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Single threading - 1 thread.
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Running in paired-ended mode.
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Generating fragment model based on mean=300.0, st dev=30.0
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Using a read length of 126
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Average coverage: 1
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Using default error model.
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:Ploidy value: 2
2023-07-13 17:00:37,524:INFO:neat.read_simulator.utils.options:RNG seed value for run: 5404759810307010
2023-07-13 17:00:37,524:INFO:neat.read_simulator.runner:Reading Models...
2023-07-13 17:00:37,525:INFO:neat.read_simulator.runner:Reading chr22.fa.
2023-07-13 17:00:40,037:INFO:neat.read_simulator.runner:Beginning simulation.
2023-07-13 17:00:40,584:INFO:neat.read_simulator.runner:Generating variants for chr22
2023-07-13 17:03:39,927:INFO:neat.read_simulator.utils.generate_variants:Finished generating random mutations in 2.98 minutes
2023-07-13 17:03:39,927:INFO:neat.read_simulator.utils.generate_variants:Added 51203 mutations to chr22
2023-07-13 17:03:39,927:INFO:neat.read_simulator.runner:Outputting temp vcf for chr22 for later use
2023-07-13 17:03:40,358:INFO:neat.read_simulator.utils.local_file_writer:Finished outputting temp vcf/fasta
2023-07-13 17:03:40,361:INFO:neat.read_simulator.utils.generate_reads:Sampling reads..

All the files to replicate the issue can be downloaded here

joshfactorial commented 1 year ago

All right, I will check into this issue.

a00101 commented 12 months ago

I have the same problem. The run seems to take forever at this step: neat.read_simulator.utils.generate_reads:Sampling reads...

neat read-simulator -c config.yml -o test

reference: test.fa
read_len: 150
coverage: 30
produce_bam: False
produce_vcf: False
paired_ended: True
fragment_mean: 300
fragment_st_dev: 30

joshfactorial commented 12 months ago

It'll be slow for large files. A stopgap solution is to break files up and run jobs concurrently.
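
One way to break a file up, as a minimal sketch rather than anything NEAT provides: the hypothetical helper below writes each record of a multi-record FASTA to its own file, so every contig can be simulated as a separate job. The function name and the output file naming are assumptions made for illustration. A reference with a single contig (such as one chromosome) would not be helped by this, and cutting a contig into windows would shift read coordinates.

```python
from pathlib import Path


def split_fasta_by_record(fasta_path, out_dir):
    """Write each FASTA record to its own file and return the new paths.

    Hypothetical helper, not part of NEAT.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    try:
        with open(fasta_path) as src:
            for line in src:
                if line.startswith(">"):
                    # A new record starts: close the previous output file.
                    if handle is not None:
                        handle.close()
                    # Name the file after the first word of the header line.
                    fields = line[1:].split()
                    name = fields[0] if fields else f"record_{len(paths) + 1}"
                    path = out_dir / f"{name}.fa"
                    handle = open(path, "w")
                    paths.append(path)
                # Lines before the first header are skipped.
                if handle is not None:
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Each resulting file would then need its own config whose `reference:` entry points at that file.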

joshfactorial commented 6 months ago

Apologies, everybody. For now, NEAT is simply slow. We are working on some optimizations for future releases. Try breaking large files or large runs (high coverage, high mutation rates, high ploidy) into several smaller runs, running them in parallel, and then combining the results. We will work on our end to implement multithreading or some similar solution. Thank you for your patience.
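
The run-in-parallel-then-combine workaround could be scripted roughly as follows. This is a sketch under stated assumptions, not an official NEAT feature: the function names are hypothetical, the command line simply mirrors the `neat read-simulator -c ... -o ...` form used earlier in this thread, and each job is assumed to have its own config file and output location.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(config_path, output):
    """Build the read-simulator call in the form used in this thread."""
    return ["neat", "read-simulator", "-c", str(config_path), "-o", str(output)]


def run_jobs(commands, max_workers=4, runner=subprocess.run):
    """Run the commands concurrently and return each exit code, in order."""
    def run_one(cmd):
        return runner(cmd).returncode

    # Threads suffice here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


def concat_gzip(parts, dest):
    """Join gzip files byte for byte; the result is a multi-member gzip."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, `run_jobs([build_command(cfg, out) for cfg, out in jobs])` would launch one simulation per `(config, output)` pair in a hypothetical `jobs` list. Gzipped FASTQ files can be joined by plain concatenation because the gzip format allows multiple members in one file. For paired-end runs, the r1 parts and the r2 parts should be joined in the same order so that mates stay aligned. BAM output cannot be combined this way and would need a dedicated tool such as `samtools merge`.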