ncsa / NEAT

NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data.

Excessive coverage bias #113

Closed. dani-ture closed this issue 3 months ago.

dani-ture commented 3 months ago

Describe the bug

Reads do not seem to be distributed uniformly across the genome. I expected some bias, but there are long regions with coverage = 0 and others with coverage = 150 when the neat read-simulator was run with the default value of coverage = 10.
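For a rough sense of how far this is from random sampling, here is a baseline under the idealized Lander-Waterman model: read starts are uniform and independent, so per-base depth is approximately Poisson with mean 10. This is a sketch for comparison only, not NEAT's sampling model. The 150 bp read length is an assumption, and the genome length is that of the NC_000913.3 chromosome in the linked assembly.

```python
import math

# Idealized Lander-Waterman model: read starts are uniform and independent,
# so per-base depth is approximately Poisson(c), where c = N * L / G.

GENOME_LEN = 4_641_652  # E. coli K-12 MG1655 chromosome, NC_000913.3 (bp)
READ_LEN = 150          # assumed read length, for illustration only
COVERAGE = 10           # target mean depth from the report


def expected_reads(genome_len: int, read_len: int, coverage: float) -> int:
    """Reads needed so that N * L / G is at least the target coverage."""
    return math.ceil(coverage * genome_len / read_len)


def poisson_pmf(k: int, lam: float) -> float:
    """P(depth == k); computed in log space so large k cannot overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_tail(k: int, lam: float, terms: int = 1000) -> float:
    """P(depth >= k), summed directly; accurate when k is well above lam."""
    return sum(poisson_pmf(i, lam) for i in range(k, k + terms))


def p_gap(gap_len: int, read_len: int, coverage: float) -> float:
    """Probability that gap_len consecutive bases starting at a given position
    are all uncovered: no read may start in the gap_len + read_len - 1
    positions that would overlap them, and starts occur at rate
    coverage / read_len per position."""
    return math.exp(-coverage * (gap_len + read_len - 1) / read_len)


if __name__ == "__main__":
    print("reads needed:", expected_reads(GENOME_LEN, READ_LEN, COVERAGE))
    print("expected zero-depth bases:", GENOME_LEN * poisson_pmf(0, COVERAGE))
    print("P(depth >= 150) per base:", poisson_tail(150, COVERAGE))
    print("P(1 kb gap) at a given position:", p_gap(1000, READ_LEN, COVERAGE))
```

Under these assumptions only about 200 scattered bases would be expected at depth 0, a depth of 150 at any single base is vanishingly unlikely, and a 1 kb stretch with no coverage should practically never occur, so the pattern described points to a simulation bug rather than ordinary sampling noise.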

To Reproduce

Steps to reproduce the behavior (the early steps are similar to what I described in issue #108):

  1. Download the E. coli NCBI RefSeq assembly from the following link: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000005845.2/

  2. Make a copy of the provided template config file (I called it test_config.yml) and set the following parameters:

         reference: GCF_000005845.2_ASM584v2_genomic.fna
         ploidy: 1
         rng_seed: 6386514007882411

     All other parameters are left at “.” (the default).

  3. Run neat on the command line:

         neat --log-name test --log-detail HIGH --log-level DEBUG read-simulator -c test_config.yml -o test

  4. I ran a variant calling pipeline with the following steps: checking read quality (with fastqc), mapping the reads in the fastq.gz file to the reference (with bwa-mem2) to get a BAM file, sorting and indexing it (with samtools), and calling variants (with bcftools).

  5. I opened the .sort.bam file in IGV (Integrative Genomics Viewer) to see how the reads mapped to the reference.
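To put numbers on what IGV shows, one option is to dump per-base depth with `samtools depth -a` (tab-separated: reference name, 1-based position, depth) and scan it for long runs of zero or very high depth. A minimal sketch; the function name, thresholds, and file names are illustrative assumptions, not part of the original report or of NEAT.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Run = Tuple[str, int, int]  # (reference name, first pos, last pos), 1-based inclusive


def depth_runs(lines: Iterable[str], predicate: Callable[[int], bool],
               min_len: int) -> List[Run]:
    """Find runs of consecutive positions whose depth satisfies `predicate`.

    `lines` are rows from `samtools depth -a`: name<TAB>pos<TAB>depth
    (only the first depth column is used if several BAMs were given).
    A change of reference name or a skipped position ends the current run.
    Only runs of at least `min_len` positions are returned.
    """
    runs: List[Run] = []
    chrom: Optional[str] = None
    prev_pos: Optional[int] = None
    run_start: Optional[int] = None

    for line in lines:
        if not line.strip():
            continue
        name, pos_s, depth_s = line.rstrip("\n").split("\t")[:3]
        pos, depth = int(pos_s), int(depth_s)
        hit = predicate(depth)
        contiguous = name == chrom and prev_pos is not None and pos == prev_pos + 1

        # Close the open run if this row breaks it (new reference, gap, or miss).
        if run_start is not None and (not contiguous or not hit):
            if prev_pos - run_start + 1 >= min_len:
                runs.append((chrom, run_start, prev_pos))
            run_start = None

        # Start a new run at this position if it qualifies.
        if hit and run_start is None:
            run_start = pos
        chrom, prev_pos = name, pos

    # Flush a run that extends to the last row.
    if run_start is not None and prev_pos - run_start + 1 >= min_len:
        runs.append((chrom, run_start, prev_pos))
    return runs
```

For example, after `samtools depth -a <sample>.sort.bam > depth.tsv` (where `<sample>` is a placeholder), `depth_runs(open("depth.tsv"), lambda d: d == 0, 1000)` lists zero-coverage stretches of at least 1 kb, and `lambda d: d >= 100` finds strongly over-covered ones. The `-a` flag matters: without it samtools omits zero-depth positions, and the scan would treat them as gaps rather than as depth 0.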

Expected behavior

I expected to see more or less evenly distributed reads.

Additional context

https://github.com/ncsa/NEAT/assets/131826966/8be5d905-737e-459e-953f-5b8c9137b78f

joshfactorial commented 3 months ago

I found a couple of little bugs in one of the coverage functions, but I will double-check that I don't have a read_len variable where I meant to put coverage.
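The thread does not show NEAT's coverage code, but the kind of mix-up described (a read_len value where coverage was meant) can be caught with a round-trip check: compute the read count, then convert it back to mean depth and compare with the target. A minimal sketch; every function name here is hypothetical.

```python
import math


def reads_for_region(region_len: int, *, read_len: int, coverage: float) -> int:
    """Hypothetical helper: reads needed so n * read_len / region_len >= coverage.

    Keyword-only arguments make a read_len/coverage swap visible at the call site.
    """
    if region_len <= 0 or read_len <= 0:
        raise ValueError("region_len and read_len must be positive")
    return math.ceil(coverage * region_len / read_len)


def realized_coverage(n_reads: int, *, read_len: int, region_len: int) -> float:
    """Mean depth implied by n_reads reads of length read_len over region_len bases."""
    return n_reads * read_len / region_len


def check_coverage(n_reads: int, *, read_len: int, region_len: int,
                   coverage: float, rel_tol: float = 0.05) -> float:
    """Raise if the read count does not reproduce the target mean depth."""
    got = realized_coverage(n_reads, read_len=read_len, region_len=region_len)
    if not math.isclose(got, coverage, rel_tol=rel_tol):
        raise AssertionError(f"expected ~{coverage}x coverage, got {got:.1f}x")
    return got


if __name__ == "__main__":
    n_ok = reads_for_region(100_000, read_len=150, coverage=10)
    print(n_ok, check_coverage(n_ok, read_len=150, region_len=100_000, coverage=10))

    # Simulated bug: the two values are passed to the wrong parameters.
    n_bad = reads_for_region(100_000, read_len=10, coverage=150)
    try:
        check_coverage(n_bad, read_len=150, region_len=100_000, coverage=10)
    except AssertionError as err:
        print("caught:", err)
```

Such a check only verifies the total read count per region; a bias in where reads are placed within a region would need a depth-distribution test instead.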

dani-ture commented 3 months ago

Thank you for the fast answer.

joshfactorial commented 3 months ago

I believe this is fixed. Please reopen this ticket if the issue persists.