dani-ture closed this issue 5 months ago.
I found a couple of small bugs in one of the coverage functions, but I will double-check to make sure I don't have a read_len variable where I meant to put coverage.
Thank you for the fast answer.
I believe this is fixed. Please reopen this ticket if the issue persists.
Describe the bug
Reads do not seem to be distributed uniformly across the genome. I expected some bias, but there are long regions with coverage = 0 and others with coverage = 150, even though neat read-simulator was run with the default coverage of 10.
To Reproduce
Steps to reproduce the behavior (the early steps are similar to those I described in issue #108):
Download the E. coli NCBI RefSeq assembly from the following link: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000005845.2/
Make a copy of the provided template config file (I called it test_config.yml) and set the following parameters:
```
reference: GCF_000005845.2_ASM584v2_genomic.fna
ploidy: 1
rng_seed: 6386514007882411
```
The rest were left at the default ".".
Run neat from the command line:
```
neat --log-name test --log-detail HIGH --log-level DEBUG read-simulator -c test_config.yml -o test
```
I ran a variant-calling pipeline with the following steps: checking read quality (FastQC), mapping the reads in the fastq.gz file to the reference (bwa-mem2), producing a BAM file, sorting and indexing it (samtools), and calling variants (bcftools).
I opened the .sort.bam file in IGV (Integrative Genomics Viewer) to see how the reads mapped to the reference.
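A quick way to put numbers on what IGV shows is to summarize per-position depth. The sketch below is an illustration only and is not part of NEAT. It assumes the depth table comes from samtools depth -a, where the -a flag also reports zero-depth positions so gaps in coverage are visible. It defines a hypothetical shell function, summarize_depth, that reports the mean depth, the fraction of zero-depth positions, the maximum depth, and the longest run of consecutive zero-depth positions on a single contig.

```shell
# Hypothetical helper (not part of NEAT or samtools): summarize a
# "samtools depth -a" table read from stdin. Each line is
# contig <TAB> position <TAB> depth; with -a every position is listed,
# so consecutive lines on the same contig are adjacent bases.
summarize_depth() {
  awk '
    {
      if ($1 != contig) { run = 0; contig = $1 }  # restart zero runs on a new contig
      n++                                         # positions seen
      sum += $3                                   # total depth, for the mean
      if ($3 == 0) {
        zero++                                    # positions with no reads
        run++                                     # length of the current zero run
        if (run > longest) longest = run
      } else {
        run = 0
      }
      if ($3 > max) max = $3                      # deepest position
    }
    END {
      if (n == 0) { print "no positions"; exit 1 }
      printf "positions=%d mean=%.2f zero_frac=%.4f max=%d longest_zero_run=%d\n", n, sum / n, zero / n, max, longest
    }'
}

# Example (file name is illustrative):
# samtools depth -a test.sort.bam | summarize_depth
```

With a requested coverage of 10 and roughly uniform read placement, zero_frac should be close to 0 and longest_zero_run should be short. Long zero runs together with a max near 150 would confirm the pattern seen in IGV.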
Expected behavior
I expected to see the reads more or less evenly distributed.
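For comparison, here is the textbook Lander-Waterman idealization. It assumes reads are placed independently and uniformly at random, which is not a claim about how NEAT actually samples reads. It gives the expected depth and the chance that a base is missed entirely. Here N is the number of reads, L the read length, and G the genome length:

```latex
C = \frac{N L}{G}, \qquad
\Pr(\text{depth} = k) \approx \frac{C^{k} e^{-C}}{k!}, \qquad
\Pr(\text{depth} = 0) \approx e^{-C} = e^{-10} \approx 4.5 \times 10^{-5}
```

At C = 10, this predicts roughly 200 uncovered bases scattered over a genome of about 4.6 Mb, rather than long zero-coverage stretches. A depth of 150 at any position would be vanishingly unlikely.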
Additional comments
I also tried setting no_coverage_bias: true, but it seemed to have no influence on the output.
Additional context
https://github.com/ncsa/NEAT/assets/131826966/8be5d905-737e-459e-953f-5b8c9137b78f