sbg / Mitty

Seven Bridges Genomics aligner/caller debugging and analysis tools
Apache License 2.0
13 stars 2 forks source link

GodAligner "IndexError: list index out of range" #15

Closed AdrienOliva closed 3 years ago

AdrienOliva commented 3 years ago

Hi, I have been trying to use the GodAligner to recover my true alignment of my generated reads. However when I run the following code, I do have an error "IndexError: list index out of range". After some digging I found out that the variable "ap2=[]" at some part of the loop (line 200 from god_aligner.py).. however I cannot found out why.

Here is the command I used to create the reads (and it worked perfectly)

!/usr/bin/env bash

set -ex FASTA=../data/human_g1k_v37.fa.gz
 SAMPLEVCF=../data/1kg.20.22.vcf.gz
 REGION_BED=ch20.bed
 FILTVCF=Ch20filt.vcf.gz
 SAMPLENAME=HG00119
 COVERAGE=2
 READ_GEN_SEED=7
 FASTQ_PREFIX=Ch20-PEreads
 READ_CORRUPT_SEED=7 READMODEL=1kg-pcr-free.pkl

mitty -v4 filter-variants \   ${SAMPLEVCF} \   ${SAMPLENAME} \   ${REGION_BED} \   - \   2> vcf-filter.log | bgzip -c > ${FILTVCF}

tabix -p vcf ${FILTVCF}

mitty -v4 generate-reads \   ${FASTA} \   ${FILTVCF} \   ${SAMPLENAME} \   ${REGION_BED} \   ${READMODEL} \   ${COVERAGE} \   ${READ_GEN_SEED} \   >(gzip > ${FASTQ_PREFIX}1.fq.gz) \   ${FASTQ_PREFIX}-lq.txt \   --fastq2 >(gzip > ${FASTQ_PREFIX}2.fq.gz) \   --threads 2

mitty -v4 corrupt-reads \   ${READMODEL} \   ${FASTQ_PREFIX}1.fq.gz >(gzip > ${FASTQ_PREFIX}-corrupt1.fq.gz) \   ${FASTQ_PREFIX}-lq.txt \   ${FASTQ_PREFIX}-corrupt-lq.txt \   ${READ_CORRUPT_SEED} \   --fastq2-in ${FASTQ_PREFIX}2.fq.gz \   --fastq2-out >(gzip > ${FASTQ_PREFIX}-corrupt2.fq.gz) \   --threads 2

and here is the command I used to run GodAligner:

!/usr/bin/env bash

set -ex

FASTA=../data/human_g1k_v37.fa.gz FASTQ_PREFIX=Ch20-PEreads GODBAM=Ch20-god.bam DO_NOT_INDEX=${1}

mitty -v4 god-aligner \ ${FASTA} \ ${FASTQ_PREFIX}-corrupt1.fq.gz \ ${FASTQ_PREFIX}-corrupt-lq.txt \ ${GODBAM} \ --fastq2 ${FASTQ_PREFIX}-corrupt2.fq.gz \ --threads 2

The full error is as follow:

Process Process-1: Traceback (most recent call last): File "/Users/anaconda3/envs/mymitty/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap self.run() File "/Users/anaconda3/envs/mymitty/lib/python3.5/multiprocessing/process.py", line 93, in run self._target(*self._args, **self._kwargs) File "/Users/anaconda3/envs/mymitty/lib/python3.5/site-packages/mitty/benchmarking/god_aligner.py", line 140, in disciple write_perfect_reads(qname, rg_id, long_qname_table, ref_dict, read_data, cigar_v2, fp) File "/Users/anaconda3/envs/mymitty/lib/python3.5/site-packages/mitty/benchmarking/god_aligner.py", line 206, in write_perfect_reads p10, p11, p20, p21 = ap1[0][1], ap1[-1][1], ap2[0][1], ap2[-1][1] IndexError: list index out of range

I am working on macOS using Mitty version 2.28.3.

Thanks a lot, Adrien

EDIT: I tried simulating reads from another chromosome (Chr22) using the same pipelines (changed the seeds and coverage) and this time, everything went smoothly for the new dataset. However the previous one (Chr20) is still bugged.

AdrienOliva commented 3 years ago

I did achieve to simulate reads as I wanted by just re-running the all pipeline and now everything is working as expected.

For my previous dataset, all steps worked correctly (except the GodAligner) and I don't think it is the VCF or REF as I used them the new working dataset. I provided the reads and script I used here if you are interested to reproduce that behaviour. Thanks Adrien