Errors when running hap.py on Octopus generated VCF file

tahashmi commented 3 years ago

Describe the bug

I get the errors logged below when running hap.py on an Octopus-generated VCF file. I am using chr20 of the HG003 Illumina WGS reads, publicly available from the PrecisionFDA Truth v2 Challenge, as used by the DeepVariant WGS demo. Perhaps I need to pass some additional options when generating the VCF?

Version

$ singularity exec octopus_latest.sif octopus --version
octopus version 0.7.4
Target: x86_64 Linux 4.19.121-linuxkit
SIMD extension: AVX2
Compiler: GNU 10.2.0
Boost: 1_75

Command

Command line to install octopus:

$ singularity pull  docker://dancooke/octopus

Command line to run octopus:

$ singularity exec octopus_latest.sif octopus --threads 24  --reference /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa  --reads /scratch-shared/tahmad/bio_data/NA24385/HG003/HG003_chr20.bam -o /scratch-shared/tahmad/bio_data/NA24385/HG003/octopus.vcf

Additional context

hap.py command:

singularity exec --bind /usr/lib/locale/ docker://pkrusche/hap.py /opt/hap.py/bin/hap.py --threads 24 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa -f benchmark/HG003_GRCh38_1_22_v4.2_benchmark.bed -o /scratch-shared/tahmad/bio_data/giab-comparison.v4.2.first_pass --engine=vcfeval -l chr20 benchmark/HG003_GRCh38_1_22_v4.2_benchmark.vcf.gz /scratch-shared/tahmad/bio_data/NA24385/HG003/octopus.vcf

Reference: GRCh38
BAM: HG003 chr20
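
The log below repeatedly reports that filter 'q10,AFB' is not in the VCF header. One way to narrow this down is to compare the FILTER column of the query VCF with the filter IDs its header declares. The following is a minimal sketch, not part of hap.py or Octopus: the function name `undeclared_filters` and the script layout are hypothetical, and it assumes a tab-separated VCF (optionally gzipped) whose FILTER entries are separated by semicolons, as the VCF specification requires.

```python
import gzip
import re
import sys
from collections import Counter

# Matches header lines such as: ##FILTER=<ID=q10,Description="...">
FILTER_ID = re.compile(r"^##FILTER=<ID=([^,>]+)")


def undeclared_filters(lines):
    """Count FILTER values that the header does not declare.

    The FILTER column is split on ';' only, so a value such as
    'q10,AFB' is treated as one (probably undeclared) filter name.
    """
    declared = {"PASS", "."}  # always valid, never declared in the header
    missing = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            match = FILTER_ID.match(line)
            if match:
                declared.add(match.group(1))
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and the #CHROM column header
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        for name in fields[6].split(";"):
            if name not in declared:
                missing[name] += 1
    return missing


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Hypothetical usage: python check_filters.py octopus.vcf
if __name__ == "__main__" and len(sys.argv) == 2:
    with open_vcf(sys.argv[1]) as handle:
        for name, count in undeclared_filters(handle).most_common():
            print(f"{name}\t{count}")
```

If this lists 'q10,AFB' as a single undeclared name, that would suggest the two filter names were joined with a comma instead of a semicolon in the FILTER column, which would be consistent with the error messages in the log.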

2021-06-01 07:40:23,961 WARNING  No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2021-06-01 07:40:23,966 WARNING  No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[W] overlapping records at chr6:29747433 for sample 0
[W] Variants that overlap on the reference allele: 4
[I] Total VCF records:         4000097
[I] Non-reference VCF records: 4000097
[I] Total VCF records:         6577
[I] Non-reference VCF records: 6577
2021-06-01 07:41:49,609 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:47269115-48919823 -o /scratch-local/tahmad/input.chr20:47269115-48919823FCcl7Y.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutRwsJme.log / /scratch-local/tahmad/stderrxQkR78.log
2021-06-01 07:41:49,616 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:47481052
2021-06-01 07:41:49,617 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:49,617 ERROR    ------------------------------------------------------------
2021-06-01 07:41:49,617 ERROR    Traceback (most recent call last):
2021-06-01 07:41:49,617 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:49,617 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:49,617 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:49,618 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:49,618 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:49,619 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:49,619 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:47269115-48919823 -o /scratch-local/tahmad/input.chr20:47269115-48919823FCcl7Y.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:49,619 ERROR    ------------------------------------------------------------
2021-06-01 07:41:49,640 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:48919824-49790022 -o /scratch-local/tahmad/input.chr20:48919824-4979002239Tcio.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutrsDpbX.log / /scratch-local/tahmad/stderrFqUVpC.log
2021-06-01 07:41:49,655 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:48979944
2021-06-01 07:41:49,657 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:49,657 ERROR    ------------------------------------------------------------
2021-06-01 07:41:49,657 ERROR    Traceback (most recent call last):
2021-06-01 07:41:49,657 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:49,657 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:49,658 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:49,658 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:49,658 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:49,658 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:49,658 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:48919824-49790022 -o /scratch-local/tahmad/input.chr20:48919824-4979002239Tcio.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:49,658 ERROR    ------------------------------------------------------------
2021-06-01 07:41:49,823 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:33971345-34479523 -o /scratch-local/tahmad/input.chr20:33971345-34479523bv2dvi.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdout7ZF1D4.log / /scratch-local/tahmad/stderryJpOBJ.log
2021-06-01 07:41:49,837 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:34149156
2021-06-01 07:41:49,838 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:49,838 ERROR    ------------------------------------------------------------
2021-06-01 07:41:49,838 ERROR    Traceback (most recent call last):
2021-06-01 07:41:49,838 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:49,838 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:49,838 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:49,839 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:49,839 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:49,839 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:49,839 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:33971345-34479523 -o /scratch-local/tahmad/input.chr20:33971345-34479523bv2dvi.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:49,839 ERROR    ------------------------------------------------------------
2021-06-01 07:41:49,940 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:23113501-26140858 -o /scratch-local/tahmad/input.chr20:23113501-26140858bo6Cws.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutbCDiHJ.log / /scratch-local/tahmad/stderrK_hPce.log
2021-06-01 07:41:49,945 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:23293703
2021-06-01 07:41:49,946 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:49,946 ERROR    ------------------------------------------------------------
2021-06-01 07:41:49,946 ERROR    Traceback (most recent call last):
2021-06-01 07:41:49,946 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:49,946 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:49,946 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:49,947 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:49,947 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:49,947 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:49,947 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:23113501-26140858 -o /scratch-local/tahmad/input.chr20:23113501-26140858bo6Cws.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:49,947 ERROR    ------------------------------------------------------------
2021-06-01 07:41:50,123 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:34479524-35795312 -o /scratch-local/tahmad/input.chr20:34479524-35795312N8ZVnu.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutOoDVdv.log / /scratch-local/tahmad/stderr5p627g.log
2021-06-01 07:41:50,130 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:34565812
2021-06-01 07:41:50,131 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:50,131 ERROR    ------------------------------------------------------------
2021-06-01 07:41:50,131 ERROR    Traceback (most recent call last):
2021-06-01 07:41:50,131 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:50,131 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:50,131 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:50,132 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:50,132 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:50,132 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:50,132 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:34479524-35795312 -o /scratch-local/tahmad/input.chr20:34479524-35795312N8ZVnu.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:50,132 ERROR    ------------------------------------------------------------
2021-06-01 07:41:50,186 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:31924864-33971344 -o /scratch-local/tahmad/input.chr20:31924864-33971344V14kC4.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutZf7RT0.log / /scratch-local/tahmad/stderrEVHOdx.log
2021-06-01 07:41:50,193 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:31955700
2021-06-01 07:41:50,194 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:50,194 ERROR    ------------------------------------------------------------
2021-06-01 07:41:50,194 ERROR    Traceback (most recent call last):
2021-06-01 07:41:50,194 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:50,194 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:50,194 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:50,194 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:50,195 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:50,195 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:50,195 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:31924864-33971344 -o /scratch-local/tahmad/input.chr20:31924864-33971344V14kC4.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:50,195 ERROR    ------------------------------------------------------------
2021-06-01 07:41:50,783 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:9063122-11107431 -o /scratch-local/tahmad/input.chr20:9063122-11107431Od4pRo.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdout_q_KjI.log / /scratch-local/tahmad/stderrZbk9Vt.log
2021-06-01 07:41:50,789 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:9618856
2021-06-01 07:41:50,790 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:50,790 ERROR    ------------------------------------------------------------
2021-06-01 07:41:50,790 ERROR    Traceback (most recent call last):
2021-06-01 07:41:50,790 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:50,790 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:50,791 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:50,791 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:50,791 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:50,791 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:50,792 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:9063122-11107431 -o /scratch-local/tahmad/input.chr20:9063122-11107431Od4pRo.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:50,792 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,178 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:35795313-37093630 -o /scratch-local/tahmad/input.chr20:35795313-37093630CvGKQA.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutPMdzdo.log / /scratch-local/tahmad/stderrOQugkJ.log
2021-06-01 07:41:51,184 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:35874758
2021-06-01 07:41:51,185 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:51,185 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,185 ERROR    Traceback (most recent call last):
2021-06-01 07:41:51,185 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:51,185 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:51,185 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:51,186 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:51,186 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:51,186 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:51,186 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:35795313-37093630 -o /scratch-local/tahmad/input.chr20:35795313-37093630CvGKQA.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:51,186 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,199 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:11107432-13616138 -o /scratch-local/tahmad/input.chr20:11107432-13616138KhslvD.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutBz0WfZ.log / /scratch-local/tahmad/stderrV4wXVb.log
2021-06-01 07:41:51,205 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:11388097
2021-06-01 07:41:51,206 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:51,206 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,206 ERROR    Traceback (most recent call last):
2021-06-01 07:41:51,206 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:51,206 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:51,206 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:51,207 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:51,207 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:51,207 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:51,207 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:11107432-13616138 -o /scratch-local/tahmad/input.chr20:11107432-13616138KhslvD.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:51,207 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,399 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:13616139-16079694 -o /scratch-local/tahmad/input.chr20:13616139-16079694Y0GfrK.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdout93ZKjH.log / /scratch-local/tahmad/stderrjPejft.log
2021-06-01 07:41:51,405 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:13736720
2021-06-01 07:41:51,405 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:51,405 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,405 ERROR    Traceback (most recent call last):
2021-06-01 07:41:51,405 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:51,405 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:51,405 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:51,406 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:51,406 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:51,406 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:51,406 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:13616139-16079694 -o /scratch-local/tahmad/input.chr20:13616139-16079694Y0GfrK.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:51,406 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,410 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:41007712-43552842 -o /scratch-local/tahmad/input.chr20:41007712-43552842XeG1UN.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutLzXglC.log / /scratch-local/tahmad/stderr4Zuegq.log
2021-06-01 07:41:51,413 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:41018458
2021-06-01 07:41:51,413 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:51,413 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,413 ERROR    Traceback (most recent call last):
2021-06-01 07:41:51,413 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:51,414 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:51,414 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:51,414 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:51,414 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:51,414 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:51,414 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:41007712-43552842 -o /scratch-local/tahmad/input.chr20:41007712-43552842XeG1UN.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:51,414 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,466 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:21272656-23113500 -o /scratch-local/tahmad/input.chr20:21272656-23113500gB94cr.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutMTBcJx.log / /scratch-local/tahmad/stderr8vKnwX.log
2021-06-01 07:41:51,469 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:21353239
2021-06-01 07:41:51,471 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:51,471 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,471 ERROR    Traceback (most recent call last):
2021-06-01 07:41:51,471 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:51,471 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:51,471 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:51,472 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:51,472 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:51,472 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:51,472 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:21272656-23113500 -o /scratch-local/tahmad/input.chr20:21272656-23113500gB94cr.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:51,472 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,661 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:7160509-9063121 -o /scratch-local/tahmad/input.chr20:7160509-9063121Nk2wG0.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutnzqnie.log / /scratch-local/tahmad/stderrrvm146.log
2021-06-01 07:41:51,663 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:7257230
2021-06-01 07:41:51,663 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:51,663 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,664 ERROR    Traceback (most recent call last):
2021-06-01 07:41:51,664 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:51,664 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:51,664 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:51,664 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:51,664 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:51,665 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:51,665 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:7160509-9063121 -o /scratch-local/tahmad/input.chr20:7160509-9063121Nk2wG0.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:51,665 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,761 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:45816357-47269114 -o /scratch-local/tahmad/input.chr20:45816357-47269114SMO7ZB.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdout8uss9T.log / /scratch-local/tahmad/stderr6FkYEB.log
2021-06-01 07:41:51,763 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:45911909
2021-06-01 07:41:51,764 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:51,764 ERROR    ------------------------------------------------------------
2021-06-01 07:41:51,764 ERROR    Traceback (most recent call last):
2021-06-01 07:41:51,764 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:51,764 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:51,764 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:51,764 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:51,764 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:51,765 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:51,765 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:45816357-47269114 -o /scratch-local/tahmad/input.chr20:45816357-47269114SMO7ZB.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:51,765 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,305 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:19323029-21272655 -o /scratch-local/tahmad/input.chr20:19323029-212726555fglsS.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdout9D0Z9Y.log / /scratch-local/tahmad/stderrKwqkH0.log
2021-06-01 07:41:52,307 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:19333621
2021-06-01 07:41:52,307 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:52,307 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,308 ERROR    Traceback (most recent call last):
2021-06-01 07:41:52,308 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:52,308 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:52,308 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:52,308 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:52,308 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:52,309 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:52,309 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:19323029-21272655 -o /scratch-local/tahmad/input.chr20:19323029-212726555fglsS.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:52,309 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,511 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:26140859-31924863 -o /scratch-local/tahmad/input.chr20:26140859-31924863Or_QB8.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutQSNLU6.log / /scratch-local/tahmad/stderr9IamaC.log
2021-06-01 07:41:52,513 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:28612677
2021-06-01 07:41:52,514 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:52,514 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,514 ERROR    Traceback (most recent call last):
2021-06-01 07:41:52,514 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:52,514 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:52,514 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:52,514 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:52,514 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:52,515 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:52,515 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:26140859-31924863 -o /scratch-local/tahmad/input.chr20:26140859-31924863Or_QB8.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:52,515 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,651 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:58876918-62120907 -o /scratch-local/tahmad/input.chr20:58876918-62120907LSCFWJ.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutsKVyXW.log / /scratch-local/tahmad/stderrb4QEb5.log
2021-06-01 07:41:52,653 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:59052229
2021-06-01 07:41:52,653 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:52,653 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,653 ERROR    Traceback (most recent call last):
2021-06-01 07:41:52,653 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:52,653 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:52,653 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:52,653 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:52,653 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:52,654 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:52,654 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:58876918-62120907 -o /scratch-local/tahmad/input.chr20:58876918-62120907LSCFWJ.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:52,654 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,866 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:49790023-51390188 -o /scratch-local/tahmad/input.chr20:49790023-513901883ewx5y.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutzaXNtB.log / /scratch-local/tahmad/stderr3CEVB9.log
2021-06-01 07:41:52,868 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:49796136
2021-06-01 07:41:52,868 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:52,868 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,868 ERROR    Traceback (most recent call last):
2021-06-01 07:41:52,868 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:52,869 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:52,869 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:52,869 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:52,869 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:52,869 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:52,869 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:49790023-51390188 -o /scratch-local/tahmad/input.chr20:49790023-513901883ewx5y.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:52,869 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,879 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:43552843-45816356 -o /scratch-local/tahmad/input.chr20:43552843-45816356yuS3zD.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdout2MLCHe.log / /scratch-local/tahmad/stderre6OMF7.log
2021-06-01 07:41:52,880 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:43749488
2021-06-01 07:41:52,881 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:52,881 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,881 ERROR    Traceback (most recent call last):
2021-06-01 07:41:52,881 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:52,881 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:52,881 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:52,881 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:52,881 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:52,882 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:52,882 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:43552843-45816356 -o /scratch-local/tahmad/input.chr20:43552843-45816356yuS3zD.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:52,882 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,891 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:38888860-41007711 -o /scratch-local/tahmad/input.chr20:38888860-41007711dGhyMw.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutJcIWbu.log / /scratch-local/tahmad/stderrwVDF3Q.log
2021-06-01 07:41:52,892 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:39116733
2021-06-01 07:41:52,893 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:52,893 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,893 ERROR    Traceback (most recent call last):
2021-06-01 07:41:52,893 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:52,893 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:52,893 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:52,893 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:52,893 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:52,894 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:52,894 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:38888860-41007711 -o /scratch-local/tahmad/input.chr20:38888860-41007711dGhyMw.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:52,894 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,937 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:53775717-55898481 -o /scratch-local/tahmad/input.chr20:53775717-55898481ZUzseh.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutJ9bJBH.log / /scratch-local/tahmad/stderrCCdRk9.log
2021-06-01 07:41:52,937 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:5802681-7160508 -o /scratch-local/tahmad/input.chr20:5802681-7160508vDmZXy.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdout2LcvoN.log / /scratch-local/tahmad/stderr1oobk8.log
2021-06-01 07:41:52,939 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:54045209
2021-06-01 07:41:52,939 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:52,939 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,939 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:6149657
2021-06-01 07:41:52,939 ERROR    Traceback (most recent call last):
2021-06-01 07:41:52,939 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:52,939 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:52,939 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:52,939 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:52,939 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:52,939 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,939 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:52,939 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:52,939 ERROR    Traceback (most recent call last):
2021-06-01 07:41:52,940 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:53775717-55898481 -o /scratch-local/tahmad/input.chr20:53775717-55898481ZUzseh.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:52,940 ERROR    ------------------------------------------------------------
2021-06-01 07:41:52,940 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:52,940 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:52,940 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:52,940 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:52,940 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:52,940 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:52,940 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:5802681-7160508 -o /scratch-local/tahmad/input.chr20:5802681-7160508vDmZXy.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:52,941 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,003 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:55898482-58876917 -o /scratch-local/tahmad/input.chr20:55898482-588769171jpUvR.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutYcKzRz.log / /scratch-local/tahmad/stderrHwM8C7.log
2021-06-01 07:41:53,003 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:4358658-5802680 -o /scratch-local/tahmad/input.chr20:4358658-5802680zgAcxT.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutzehbpu.log / /scratch-local/tahmad/stderrpYZps3.log
2021-06-01 07:41:53,003 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:1-2925496 -o /scratch-local/tahmad/input.chr20:1-2925496dCol3Q.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutuuUHFo.log / /scratch-local/tahmad/stderrl7ZMO1.log
2021-06-01 07:41:53,005 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:56005641
2021-06-01 07:41:53,005 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:4726007
2021-06-01 07:41:53,005 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:139003
2021-06-01 07:41:53,005 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:53,005 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:53,005 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,005 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:53,005 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,006 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,006 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,006 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,006 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:53,006 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,006 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:53,006 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:53,006 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:53,006 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:53,006 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:53,006 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:53,006 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:53,006 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:53,006 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:53,006 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:55898482-58876917 -o /scratch-local/tahmad/input.chr20:55898482-588769171jpUvR.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:53,006 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,006 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:53,006 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:53,006 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:53,007 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:1-2925496 -o /scratch-local/tahmad/input.chr20:1-2925496dCol3Q.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:53,007 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:53,007 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,007 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:53,007 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:53,007 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:53,007 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:53,007 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:4358658-5802680 -o /scratch-local/tahmad/input.chr20:4358658-5802680zgAcxT.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:53,007 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,017 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:62120908-2147483647 -o /scratch-local/tahmad/input.chr20:62120908-2147483647FmjaX1.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutQp6ijP.log / /scratch-local/tahmad/stderrgANtvy.log
2021-06-01 07:41:53,018 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:62147602
2021-06-01 07:41:53,019 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:53,019 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,019 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,019 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:53,019 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:53,019 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:53,019 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:53,019 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:53,019 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:53,019 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:62120908-2147483647 -o /scratch-local/tahmad/input.chr20:62120908-2147483647FmjaX1.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:53,019 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,027 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:37093631-38888859 -o /scratch-local/tahmad/input.chr20:37093631-38888859t71BXA.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutlFfXqP.log / /scratch-local/tahmad/stderrhoT5Kt.log
2021-06-01 07:41:53,029 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:37097922
2021-06-01 07:41:53,029 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:53,029 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,030 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,030 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:53,030 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:53,030 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:53,030 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:53,030 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:53,030 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:53,030 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:37093631-38888859 -o /scratch-local/tahmad/input.chr20:37093631-38888859t71BXA.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:53,031 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,053 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:16079695-19323028 -o /scratch-local/tahmad/input.chr20:16079695-19323028rVyVge.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutcFKu1v.log / /scratch-local/tahmad/stderrk0f1PX.log
2021-06-01 07:41:53,054 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:16195347
2021-06-01 07:41:53,055 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:53,055 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,055 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,055 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:53,055 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:53,055 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:53,055 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:53,055 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:53,056 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:53,056 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:16079695-19323028 -o /scratch-local/tahmad/input.chr20:16079695-19323028rVyVge.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:53,056 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,086 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:2925497-4358657 -o /scratch-local/tahmad/input.chr20:2925497-4358657e_LkBF.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdout8BIxH6.log / /scratch-local/tahmad/stderrNs41gI.log
2021-06-01 07:41:53,087 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:3002415
2021-06-01 07:41:53,087 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:53,087 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,087 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,087 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:53,088 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:53,088 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:53,088 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:53,088 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:53,088 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:53,088 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:2925497-4358657 -o /scratch-local/tahmad/input.chr20:2925497-4358657e_LkBF.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:53,088 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,168 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:52889842-53775716 -o /scratch-local/tahmad/input.chr20:52889842-53775716UNBa2z.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutfK6wDo.log / /scratch-local/tahmad/stderrKVSoQP.log
2021-06-01 07:41:53,170 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:52939316
2021-06-01 07:41:53,170 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:53,170 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,170 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,170 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:53,170 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:53,170 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:53,170 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:53,170 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:53,170 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:53,170 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:52889842-53775716 -o /scratch-local/tahmad/input.chr20:52889842-53775716UNBa2z.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:53,171 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,269 ERROR    Preprocess command preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:51390189-52889841 -o /scratch-local/tahmad/input.chr20:51390189-528898417_WKsR.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa failed. Outputs are here /scratch-local/tahmad/stdoutVRIigZ.log / /scratch-local/tahmad/stderrbTFq7j.log
2021-06-01 07:41:53,271 ERROR    Filter 'q10,AFB' is not in the VCF header. This will break VCF writing, at chr20:51560413
2021-06-01 07:41:53,271 ERROR    Exception when running <function preprocessWrapper at 0x2b47e32c89b0>:
2021-06-01 07:41:53,271 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,271 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,271 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2021-06-01 07:41:53,271 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2021-06-01 07:41:53,271 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2021-06-01 07:41:53,271 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2021-06-01 07:41:53,271 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2021-06-01 07:41:53,271 ERROR        raise CalledProcessError(retcode, cmd)
2021-06-01 07:41:53,271 ERROR    CalledProcessError: Command 'preprocess /scratch-local/tahmad/tmpBgrLr7.vcf.gz:* -l chr20:51390189-52889841 -o /scratch-local/tahmad/input.chr20:51390189-528898417_WKsR.prep.vcf.gz -V 1 -L 1 -r /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa' returned non-zero exit status 1
2021-06-01 07:41:53,272 ERROR    ------------------------------------------------------------
2021-06-01 07:41:53,275 ERROR    One of the preprocess jobs failed
2021-06-01 07:41:53,275 ERROR    Traceback (most recent call last):
2021-06-01 07:41:53,275 ERROR      File "/opt/hap.py/bin/hap.py", line 508, in <module>
2021-06-01 07:41:53,275 ERROR        main()
2021-06-01 07:41:53,275 ERROR      File "/opt/hap.py/bin/hap.py", line 363, in main
2021-06-01 07:41:53,275 ERROR        "QUERY")
2021-06-01 07:41:53,275 ERROR      File "/opt/hap.py/bin/pre.py", line 203, in preprocess
2021-06-01 07:41:53,276 ERROR        haploid_x=gender == "male")
2021-06-01 07:41:53,276 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 214, in partialCredit
2021-06-01 07:41:53,276 ERROR        raise Exception("One of the preprocess jobs failed")
2021-06-01 07:41:53,276 ERROR    Exception: One of the preprocess jobs failed

dancooke commented 3 years ago

Just guessing, but I think this could be due to Octopus reporting multiple FORMAT/FT values, whereas the spec states (incorrectly, IMO) that FT has one value. I should probably change Octopus' VCF header to reflect this spec violation.

Note that for germline calling you probably want to use random forest filtering. If you built the Singularity image from the Octopus Dockerfile, the forest files are already installed in /opt/octopus/resources/forests, so your command becomes:

$ singularity exec octopus_latest.sif octopus --threads 24  --reference /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa  --reads /scratch-shared/tahmad/bio_data/NA24385/HG003/HG003_chr20.bam --forest /opt/octopus/resources/forests/germline.v0.7.4.forest -o /scratch-shared/tahmad/bio_data/NA24385/HG003/octopus.vcf

This filtering method only ever writes one value for FORMAT/FT (PASS or FT), so it should avoid the problem with hap.py.
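To check whether a given VCF is affected by the multi-valued FT issue guessed at above, a small scan over the records can help. The following is only a sketch: the function names `open_vcf` and `find_multi_valued_ft` are hypothetical, the example records are made up for illustration, and it assumes that multiple filters in FT are joined by `,` (as in the `q10,AFB` value from the hap.py log) or by `;`.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def find_multi_valued_ft(lines):
    """Return (chrom, pos, sample_index, ft) for FT entries naming several filters.

    Assumption: multiple filters are separated by ',' or ';' inside FT.
    """
    hits = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT and sample columns on this record
        keys = fields[8].split(":")
        if "FT" not in keys:
            continue
        ft_index = keys.index("FT")
        for sample_index, sample in enumerate(fields[9:]):
            values = sample.split(":")
            if ft_index >= len(values):
                continue  # trailing FORMAT fields may be omitted
            ft = values[ft_index]
            if "," in ft or ";" in ft:
                hits.append((fields[0], int(fields[1]), sample_index, ft))
    return hits


# Made-up records for illustration only; they are not taken from the real VCF.
EXAMPLE_RECORDS = [
    "##fileformat=VCFv4.3\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n",
    "chr20\t100\t.\tA\tG\t50\tPASS\t.\tGT:FT\t0|1:PASS\n",
    "chr20\t200\t.\tC\tT\t5\tq10\t.\tGT:FT\t0|1:q10,AFB\n",
]

if __name__ == "__main__":
    for hit in find_multi_valued_ft(EXAMPLE_RECORDS):
        print(hit)
```

For a real file, something like `with open_vcf("octopus.vcf") as fh: print(find_multi_valued_ft(fh)[:5])` would list the first few affected records.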

In addition, to get the best performance you should specify the sequence error model. You should also write compressed VCF and set the target calling region. In summary:

$ singularity exec octopus_latest.sif octopus --threads 24  --reference /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa  --reads /scratch-shared/tahmad/bio_data/NA24385/HG003/HG003_chr20.bam -T chr20 --sequence-error-model PCRF.NovaSeq --forest /opt/octopus/resources/forests/germline.v0.7.4.forest -o /scratch-shared/tahmad/bio_data/NA24385/HG003/octopus.vcf.gz

tahashmi commented 3 years ago

Hi @dancooke, thanks for the quick reply. I used the last command you mentioned and it worked.

However, Octopus still detects far fewer variants than DeepVariant, so its accuracy against the GIAB HG003 v4.2 truth benchmark set is also low, as shown below:

2021-06-01 15:39:02,804 WARNING  No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2021-06-01 15:39:02,809 WARNING  No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[W] overlapping records at chr6:29747433 for sample 0
[W] Variants that overlap on the reference allele: 4
[I] Total VCF records:         4000097
[I] Non-reference VCF records: 4000097
[I] Total VCF records:         6592
[I] Non-reference VCF records: 6592
2021-06-01 15:40:32,702 WARNING  Creating template for vcfeval. You can speed this up by supplying a SDF template that corresponds to /scratch-shared/tahmad/bio_data/GRCh38/GRCh38.fa
/home/tahmad/.local/lib/python2.7/site-packages/pandas/core/computation/check.py:19: UserWarning: The installed version of numexpr 2.4.3 is not supported in pandas and will be not be used
The minimum supported version is 2.6.1

  ver=ver, min_ver=_MIN_NUMEXPR_VERSION), UserWarning)
Benchmarking Summary:
  Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
 INDEL    ALL        10634       429     10205         4651      3022       1198    440       0.040342          0.124819        0.257579         0.060977                     NaN                     NaN                   1.749861                   0.552720
 INDEL   PASS        10634       263     10371         2380      1504        611    183       0.024732          0.149802        0.256723         0.042455                     NaN                     NaN                   1.749861                   0.435101
   SNP    ALL        70209       980     69229         1926       733        214    717       0.013958          0.571846        0.111111         0.027251                2.297347                2.226131                   1.884533                   0.071786
   SNP   PASS        70209       954     69255         1736       602        181    592       0.013588          0.612862        0.104263         0.026587                2.297347                2.191176                   1.884533                   0.069624

dancooke commented 3 years ago

Something is definitely amiss here - the performance of Octopus and DeepVariant should be very similar on this dataset. I'll try to reproduce soon.

tahashmi commented 3 years ago

I have repeated the experiment (I may have been using a different reference previously) and now get correct, comparable results. Thanks.

DeepVariant:

  Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
 INDEL    ALL        10634     10579        55        21045        24       9984     19       0.994828          0.997830        0.474412         0.996327                     NaN                     NaN                   1.749861                   2.296457
 INDEL   PASS        10634     10579        55        21045        24       9984     19       0.994828          0.997830        0.474412         0.996327                     NaN                     NaN                   1.749861                   2.296457
   SNP    ALL        70209     69947       262        85681        85      15619     14       0.996268          0.998787        0.182292         0.997526                2.297347                2.071024                   1.884533                   1.937783
   SNP   PASS        70209     69947       262        85681        85      15619     14       0.996268          0.998787        0.182292         0.997526                2.297347                2.071024                   1.884533                   1.937783

Octopus:

  Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
 INDEL    ALL        10634     10586        48        23110        89      11874     22       0.995486          0.992079        0.513804         0.993780                     NaN                     NaN                   1.749861                   2.081653
 INDEL   PASS        10634     10579        55        20827        18       9670      9       0.994828          0.998387        0.464301         0.996604                     NaN                     NaN                   1.749861                   1.879637
   SNP    ALL        70209     69909       300        99329       569      29170     34       0.995727          0.991890        0.293671         0.993805                2.297347                1.966237                   1.884533                   2.461922
   SNP   PASS        70209     69856       353        82612        87      12987     11       0.994972          0.998750        0.157205         0.996858                2.297347                2.147613                   1.884533                   1.920645
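For anyone comparing the tables above, the METRIC columns can be recomputed from the count columns. This is a sketch under an assumption: the formulas below are inferred from the numbers in the tables (they reproduce the rows checked to the printed precision) rather than quoted from the hap.py source, and the `HappyRow` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HappyRow:
    """Count columns from one row of a hap.py summary table."""
    truth_tp: int     # TRUTH.TP
    truth_fn: int     # TRUTH.FN
    query_total: int  # QUERY.TOTAL
    query_fp: int     # QUERY.FP
    query_unk: int    # QUERY.UNK (query calls outside the confident regions)


def recall(row):
    """Fraction of truth variants that were found."""
    return row.truth_tp / (row.truth_tp + row.truth_fn)


def precision(row):
    """Fraction of assessed query calls that are not false positives.

    Assumption: calls counted as UNK are excluded from the denominator.
    """
    assessed = row.query_total - row.query_unk
    return (assessed - row.query_fp) / assessed


def frac_na(row):
    """Fraction of query calls that could not be assessed."""
    return row.query_unk / row.query_total


def f1_score(row):
    """Harmonic mean of precision and recall."""
    p, r = precision(row), recall(row)
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Counts copied from the Octopus INDEL PASS row above.
    octopus_indel_pass = HappyRow(
        truth_tp=10579, truth_fn=55, query_total=20827, query_fp=18, query_unk=9670
    )
    for name, fn in [("Recall", recall), ("Precision", precision),
                     ("Frac_NA", frac_na), ("F1_Score", f1_score)]:
        print(name, round(fn(octopus_indel_pass), 6))
```

Applying the same functions to the first, low-accuracy run shows where it went wrong: with only 429 of 10634 truth indels matched, recall collapses regardless of how the filters are set.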

hap.py command: