pstawinski / pygenebe

PyGeneBe: A Python client seamlessly integrating with the GeneBe platform, offering efficient annotation of genetic variants through its API, while supporting pandas, VCF file formats, and HGVS parsing
https://genebe.net

genebe commandline tool fails with input vcf #1

Open adeffur opened 3 months ago

adeffur commented 3 months ago

I am unable to run the genebe command-line tool successfully.

Input command:

    genebe annotate --input 24072-01-01_split.vcf --output 24072-01-01_split_genebe.vcf --progress

Error message:

    Traceback (most recent call last):
      File "/Users/armindeffur/my-envs/genebe/bin/genebe", line 8, in <module>
        sys.exit(main())
      File "/Users/armindeffur/my-envs/genebe/lib/python3.9/site-packages/genebe/entrypoint.py", line 147, in main
        annotate_vcf(
      File "/Users/armindeffur/my-envs/genebe/lib/python3.9/site-packages/genebe/vcf_simple_annotator.py", line 122, in annotate_vcf
        variants_batch = [
      File "/Users/armindeffur/my-envs/genebe/lib/python3.9/site-packages/genebe/vcf_simple_annotator.py", line 123, in <listcomp>
        f"{variant.CHROM}-{variant.POS}-{variant.REF}-{variant.ALT[0]}"
    IndexError: list index out of range

I suspect the VCF is the problem, since genebe seems unable to extract the correct CHROM-POS-REF-ALT information from it.
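The last frame of the traceback builds a `CHROM-POS-REF-ALT` string from `variant.ALT[0]`, so the `IndexError` means at least one record reached that line with an empty ALT list. Below is a minimal sketch of a guarded version of that list comprehension. `Record` and `build_variant_keys` are hypothetical names standing in for the VCF reader's record type and pygenebe's internal code (only the attribute names come from the traceback), and the idea that the reader represents a missing ALT (`.`) as an empty list is an assumption, not something confirmed in this thread.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical stand-in for a parsed VCF record; attribute names follow the traceback."""
    CHROM: str
    POS: int
    REF: str
    ALT: List[str]


def build_variant_keys(records: List[Record]) -> Tuple[List[str], List[Record]]:
    """Build CHROM-POS-REF-ALT keys, setting aside records that have no ALT allele
    instead of raising IndexError on ALT[0]."""
    keys: List[str] = []
    skipped: List[Record] = []
    for variant in records:
        if not variant.ALT:  # empty ALT list: ALT[0] would raise IndexError here
            skipped.append(variant)
            continue
        keys.append(f"{variant.CHROM}-{variant.POS}-{variant.REF}-{variant.ALT[0]}")
    return keys, skipped


records = [
    Record("1", 1922176, "T", ["TCTGA"]),  # from the VCF excerpt
    Record("1", 1930142, "C", ["A"]),      # from the VCF excerpt
    Record("1", 1930150, "G", []),         # hypothetical record without an ALT allele
]
keys, skipped = build_variant_keys(records)
print(keys)          # ['1-1922176-T-TCTGA', '1-1930142-C-A']
print(len(skipped))  # 1
```

Because the header shows the file was split with `bcftools norm -m-`, each record should carry a single ALT allele, so keeping only `ALT[0]` loses nothing for this input. Skipping ALT-less records is just one possible policy; reporting them as an explicit error would be another.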

First few lines of the VCF file:

fileformat=VCFv4.3

FILTER=

fileDate=05/27/24

reference=hg38_2024

source=SEQUENCE Pilot_5.4.1

InputFileList=../Import/20240521_Twist-PCDv2_hg38-2_illumina/24072-01-01_S2_L001_R1_001.fastq.gz;../Import/20240521_Twist-PCDv2_hg38-2_illumina/24072-01-01_S2_L001_R2_001.fastq.gz

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=<ID=ClinVar:Clinical Significance,Number=.,Type=String,Description="Mutation Info from Public DBs">

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

FILTER=

FILTER=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

contig=

bcftools_normVersion=1.20+htslib-1.20

bcftools_normCommand=norm -m- 24072-01-01.vcf.gz; Date=Fri Jun 21 12:41:04 2024

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1
    1 1922176 rs3039777 T TCTGA . PASS GI=CFAP74;TI=NM001304360.2;Illumina-50x;WEIGHTING=distinct;dbSNP:MAF=0.975000;COVFR=1044,1037;CHGVS=c.*110*111insTCAG GT:DP:AF:AD:ADF:ADR 1/1:2226:0.93:145,2081:79,1044:66,1037
    1 1930142 rs141833643 C A . PASS GI=CFAP74;TI=NM_001304360.2;Illumina-50x;WEIGHTING=distinct;gnomAD:AC=1037;gnomAD:AF=0.007712;gnomAD:AN=134462;dbSNP:MAF=0.004553;COVFR=950,979;CHGVS=c.3206G>T;PHGVS=p.(Gly1069Val) GT:DP:AF:AD:ADF:ADR 0/1:3885:0.5:1956,1929:969,950:987,979
    1 1968747 rs35269416 T C . PASS GI=CFAP74;TI=NM_001304360.2;Illumina-50x;WEIGHTING=distinct;gnomAD:AC=46962;gnomAD:AF=0.188203;gnomAD:AN=249528;dbSNP:MAF=0.111100;COVFR=1890,1655;CHGVS=c.1133A>G;PHGVS=p.(Lys378Arg) GT:DP:AF:AD:ADF:ADR 1/1:3556:1:11,3545:6,1890:5,1655
    1 1968793 rs16824588 T C . PASS GI=CFAP74;TI=NM_001304360.2;Illumina-50x;WEIGHTING=distinct;gnomAD:AC=105280;gnomAD:AF=0.422052;gnomAD:AN=249448;dbSNP:MAF=0.228900;COVFR=1985,1183;CHGVS=c.1087A>G;PHGVS=p.(Ile363Val) GT:DP:AF:AD:ADF:ADR 1/1:3170:1:2,3168:1,1985:1,1183
    1 1987049 rs4350140 A G . PASS GI=CFAP74;TI=NM_001304360.2;Illumina-50x;WEIGHTING=distinct;gnomAD:AC=130882;gnomAD:AF=0.552189;gnomAD:AN=237024;dbSNP:MAF=0.380100;COVFR=584,430;CHGVS=c.297-14T>C GT:DP:AF:AD:ADF:ADR 0/1:1995:0.51:981,1014:572,584:409,430
    1 3682336 rs2273953 G A . PASS GI=TP73;TI=NM_005427.4;Illumina-50x;WEIGHTING=distinct;gnomAD:AC=27759;gnomAD:AF=0.203238;gnomAD:AN=136584;dbSNP:MAF=0.075000;COVFR=647,524;CHGVS=c.-30G>A GT:DP:AF:AD:ADF:ADR 0/1:2803:0.42:1632,1171:876,647:756,524
    1 3682346 rs1801173 C T . PASS GI=TP73;TI=NM_005427.4;Illumina-50x;WEIGHTING=distinct;gnomAD:AC=30400;gnomAD:AF=0.201790;gnomAD:AN=150652;dbSNP:MAF=0.075000;COVFR=657,586;CHGVS=c.-20C>T GT:DP:AF:AD:ADF:ADR 0/1:2943:0.42:1700,1243:877,657:823,586
    1 3690956 rs3765730 G A . PASS GI=TP73;TI=NM_001126242.3;Illumina-50x;WEIGHTING=distinct;gnomAD:AC=59098;gnomAD:AF=0.312130;gnomAD:AN=189338;dbSNP:MAF=0.226700;COVFR=276,497;CHGVS=c.39+12G>A GT:DP:AF:AD:ADF:ADR 0/1:1596:0.48:823,773:318,276:505,497

pstawinski commented 3 months ago

Hi, I am not able to reproduce the error.

I tried the input you provided, input.vcf.gz, with the Docker image:

    docker run -v ./input.vcf.gz:/tmp/input.vcf.gz -it --rm genebe/pygenebe:0.0.18 genebe annotate --input /tmp/input.vcf.gz --output /dev/stdout

Could you please check whether the VCF you are using contains empty lines at the end?
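To check both hypotheses in this thread (trailing empty lines, and records without an ALT allele) before running `genebe annotate`, a small stdlib-only scan of the raw file may help. This is a minimal sketch, not part of pygenebe: `find_suspect_lines` and `check_vcf` are hypothetical helper names, and the script simply assumes a tab-separated VCF whose fifth column is ALT, as the VCF format defines. Whether a blank line actually produces an empty ALT list inside pygenebe's reader is not established here.

```python
import gzip
import io
from typing import Iterable, List, Tuple


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for lines that could break a
    CHROM-POS-REF-ALT extraction: blank lines and records without an ALT allele."""
    problems: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            problems.append((lineno, "blank line"))
            continue
        if line.startswith("#"):  # meta-information (##) and column header (#CHROM) lines
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            problems.append((lineno, f"only {len(fields)} tab-separated fields"))
        elif fields[4] in ("", "."):  # ALT is the 5th column
            problems.append((lineno, "no ALT allele"))
    return problems


def check_vcf(path: str) -> List[Tuple[int, str]]:
    """Open a plain or gzip/bgzip-compressed VCF and scan it line by line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return find_suspect_lines(handle)


# Small illustrative input; the second record (no ALT) is hypothetical.
sample = io.StringIO(
    "##fileformat=VCFv4.3\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1930142\trs141833643\tC\tA\t.\tPASS\t.\n"
    "1\t1930150\t.\tG\t.\t.\tPASS\t.\n"
    "\n"
)
print(find_suspect_lines(sample))  # [(4, 'no ALT allele'), (5, 'blank line')]
```

Calling `check_vcf("24072-01-01_split.vcf")` would list any offending lines; removing them and re-running the annotation is one way to tell which hypothesis, if either, explains the `IndexError`.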