molmicdx / mtb-pipeline

MIT License

sample_normalized.vcf.gz: not in gzip format #5

Open jflucier opened 1 year ago

jflucier commented 1 year ago

Hi again,

With the Singularity problem solved, I now get another error, related to htslib, when running the provided test data:

...
15:02:16.826 INFO  ProgressMeter -  NC_000962.3:4233097              0.0                 21880        3918806.0
15:02:16.826 INFO  ProgressMeter - Traversal complete. Processed 21880 total variants in 0.0 minutes.
15:02:16.826 INFO  LeftAlignAndTrimVariants - 0 variants left aligned
15:02:16.830 INFO  LeftAlignAndTrimVariants - Shutting down engine
[August 23, 2023 3:02:16 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.LeftAlignAndTrimVariants done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2136473600
singularity exec -B /home/def-malouinf/program/mtb-pipeline library://seahym/mtb-pipeline/htslib:1.12 bgzip < output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf > output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz
INFO:    Downloading library image
singularity exec -B /home/def-malouinf/program/mtb-pipeline library://seahym/mtb-pipeline/htslib:1.12 tabix -f output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz; singularity exec -B /home/def-malouinf/program/mtb-pipeline library://seahym/mtb-pipeline/igv-reports:1.0.4 create_report output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz data/NC_000962.3.fa --flanking 1000 --info-columns "AF DP MQ QD" --tracks output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz output/H37Rv_5x10-3SNP50X_1/deduped_mq.bam --output output/H37Rv_5x10-3SNP50X_1/bcftools/igv.html
INFO:    Using cached image
[tabix] the compression of 'output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz' is not BGZF
INFO:    Downloading library image
108.9MiB / 108.9MiB [==================================================================================================] 100 % 10.1 MiB/s 0s
[E::hts_hopen] Failed to open file output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz
[E::hts_open_format] Failed to open file "output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz" : Exec format error
Traceback (most recent call last):
  File "/usr/local/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/igv_reports/report.py", line 346, in main
    create_report(args)
  File "/usr/local/lib/python3.9/dist-packages/igv_reports/report.py", line 32, in create_report
    table = VariantTable(variants_file, args.info_columns, args.info_columns_prefixes, args.samples,
  File "/usr/local/lib/python3.9/dist-packages/igv_reports/varianttable.py", line 12, in __init__
    vcf = pysam.VariantFile(vcfFile)
  File "pysam/libcbcf.pyx", line 4119, in pysam.libcbcf.VariantFile.__init__
  File "pysam/libcbcf.pyx", line 4344, in pysam.libcbcf.VariantFile.open
OSError: [Errno 8] could not open variant file `b'output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz'`: Exec format error
scons: *** [output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz.tbi] Error 1
scons: building terminated because of errors.

If I look at the generated sample_normalized.vcf.gz:

(mtb-pipeline-env) |11:07:09|jflucier@ip34-rockylinux8:[mtb-pipeline]> zcat output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz | head

gzip: output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz: not in gzip format
(mtb-pipeline-env) |11:07:24|jflucier@ip34-rockylinux8:[mtb-pipeline]> 
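To see what actually ended up in sample_normalized.vcf.gz, it can help to look at its first bytes. Below is a minimal Python sketch; the function name is hypothetical and not part of mtb-pipeline. Every gzip file starts with the bytes 1f 8b, and BGZF (the gzip variant that bgzip writes and tabix requires) additionally sets the FEXTRA flag and carries an extra subfield tagged "BC", as described in the BGZF section of the SAM/BAM specification. A result of "other" corresponds to zcat's "not in gzip format"; a result of "gzip" would explain tabix's "not BGZF" message on its own.

```python
import sys

# First two bytes of every gzip member (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def classify_header(head: bytes) -> str:
    """Classify the leading bytes of a file as 'empty', 'other', 'gzip' or 'bgzf'.

    Assumes the BC subfield is the first one in the gzip extra field,
    which is how bgzip writes it.
    """
    if not head:
        return "empty"
    if head[:2] != GZIP_MAGIC:
        return "other"  # zcat reports this case as "not in gzip format"
    # Byte 3 is FLG (bit 2 = FEXTRA); bytes 12-13 are the first subfield's SI1/SI2.
    if len(head) >= 16 and head[3] & 0x04 and head[12:14] == b"BC":
        return "bgzf"
    return "gzip"  # valid gzip, but tabix would reject it as not BGZF


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            head = fh.read(18)
        # Printing the raw bytes shows what was actually written to the file.
        print(f"{path}: {classify_header(head)} {head!r}")
```

Saved under any name (check_bgzf.py is just a placeholder), it can be run as python check_bgzf.py output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz.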

Thanks again for your help, JF

yeemey commented 1 year ago

Hi JF, please check whether output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf is empty or malformed in some way. If it is, or if it's hard to tell, rerunning GATK LeftAlignAndTrimVariants and bgzip on their own may help with troubleshooting:

singularity exec -B $PWD docker://broadinstitute/gatk:4.0.11.0 LeftAlignAndTrimVariants -R data/NC_000962.3.fa -V output/H37Rv_5x10-3SNP50X_1/bcftools/sample.vcf -O output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf

singularity exec -B $PWD library://seahym/mtb-pipeline/htslib:1.12 bgzip < output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf > output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz
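After rerunning the two commands, one way to confirm that the bgzip step worked is to check that the .vcf.gz decompresses back to exactly the bytes of the .vcf. Here is a rough Python sketch, assuming both files are on the local filesystem; the function names are illustrative. Python's gzip module reads multi-member gzip streams, and each BGZF block is a gzip member, so it can read bgzip output. The sketch returns False rather than raising when the .vcf.gz cannot be opened, is not gzip at all, or is truncated.

```python
import gzip
import hashlib
import os
import sys


def _sha256(fh, chunk: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks so a large VCF is never loaded at once."""
    digest = hashlib.sha256()
    for block in iter(lambda: fh.read(chunk), b""):
        digest.update(block)
    return digest.hexdigest()


def same_after_decompression(plain_path: str, gz_path: str) -> bool:
    """True if gz_path decompresses to exactly the contents of plain_path.

    Returns False if gz_path cannot be opened or is not gzip (both raise
    OSError subclasses) or ends before the end-of-stream marker (EOFError).
    """
    with open(plain_path, "rb") as plain:
        expected = _sha256(plain)
    try:
        with gzip.open(gz_path, "rb") as packed:
            actual = _sha256(packed)
    except (OSError, EOFError):
        return False
    return expected == actual


if __name__ == "__main__" and len(sys.argv) == 3:
    plain_vcf, packed_vcf = sys.argv[1], sys.argv[2]
    if os.path.getsize(plain_vcf) == 0:
        print(f"{plain_vcf} is empty")
    ok = same_after_decompression(plain_vcf, packed_vcf)
    print("round trip OK" if ok else "round trip FAILED")
```

For example: python roundtrip_check.py output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf.gz (the script name is hypothetical). Comparing digests instead of chunk-by-chunk reads avoids any dependence on how many bytes each read call happens to return.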

jflucier commented 1 year ago

Hi,

The output in output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf looks fine:

(mtb-pipeline-env) |08:18:51|jflucier@ip34-rockylinux8:[mtb-pipeline]> ll output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf
-rw-rw-r-- 1 jflucier def-malouinf 4354300 Aug 23 11:02 output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf
(mtb-pipeline-env) |08:19:08|jflucier@ip34-rockylinux8:[mtb-pipeline]> head output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf
##fileformat=VCFv4.2
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (high-quality bases)">
##FORMAT=<ID=ADF,Number=R,Type=Integer,Description="Allelic depths on the forward strand (high-quality bases)">
##FORMAT=<ID=ADR,Number=R,Type=Integer,Description="Allelic depths on the reverse strand (high-quality bases)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=SCR,Number=1,Type=Integer,Description="Per-sample number of soft-clipped reads (at high-quality bases)">
(mtb-pipeline-env) |08:19:19|jflucier@ip34-rockylinux8:[mtb-pipeline]> wc -l output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf
21921 output/H37Rv_5x10-3SNP50X_1/bcftools/sample_normalized.vcf
NC_000962.3     4409564 .       C       G       225.42  .       AC=1;AD=0,38;ADF=0,23;ADR=0,15;AN=1;DP=57;DP4=0,0,23,15;FS=0;MQ=60;MQ0F=0;SCR=0;SGB=-0.693143;VDB=0.948052      GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:38:0:0,23:0,15:0,38:0
NC_000962.3     4409816 .       T       G       225.42  .       AC=1;AD=0,23;ADF=0,10;ADR=0,13;AN=1;DP=38;DP4=0,0,10,13;FS=0;MQ=60;MQ0F=0;SCR=1;SGB=-0.692717;VDB=0.491887      GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:23:0:0,10:0,13:0,23:1
NC_000962.3     4409948 .       G       T       225.42  .       AC=1;AD=0,26;ADF=0,14;ADR=0,12;AN=1;DP=40;DP4=0,0,14,12;FS=0;MQ=60;MQ0F=0;SCR=0;SGB=-0.692976;VDB=0.903887      GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:26:0:0,14:0,12:0,26:0
NC_000962.3     4410116 .       G       A       225.42  .       AC=1;AD=0,37;ADF=0,19;ADR=0,18;AN=1;DP=57;DP4=0,0,20,18;FS=0;MQ=60;MQ0F=0;SCR=0;SGB=-0.693143;VDB=0.645753      GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:38:0:0,19:0,18:0,37:0
NC_000962.3     4410344 .       A       T       225.42  .       AC=1;AD=0,41;ADF=0,21;ADR=0,20;AN=1;DP=63;DP4=0,0,21,20;FS=0;MQ=60;MQ0F=0;SCR=0;SGB=-0.693145;VDB=0.95326       GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:41:0:0,21:0,20:0,41:0
NC_000962.3     4410691 .       G       A       225.42  .       AC=1;AD=0,36;ADF=0,14;ADR=0,22;AN=1;DP=55;DP4=0,0,14,22;FS=0;MQ=60;MQ0F=0;SCR=0;SGB=-0.693139;VDB=0.994748      GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:36:0:0,14:0,22:0,36:0
NC_000962.3     4410900 .       C       G       225.42  .       AC=1;AD=0,27;ADF=0,11;ADR=0,16;AN=1;DP=37;DP4=0,0,11,16;FS=0;MQ=60;MQ0F=0;SCR=0;SGB=-0.693021;VDB=0.917651      GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:27:0:0,11:0,16:0,27:0
NC_000962.3     4411151 .       G       A       225.42  .       AC=1;AD=0,28;ADF=0,15;ADR=0,13;AN=1;DP=44;DP4=0,0,15,13;FS=0;MQ=60;MQ0F=0;SCR=0;SGB=-0.693054;VDB=0.749836      GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:28:0:0,15:0,13:0,28:0
NC_000962.3     4411298 .       A       C       225.42  .       AC=1;AD=0,30;ADF=0,15;ADR=0,15;AN=1;DP=45;DP4=0,0,15,15;FS=0;MQ=60;MQ0F=0;SCR=0;SGB=-0.693097;VDB=0.929325      GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:30:0:0,15:0,15:0,30:0
NC_000962.3     4411447 .       G       T       225.42  .       AC=1;AD=0,14;ADF=0,2;ADR=0,12;AN=1;DP=19;DP4=0,0,2,12;FS=0;MQ=60;MQ0F=0;SCR=0;SGB=-0.686358;VDB=0.0120091       GT:PL:DP:SP:ADF:ADR:AD:SCR     1:255,0:14:0:0,2:0,12:0,14:0
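A quick structural count can back up "looks fine". GATK reported processing 21880 variants and wc -l reports 21921 lines, so if the normalized file holds the same 21880 records, the remaining 41 lines should be ## meta lines plus the #CHROM header. Below is a minimal Python sketch that counts each kind of line and splits a record's INFO and FORMAT/sample columns. The names are illustrative, and it assumes the file is tab-delimited as the VCF specification requires, even though the terminal output above renders the tabs as spaces.

```python
import gzip
import sys


def split_record(line: str) -> dict:
    """Split one tab-delimited VCF data line into fixed fields, INFO and the first sample."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 tab-separated columns, got {len(cols)}")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    info_fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_fields[key] = value if sep else True  # flag keys carry no '=value'
    sample = {}
    if len(cols) >= 10:
        # FORMAT keys (column 9) pair up with the colon-separated sample values (column 10).
        sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref, "alt": alt,
            "qual": qual, "filter": flt, "info": info_fields, "sample": sample}


def summarize_vcf(lines) -> dict:
    """Count meta (##), header (#CHROM) and data lines; data lines with < 8 columns are malformed."""
    counts = {"meta": 0, "header": 0, "records": 0, "malformed": 0}
    for line in lines:
        if line.startswith("##"):
            counts["meta"] += 1
        elif line.startswith("#CHROM"):
            counts["header"] += 1
        elif len(line.rstrip("\n").split("\t")) >= 8:
            counts["records"] += 1
        else:
            counts["malformed"] += 1
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # Works on the plain .vcf and, once it is valid gzip, on the .vcf.gz too.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as fh:
            print(path, summarize_vcf(fh))
```

For the first record shown above, split_record would map the sample's GT to "1", DP to "38" and AD to "0,38", and INFO's DP to "57".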

Thanks