odelaneau / GLIMPSE

Low Coverage Calling of Genotypes
MIT License
141 stars 27 forks source link

Floating point exception (core dumped) when run GLIMPSE2_phase #220

Open dangcaptkd2 opened 4 months ago

dangcaptkd2 commented 4 months ago

I'm running GLIMPSE2_phase v2.0.0 which is up to date for 1 sample for chromosome 1. Here is my script:

GLIMPSE2_phase --input-gl /data/GL_file_merged/high_dep_100.chr1.vcf.gz --reference /data/reference_file/khv.highcoverage.chr1.biallelic.snp.maf0.001.sites.vcf.gz --map /maps/chr1.b38.gmap.gz  --input-region chr1:1-5010347 --output-region chr1:1-4010272 --output /data/imputed_file/high_dep_100.chr1.01.imputed.vcf --sparse-maf 0.01

It went through but has the following error message. Any idea of why that happened? Thank you!

[GLIMPSE2] Phase and impute low coverage sequencing data

Files:

GLIMPSE_phase parameters:

Model parameters:

Selection parameters:

Other parameters

Initialisation:

Parsing specified genomic regions

srubinacci commented 3 months ago

Hi, Hard to say, but it seems like there's some numerical program when reading the likelihoods. What software was used to compute them?

sheridar commented 2 months ago

Hi, I have also been getting a floating point exception during initialization and am trying to troubleshoot, my log file is shown below. I am trying to phase SNPs for chrX (following the GLIMPSE1 tutorial here). I calculated genotype likelihoods using bcftools, my commands and the top of my vcf file are included at the bottom. Any suggestions would be very helpful!

[GLIMPSE2] Phase and impute low coverage sequencing data
  * Authors              : Simone RUBINACCI & Olivier DELANEAU, University of Lausanne
  * Contact              : simone.rubinacci@unil.ch & olivier.delaneau@unil.ch
  * Version              : GLIMPSE2_phase v2.0.0 / commit = 88915f7 / release = 2023-04-24
  * Citation             : BiorXiv, (2022). DOI: https://doi.org/10.1101/2022.11.28.518213
  *                      : Nature Genetics 53, 120–126 (2021). DOI: https://doi.org/10.1038/s41588-020-00756-0
  * Run date             : 09/09/2024 - 15:29:40

Files:
  * Input VCF/BCF likel. : [results/pacbio/glimpse/bc2066_chrX.vcf.gz]
  * Reference VCF        : [ref/chrX_1000GP_filt_sites.vcf.gz]
  * Genetic Map          : [ref/chrX.b38.gmap]
  * Output file          : [results/pacbio/glimpse/impute/bc2066_chrX_imputed_002.bcf]
  * Output format        : [BCF format | ZLIB compression]

GLIMPSE_phase parameters:
  * Input                : [FORMAT/PL field for genotype likelihoods]
  * Imputation model     : [Reference panel imputation]
  * Input region         : [chrX:6603070-11334729]
  * Output region        : [chrX:7544080-10160253]
  * Sparse MAF           : [0.001]
  * Recombination rates  : [Given by genetic map]
  * Ploidy               : [Only diploid samples in region]
  * Keep monom. ref sites: [NO]

Model parameters:
  * #Burnin iterations   : [5]
  * #Main iterations     : [15]
  * Ne [eff. pop. size]  : [100000]
  * Phase error rate     : [0.0001]
  * Imputation error rate: [1e-12]
  * Min value for hap GLs: [1e-10]

Selection parameters:
  * K init               : [1000]
  * K pbwt               : [2000]
  * PBWT depth           : [12]
  * PBWT modulo (cM)     : [0.1]
  * State list           : [No list provided]
  * Use PLs]

Other parameters
  * Seed                 : [15052011]
  * #Threads             : [1]

Initialisation:

Parsing specified genomic regions
  * Input region  [chrX:6603070-11334729]
  * Output region  [chrX:7544080-10160253]
  * VCF/BCF scanning ...^M  * VCF/BCF scanning [Reg=chrX:6603070-11334729 / L=69936 (Li= 69936 (100%) - Lm= 0 (0%))]
  * VCF/BCF scanning [L=69936 (Lrare= 34271 (49.0%) - Lcommon= 35665 (51.0%))]

Here are my bcftools commands:

bcftools mpileup -I -E -Ou \
        -f "$fa_filt" \
        -a 'FORMAT/DP' \
        -T "$ref_vcf" \
        -r chrX \
        --threads "$threads" \
        "$bam_filt" \
    | bcftools call -A -i -m -Ou \
        --ploidy GRCh38 \
        -C alleles \
        -T "$ref_tsv" \
    | bcftools view -Oz -c 0 \
        -o "$vcf"

Here are snippets from my vcf file:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.9+htslib-1.9
##bcftoolsCommand=mpileup -I -E -Ou -f results/pacbio/glimpse/chrX.fa -a FORMAT/DP -T ref/chrX_1000GP_filt_sites.vcf.gz -r chrX --threads 64 results/pacbio/glimpse/bc2066_chrX_filtered.bam
##reference=file://results/pacbio/glimpse/chrX.fa
##contig=<ID=chrX,length=156040895>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)">
##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)">
##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)">
##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)">
##INFO=<ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.9+htslib-1.9
##bcftools_callCommand=call -A -i -m -Ou --ploidy GRCh38 -C alleles -T ref/chrX_1000GP_filt_sites.tsv.gz; Date=Mon Aug 26 18:40:55 2024
##bcftools_viewVersion=1.9+htslib-1.9
##bcftools_viewCommand=view -Oz -c 0 -o results/pacbio/glimpse/bc2066_chrX.vcf.gz; Date=Mon Aug 26 18:41:07 2024
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Homo sapiens_FXPM-Female-1030-09-MM-CBL
chrX    13189   .       G       A       29.5864 .       DP=3;MQ0F=0;AC=0;AN=0;DP4=0,0,0,0;MQ=.  GT:PL:DP        ./.:0,0,0:0
chrX    13205   .       T       C       29.5864 .       DP=3;MQ0F=0;AC=0;AN=0;DP4=0,0,0,0;MQ=.  GT:PL:DP        ./.:0,0,0:0
chrX    13221   .       T       C       29.5864 .       DP=3;MQ0F=0;AC=0;AN=0;DP4=0,0,0,0;MQ=.  GT:PL:DP        ./.:0,0,0:0
chrX    13223   .       G       C       29.5864 .       DP=3;MQ0F=0;AC=0;AN=0;DP4=0,0,0,0;MQ=.  GT:PL:DP        ./.:0,0,0:0
chrX    13230   .       G       T       29.5864 .       DP=3;MQ0F=0;AC=0;AN=0;DP4=0,0,0,0;MQ=.  GT:PL:DP        ./.:0,0,0:0

...

chrX    95215   .       T       A       67      .       DP=1;MQ0F=0;AC=0;AN=2;DP4=0,1,0,0;MQ=60 GT:PL:DP        0/0:0,3,40:1
chrX    95232   .       G       C       67      .       DP=1;MQ0F=0;AC=0;AN=2;DP4=0,1,0,0;MQ=60 GT:PL:DP        0/0:0,3,40:1
chrX    96098   .       C       G       100     .       DP=4;MQ0F=0;AC=0;AN=2;DP4=0,2,0,0;MQ=60 GT:PL:DP        0/0:0,6,73:2
chrX    96122   .       G       T       100     .       DP=4;MQ0F=0;AC=0;AN=2;DP4=0,2,0,0;MQ=60 GT:PL:DP        0/0:0,6,73:2
chrX    96389   .       G       T       100     .       DP=4;MQ0F=0;AC=0;AN=2;DP4=0,2,0,0;MQ=60 GT:PL:DP        0/0:0,6,73:2