odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License
61 stars 9 forks source link

Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed, when phasing rare alleles #20

Open TanguyLallemand opened 1 year ago

TanguyLallemand commented 1 year ago

Hello, @odelaneau ,

Thanks for the great work with shapeit5!!

I'm looking into implementing it in a nextflow pipeline (from modules made by @LouisLeNezet, thanks also to him!). I have a problem when phasing rare variants with shapeit5 either 5.0 or 5.1. Indeed the HMM computations are not done. Do you have any idea why this might be?

`` [SHAPEIT5] Phase_rare (phase rare variants onto a haplotype scaffold)

Files:

Parameters:

Parsing specified genomic regions

Reading genotype data

Initializing sparse genotypes

Setting up genetic map

PBWT pass

HMM computations

odelaneau commented 1 year ago

Hi,

Can you also send the command line you run, please? For both phase_common and phase_rare?

Best,

Olivier Delaneau

Le mar. 4 avr. 2023 à 15:11, Tanguy Lallemand @.***> a écrit :

Hello, @odelaneau https://github.com/odelaneau ,

Thanks for the great work with shapeit5!!

I'm looking into implementing it in a nextflow pipeline (from modules made by @LouisLeNezet https://github.com/LouisLeNezet, thanks also to him!). I have a problem when phasing rare variants with shapeit5 either 5.0 or 5.1. Indeed the HMM computations are not done. Do you have any idea why this might be?

`` [SHAPEIT5] Phase_rare (phase rare variants onto a haplotype scaffold)

Files:

  • Unphased data : [split.chr18.vcf.gz]
  • Scaffold data : [chr18.ref.common.phased.vcf.gz]
  • Output data : [chr18_11000001_12000000.vcf.gz]

Parameters:

  • Seed : 15052011
  • Threads : 8 threads
  • PBWT : [depth = 2,2 / modulo = 0.1 / mac = 2 / mdr = 0.3]
  • HMM : [Ne = 15000 / Constant recombination rate of 1cM per Mb]

Parsing specified genomic regions

  • Scaffold region [chr18:10500001-12500000]
  • Input region [chr18:11000001-12000000]

Reading genotype data

  • BCF scanning ... [W::hts_idx_load3] The index file is older than the data file: split.chr18.vcf.gz.tbi [W::hts_idx_load3] The index file is older than the data file: chr18.ref.common.phased.vcf.gz.tbi
  • BCF scanning (35.15s)
    • Variant sites: [#scaffold=37997 | #rare=13362 | #all=51359]
    • 21696 rare variants in buffer regions excluded
    • 13362 variants found in both files [priority to scaffold]
    • 5265 multi-allelic variants excluded
  • HAP allocation [#scaffold=13362 / #samples=1842] (0.00s)
  • Genotype set allocation [#rare=37997 / #samples=1842] (0.00s)
  • BCF parsing ...
  • BCF parsing [100%]
  • Plain VCF/BCF parsing (36.04s)
    • Scaffold : [0/0=23191904 0/1=821979 1/1=598921]
    • Rare : [0/0=65436271 0/1=1925356 1/1=1326764 ./.=1302083]

Initializing sparse genotypes

  • 0 missing genotypes imputed at monomorphic sites
  • Genotype set transpose V2I (0.05s)

Setting up genetic map

  • Region length [1499962 bp / 1.5 cM (assuming 1cM per Mb)]
  • HMM parameters [Ne=15000 / Error=0.0001]
  • V2H transpose (0.03s)

PBWT pass

-

PBWT initialization [#eval=10283 / #select=6] (0.00s)

IBD2 constraints [100%]

IBD2 constraints [#inds=0 / #pairs=0] (0.42s)

PBWT forward selection [100%]

PBWT forward selection (2.65s)

PBWT backward selection [100%]

PBWT backward selection (2.87s)

  • states=229.97+/-88.68

    • collisions = 17789 / #pushes = 70642 / rate = 79.88%

HMM computations

  • Processing [0%] phase_rare_static: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32&, aligned_vector32&, std::vector&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed. ``

— Reply to this email directly, view it on GitHub https://github.com/odelaneau/shapeit5/issues/20, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD4XTIO4X66UFKGXP6MVTVLW7QMY5ANCNFSM6AAAAAAWSWIMTE . You are receiving this because you were mentioned.Message ID: @.***>

TanguyLallemand commented 1 year ago

Hi,

Yes, sure here they are.

phase_common_static \ --input split.chr18.full_pop.vcf.gz \ --region chr18:1-4000000 \ --thread 8 \ --filter-maf 0.001 \ --output-format bcf \ --output chr18_1_1000000.bcf

Between this two line i have ligated all file produced with phase common, producing chr18.ref.common.phased.vcf.gz.

phase_rare_static \ --input split.chr18.full_pop.vcf.gz \ --scaffold chr18.ref.common.phased.vcf.gz \ --input-region chr18:1-1000000 \ --scaffold-region chr18:1-1500000 \ --thread 8 \ --output chr18_1_1000000.vcf.gz

I am running those files in a docker container based on your released one.

Regards,

T

odelaneau commented 1 year ago

Sorry, I come back to this quite late. Is this problem still happening even with last version.

Two comments anyway:

  1. Try increasing your chunk sizes for phase_Rare.
  2. Check if there is nothing wrong with your indexes as you have this warning showing up:

_[W::hts_idx_load3] The index file is older than the data file: split.chr18.vcf.gz.tbi [W::hts_idx_load3] The index file is older than the data file: chr18.ref.common.phased.vcf.gz.tbi__

freeseek commented 1 year ago

I am having the same problem with phase_rare version 5.1.1 / commit = 990ed0d / release = 2023-05-08. These are the commands I have used:

phase_common \
  --thread 4 \
  --input "143_HESC_Genome.as.12.qc.bcf" \
  --reference "ALL.chr4.phase3_integrated.20130502.genotypes.bcf" \
  --map genetic_map.txt \
  --region 4:135951881-191154276 \
  --filter-maf 0.001 \
  --output "143_HESC_Genome.as.12.qc.scaffold.bcf"
phase_rare \
  --thread 4 \
  --input "143_HESC_Genome.as.12.qc.bcf" \
  --scaffold "143_HESC_Genome.as.12.qc.scaffold.bcf" \
  --map genetic_map.txt \
  --input-region 4:138482819-191154276 \
  --scaffold-region 4:135951881-191154276 \
  --output "143_HESC_Genome.as.12.qc.pgt.bcf"

After phase_rare outputs HMM computations I get the error:

phase_rare: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32<float>&, aligned_vector32<float>&, std::vector<unsigned int>&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed.

I can share via email this test case if you want to reproduce the issue.

rebeccaito commented 1 year ago

Thanks for SHAPEIT. It is a great tool. I'm having the same issue described here. Same version as above: v5.1.1 - 990ed0d. I have gone back and increased the chunk size so there are >340k biallelic SNPs, removed variants with excess (>10%) missingness and excess heterozygosity, removed samples with excess missingness and outliers for heterozygosity, and changed the random seed. phase_common_static runs, but I get the same error each time for phase_rare_static at the HMM computations step:

phase_rare_static: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32<float>&, aligned_vector32<float>&, std::vector<unsigned int>&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed.

Commands used:

phase_common_static \
  --input chrom7_array_exome_combined_qcFiltered.vcf.gz \
  --filter-maf 0.001 \
  --region 7 \
  --map chr7.b38.gmap.gz \
  --output chrom7_array_exome_combined_qcFiltered_phased.bcf \
  --thread 64 \
  --log chrom7_array_exome_combined_qcFiltered_phased.log  

phase_rare_static \
  --input chrom7_array_exome_combined_qcFiltered.vcf.gz \
  --scaffold chrom7_array_exome_combined_qcFiltered_phased.bcf \
  --map chr7.b38.gmap.gz \
  --output chrom7_array_exome_combined_qcFiltered_phased_rare.bcf \
  --log chrom7_array_exome_combined_qcFiltered_phased_rare.log \
  --scaffold-region "7:100096734-131496019" \
  --input-region "7:101096734-130496019" \
  --thread 64 \
  --seed 47 
MoiColl commented 11 months ago

Any news on this error? I'm also getting this error, but for some chunks only. What do you think might be the source of the error? Could it be some weird variants/format in the chunk that is being processed?

vasilipankratov commented 8 months ago

Hi all!

I was getting the same error when trying to phase rare variants. Actually, I was getting two flavors of the same error:

Assertion GRvar_genotypes[vr][tidx].prob >= 0.0f failed. or Assertion !isnan(GRvar_genotypes[vr][tidx].prob) failed.

I managed to fix it by polishing the vcf. In particular, I updated the INFO tags (with bcftools +fill-tags). Perhaps, shapeit requires proper values in the AF field or something else.

It would be nice if the upcoming versions of shapeit5 were a bit more explicit about this kind of fails or at least the requirements for the vcf were outlined in the documentation :).

jalghor commented 7 months ago

I was having the same issue for some of the chunks when using phase rare. I fixed it by increasing the chunk size.

Best

justinjj24 commented 6 months ago

I was having the same issue for some of the chunks when using phase rare. I fixed it by increasing the chunk size.

Best

@jalghor mean you generated the chunks for rare variants with more than --window-cm arg (=4) by GLIMPSE2_chunk or referring the seed size --seed arg (=15052011) Seed of the random number generator in phase_rare. Can you provide the value you used to overcome the failure? In my phase_common output AF is missing in the INFO field, same as @vasilipankratov mentioned. Wonder which one would fix the error or which one to consider if the results vary! Thanks