odelaneau / GLIMPSE

Low Coverage Calling of Genotypes
MIT License

Error message "States for individual 0 are zero. Error during selection" but imputing a whole chromosome for 300 samples? #121

Closed robertwhbaldwin closed 1 year ago

robertwhbaldwin commented 1 year ago

I included my log file below. I have 300 samples at around 1x coverage and am trying to impute chromosome 1. Based on what I've read, the error message I'm receiving should not occur, as there should be plenty of variants. Any idea why I'm seeing this error message? Thank you - Robert

./glimpse.sh

[GLIMPSE2] Phase and impute low coverage sequencing data

Files:

GLIMPSE_phase parameters:

Model parameters:

Selection parameters:

Genotype calling:

BAM/CRAM filters and options:

Other parameters

Initialisation:

Initializing iteration

Burn-in iteration [1/5]

ERROR: States for individual 0 are zero. Error during selection

ERROR: States for individual 1 are zero. Error during selection.

ERROR: States for individual 2 are zero. Error during selection.

E(base

srubinacci commented 1 year ago

Hi,

Indeed, something is off here. My guess is that you have a small reference panel and the default threshold does not work for you. Would it be possible for you to share the data of this chunk (by email)? I can look at it.

srubinacci commented 1 year ago

I should probably add that GLIMPSE2 is designed for VERY large reference panels, and maybe for your use case GLIMPSE1 is more appropriate.

robertwhbaldwin commented 1 year ago

Thanks for the response! I'll have to get permission to share the data. The reference panel is for rainbow trout (Oncorhynchus mykiss) and was downloaded from http://mgb.qnlm.ac/ under downloads. It contains 179 samples, which is small, yes.

On Sun, Jan 29, 2023 at 5:29 PM Simone Rubinacci @.***> wrote:

I should probably add that GLIMPSE2 is designed for VERY large reference panels, and maybe for your use case GLIMPSE1 is more appropriate.


srubinacci commented 1 year ago

I see, thank you for your response. In that case I recommend using GLIMPSE1 (release: https://github.com/odelaneau/GLIMPSE/releases/tag/v1.1.1, documentation: https://odelaneau.github.io/GLIMPSE/glimpse1/).

Thank you for getting in touch about this. I will add flags/warnings in G2. Indeed, I should make it clearer when to use one or the other software. I apologise for the inconvenience.

Let me know if you get any issues with GLIMPSE1 (maybe in a new issue).

robertwhbaldwin commented 1 year ago

Thanks for the advice, I'll try G1. Does G1 take BAM files as input? I'm confused about that part.

srubinacci commented 1 year ago

You're welcome. I should actually check your case in more detail. It's likely that you triggered a silly G2 bug. The thing is that you only have 358 haplotypes (2 x 179 samples) and Kpbwt is set to 2000, so no selection is needed and the software should work on the 358 haplotypes directly. I will likely look at this in detail on Wednesday and will update you here.
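The arithmetic behind that remark can be written out as a small Python sketch. This only illustrates the reasoning in this comment and is not GLIMPSE2's actual code; the function names are hypothetical, and the diploid assumption (two haplotypes per sample) is mine.

```python
def haplotype_count(n_samples, ploidy=2):
    """Number of reference haplotypes, assuming every sample has the same ploidy."""
    return n_samples * ploidy


def selection_needed(n_haplotypes, k_pbwt):
    """State selection only makes sense when the panel has more haplotypes than K."""
    return n_haplotypes > k_pbwt


# Values taken from this thread: 179 reference samples, Kpbwt set to 2000.
n_haps = haplotype_count(179)
print(n_haps, selection_needed(n_haps, 2000))  # prints: 358 False
```

With a panel this small, every haplotype can be used as a state, which is why an error about zero selected states points to a bug rather than to a lack of variants.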

Regarding reading BAM files directly: unfortunately no, G1 only takes a genotype likelihoods file in VCF/BCF format for now (BAM input will be part of future updates). For low-coverage data, the easiest and fastest way to get genotype likelihoods is bcftools.

Just provide the bams to bcftools following these instructions: https://odelaneau.github.io/GLIMPSE/glimpse1/tutorial_b38.html#run_likelihoods

It's relatively simple: bcftools takes a BAM file and does the calling at the positions you want, in your case the reference panel positions. The only annoying thing is that bcftools requires these positions in two formats, TSV and VCF.gz (this explains step 3.1). It also requires the reference genome (FASTA) in the right build. It's a bit tedious, but now you should be able to understand step 3.2 as well. Then you will need to run step 3.3 too. It's indeed a bit annoying compared to G2, but it should be pretty quick, as bcftools is an amazing tool.
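As a rough sketch of steps 3.1 to 3.3, the Python snippet below only builds and prints the shell commands, so nothing is executed. The bcftools/tabix flags follow the linked GLIMPSE1 tutorial as best I recall it and should be checked against that page. All file names and the region name are hypothetical placeholders; the region must match the chromosome naming used in your BAMs and reference panel.

```python
import shlex

q = shlex.quote  # quote paths so the printed commands are safe to paste into a shell


def site_commands(panel, sites_vcf, sites_tsv):
    """Step 3.1: extract biallelic SNP positions from the panel as VCF.gz and TSV."""
    return [
        f"bcftools view -G -m 2 -M 2 -v snps {q(panel)} -Oz -o {q(sites_vcf)}",
        f"bcftools index -f {q(sites_vcf)}",
        f"bcftools query -f '%CHROM\\t%POS\\t%REF,%ALT\\n' {q(sites_vcf)}"
        f" | bgzip -c > {q(sites_tsv)}",
        f"tabix -s1 -b2 -e2 {q(sites_tsv)}",
    ]


def likelihood_commands(bam, fasta, sites_vcf, sites_tsv, region, out_vcf):
    """Step 3.2: genotype likelihoods for one BAM, restricted to the panel positions."""
    return [
        f"bcftools mpileup -f {q(fasta)} -I -E -a 'FORMAT/DP' -T {q(sites_vcf)}"
        f" -r {q(region)} {q(bam)} -Ou"
        f" | bcftools call -Aim -C alleles -T {q(sites_tsv)} -Oz -o {q(out_vcf)}",
        f"bcftools index -f {q(out_vcf)}",
    ]


def merge_commands(list_file, region, merged_vcf):
    """Step 3.3: merge the per-sample files (one path per line in list_file)."""
    return [
        f"bcftools merge -m none -r {q(region)} -Oz -o {q(merged_vcf)} -l {q(list_file)}",
        f"bcftools index -f {q(merged_vcf)}",
    ]


# Hypothetical inputs for a single sample; repeat step 3.2 for each of the BAMs.
REGION = "1"
commands = (
    site_commands("panel.bcf", "panel.sites.vcf.gz", "panel.sites.tsv.gz")
    + likelihood_commands("sample1.bam", "ref.fa", "panel.sites.vcf.gz",
                          "panel.sites.tsv.gz", REGION, "sample1.vcf.gz")
    + merge_commands("gl_files.txt", REGION, "merged.vcf.gz")
)
print("\n".join(commands))
```

Printing instead of running keeps the sketch harmless; once the paths are right, the output can be pasted into a shell script.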

I can help if needed. Best,

Simone

srubinacci commented 1 year ago

It's likely a small bug, thanks for reporting it. I think GLIMPSE2 should work fine here. I will look at it very soon and report here. Sorry for the delays.

srubinacci commented 1 year ago

Hi @robertwhbaldwin @karinkumar, thank you for reporting this issue. Indeed, there was a bug in GLIMPSE when reference panels are small. I fixed it and pushed the code (it's already online, feel free to pull it). If you can wait, I will prepare a new release containing a few bugfixes in the next few days.

srubinacci commented 1 year ago

Prerelease v2.0.1 has been made available, solving this issue. A full release will be coming soon.