Closed klzhang2022 closed 1 year ago
The running log is as follows:
[GLIMPSE2] Phase and impute low coverage sequencing data
Files:
GLIMPSE_phase parameters:
Model parameters:
Selection parameters:
Genotype calling:
BAM/CRAM filters and options:
Other parameters
Initialisation:
Initializing iteration
Burn-in iteration [1/5]
[31mERROR: [0mStates for individual 0 are zero. Error during selection.
At first, we thought that the information contained in the bam file was too little, but looking at the number of SNPs called out by the bam file, 18 chromosomes contained a total of 1,161,192 SNPs, and chromosome 1 contained 97,582 SNPs, which we thought reached the minimum for imputation need.
Hi, I think this might the same bug reported here (#121) that is now fixed in the latest commit version. Can you please compile the latest master version and see if that solves the problem? Otherwise let me know if you need new static binaries.
Thanks for your reply, after using the new version, the imputation works successfully.
We have another question, how to run multi-sample imputation (multiple bam files) ? Set up a multi-sample for loop in the shell, and finally merge the imputed bcf file of each chromosome of each sample through "bcftools merge sample1.bcf sample2.bcf ... sampleN.bcf -o merged.bcf" ?
You can either have a multi-sample genotype likelihood file in a single VCF/BCF file or use the --bam-list option, where you specify a text file with the bam file locations (and optionally sample names as a second column)
Thank you for your reply, according to your recommendation, we have solved the problem.
Here, we have two points to confirm with you:
There are two options for imputation files: VCF/BCF file containing the genotype likelihoods; BAM/CRAM file; VCF/BCF and BAM/CRAM must have an index file. For BAM/CRAM with multiple samples, we can use --bam -list parameter.
Two kinds of reference files and two kinds of files to be imputed can be combined arbitrarily.
Hi, Sorry for coming back only now. Not sure I follow and I understood the problem, my apologies in case.
For 2. you will only need to generate the chunks once for a specific reference panel. Once that is done, these chunk can be used for every user that want to perform imputation on that panel. Except very specific cases you do not want to let the user to specify the imputation region, you can fully precompute that and have the reference panel stored as a set bin files.
If for some specific reason you need the user to specify the region (be careful, this does not seem a normal use-case), you can just use the vcf.gz/bcf file format as a reference panel. However, I feel like you might not need this latter case.
Best,
Simone
When we execute GLIMPSE2_phase, each chunk reports the same error: 0mStates for individual 0 are zero. Error during selection. The three steps of step2_script_reference_panel.sh, step3_script_chunk.sh, and step4_script_split_reference.sh before step5_script_impute.sh are normal, and we don’t know where the problem is.
When performing imputation, the reference file is a 28-sample bcf file, and the file to be imputed is a 20-sample 1× bam file obtained by downsampling, and the above error is displayed. We also tried the reference file with 988 samples, the error is the same.
Because we are a group of pigs, 18 chromosomes run in a loop, and we set ne to 100. The specific execution code is as follows:
!/bin/bash
mkdir -p GLIMPSE_impute
REF=reference_panel/split/sorted_1_downsampled_DY-1_20 BAM=1_DY-1_20/sorted_1_downsampled_DY-1_20.bam
for i in {1..18} do while IFS="" read -r LINE || [ -n "$LINE" ]; do printf -v ID "%02d" $(echo $LINE | cut -d" " -f1) IRG=$(echo $LINE | cut -d" " -f3) ORG=$(echo $LINE | cut -d" " -f4) CHR=$(echo ${LINE} | cut -d" " -f2) REGS=$(echo ${IRG} | cut -d":" -f 2 | cut -d"-" -f1) REGE=$(echo ${IRG} | cut -d":" -f 2 | cut -d"-" -f2) OUT=GLIMPSE_impute/1_downsampled_DY-1_20_imputed /public/zhangkl/software/glimpse/phase/bin/GLIMPSE2phase --bam-file ${BAM} --reference ${REF}${CHR}${REGS}${REGE}.bin --ne 100 --output ${OUT}${CHR}${REGS}_${REGE}.bcf done < chunks.chr${i}.txt done