odelaneau / GLIMPSE

Low Coverage Calling of Genotypes
MIT License
129 stars 26 forks source link

GLIMPSE2_split_reference ERROR: AC/AN INFO fields in VCF are inconsistent with GT field, update the values in the VCF #189

Open jpcartailler opened 8 months ago

jpcartailler commented 8 months ago

Greetings,

I'm running into an error when chunking the reference. Here is my command

singularity exec /data/containers/glimpse_v2.0.0-27-g0919952_20221207.sif GLIMPSE2_split_reference \
--reference reference_panel/1000GP.chrX.bcf \
--map maps/genetic_maps.b38/chrX.b38.gmap.gz \
--input-region chrX:1-5293874 \
--output-region chrX:1-4629309 \
--output reference_panel/split/1000GP.chrX

Version is GLIMPSE2_split_reference v2.0.0 / commit = 3bed6d9 / release = 2022-12-07

When I run the above, it works great until it hits one chunk, then errors out:

Parsing specified genomic regions
  * Input region [chrX:1-5293874]       Output region  [chrX:1-4629309]
  * Genetic map          : [n=89236] (0.41s)
  * VCF/BCF scanning [L=135787 (Lrare= 61865 (45.6%) - Lcommon= 73922 - hq= 73922 (54.4%))]
  * VCF/BCF scanning [Nr=3202 (Nrh=0 - Nrd=3202)]
  * VCF/BCF scanning [Reg=chrX:1-5293874 / L=135787] [9.98s]
  * Reference panel parsing  [59%]
ERROR: AC/AN INFO fields in VCF are inconsistent with GT field, update the values in the VCF

As per this discussion, I tried both bcftools view -c 0 1000GP.chrX.bcf ... and bcftools +fill-tags 1000GP.chrX.bcf ... (and re-indexed), but the error remains.

Would appreciate any leads or ideas, thanks!

JP

gilibean commented 8 months ago

I had the same issue, and I used vcffixup (from vcflib) to fix the AC/AN inconsistencies that I did actually have. However, I believe it is a sort of default error message because I was still getting this error message even after that. Eventually, I solved it by removing all loci with missing data from the reference panel. Hope that helps.

srubinacci commented 8 months ago

Hi, @c5creative The error can be triggered because might be mixing PAR vs nonPAR regions of the genome. Please refer to this for chrX imputation and splitting chromosomes into PAR and notPAR regions: https://odelaneau.github.io/GLIMPSE/glimpse1/tutorial_chrX.html

@gilibean You reference panel should be phased and all major phasing software impute sporadic missing data. GLIMPSE expects a phased, non missing reference panel. If you have missing data in your reference panel, it could be a sign that there's a problem somewhere.

Best,

Simone

jpcartailler commented 8 months ago

Thanks @srubinacci - Getting back to this and will post results, hopefully by early next week.

Thanks again @srubinacci for point me to those docs. We were able to split off the non-PAR region and generate what we needed for the reference panel, but we're having issues with the PAR1 and PAR2 regions at the GLIMPSE2_chunk stage.

When running it:

singularity exec glimpse_v2.0.0-27-g0919952_20221207.sif \
GLIMPSE2_chunk \
--input reference_panel_X_PAR1/1000GP.chrX.sites.vcf.gz \
--region chrX:10001-2781479 \
--output reference_panel_X_PAR1/chunks.chrX.txt \
--map maps/genetic_maps.b38/chrX.b38.gmap.gz \
--sequential

We get segmentation fault (no terribly useful, sorry):

/var/spool/slurmd.cn1602/job58270973/slurm_script: line 122: 64348 Segmentation fault  

We haven't had this issue before, and I suspect the (PAR1 or PAR2) region might be too small to chunk?

srubinacci commented 8 months ago

Well the segfault is not nice - sorry. Yeah, there's no reason to chunk this region as it seems to extend less the 3Mb. GLIMPSE2 should work fine there.

Simone

ChristinaKriaridou commented 6 months ago

Hello @srubinacci! I am having the same issue and when I am trying to run the script below I am getting this error for all my chunks:

Files:
  * Input region         : [LG17:1-8486843]
  * Output region        : [LG17:1-7486843]
  * Reference VCF        : [HD_Parents.bcf]
  * Output binary        : [reference_panel/split/Oni_chr17]
  * Genetic Map          : [chr17.gmap]

GLIMPSE_split_reference parameters:
  * Sparse MAF           : [0.1%]
  * Keep monom. ref sites: [NO]
  * Seed                 : [15052011]
  * #Threads             : [1]

Splitting reference panel:

Parsing specified genomic regions
  * Input region [LG17:1-8486843]   Output region  [LG17:1-7486843]
  * Genetic map          : [n=308116] (0.14s)
  * VCF/BCF scanning ...
  * VCF/BCF scanning [L=41739 (Lrare= 0 (0.0%) - Lcommon= 41739 - hq= 41739 (100.0%))]
  * VCF/BCF scanning [Nr=126 (Nrh=0 - Nrd=126)]
  * VCF/BCF scanning [Reg=LG17:1-8486843 / L=41739] [0.08s]
  * Reference panel parsing ...
ERROR: AC/AN INFO fields in VCF are inconsistent with GT field, update the values in the VCF

The code I am running:

REF=HD_Parents.bcf
MAP=chr17.gmap
while IFS="" read -r LINE || [ -n "$LINE" ];
do
  printf -v ID "%02d" $(echo $LINE | cut -d" " -f1)
  IRG=$(echo $LINE | cut -d" " -f3)
  ORG=$(echo $LINE | cut -d" " -f4)

  GLIMPSE2_split_reference_static --reference ${REF} --map ${MAP} --input-region ${IRG} --output-region ${ORG} --output reference_panel/split/Oni_chr17
done < chunks.chr17.txt

I have also tried "bcftools view -c 0" and "bcftools +fill-tags" as mentioned above but I am getting the same error again. I even tried removing the INFO column from the vcf file and running bcftools +fill-tags again to see if that fixes the issue, but I had no luck. I am using v2.0.0, release = 2022-12-07. What could be the issue here? Do you think it is worth trying version 2.0.1? I would really appreciate any help or advice.

Thank you very much!

Best, Christina

srubinacci commented 6 months ago

Hi,

have you tried running GLIMPSE2 directly on the HD_Parents.bcf file? There's likely no need to the binary ref. panel given that you only have 126 samples.

Best,

Simone

ChristinaKriaridou commented 6 months ago

I tried running directly on the bcf file but I am getting the same error. These are the commands I am running:

REF=BCFtools_Oni17_HD_Parents_pos_newinfo.bcf
BAM=offspring/5X/5X_s18_sorted_chr17.bam

while IFS="" read -r LINE || [ -n "$LINE" ]; 
do   
    printf -v ID "%02d" $(echo $LINE | cut -d" " -f1)
    IRG=$(echo $LINE | cut -d" " -f3)
    ORG=$(echo $LINE | cut -d" " -f4)
    CHR=$(echo ${LINE} | cut -d" " -f2)
    REGS=$(echo ${IRG} | cut -d":" -f 2 | cut -d"-" -f1)
    REGE=$(echo ${IRG} | cut -d":" -f 2 | cut -d"-" -f2)
    OUT=GLIMPSE_impute/s18_imputed
    GLIMPSE2_phase_static --bam-file ${BAM} --reference ${REF} --map chr17.gmap --input-region ${IRG} --output-region ${ORG} --output ${OUT}_${CHR}_${REGS}_${REGE}.bcf --impute-reference-only-variants
done < chunks.chr17.txt

Please find attached the file with the error I am getting. glimpse_phase.txt

Could it be an error related to something else and not the AC/AN INFO field?

Thank you very much for your help.