rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

ERROR: annotation information could not be read. Perhaps check variant IDs matches those in the genotype file? #346

Closed dvh13 closed 2 years ago

dvh13 commented 2 years ago

Getting this error in step 2, trying to run burden tests. REGENIE v3.2.1.gz

ERROR: annotation information could not be read. Perhaps check variant IDs matches those in the genotype file?

The variants in the --anno-file are definitely in the --pgen pvar file. Although the data are on b38, the pvar gives chromosome and position as 1 69516 (not chr1 69516). The code is not looking for a chr1 prefix (which would be in the vcf), is it? (I remember this differs between b37 and b38.)

The pgen runs fine for GWAS in regenie --step 2

Commands and head/parts of the key files below

thanks for any help, no doubt something stupid! david

          |=============================|
          |      REGENIE v3.2.1.gz      |
          |=============================|

Copyright (c) 2020-2022 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.
Compiled with Boost Iostream library.
Using Intel MKL with Eigen.

Log of output saved in file : 2022_10_12__27498exomes_GNH_binarytrait_burden.log

Options in effect:
  --step 2 \
  --pgen chr1..22XY_hard_filters_CR.70_27498finalsampleQC_VQSR.FILTERPASSonly.withVEP.ID \
  --covarFile ../GNH.44190.noEthnicOutliers.covariates.20PCs.PseudoNHS.BroadIDs.tab \
  --covarExcludeList PseudoNHS_2022_07_20,GSA_OrageneID,GSA_OrageneIDChipID \
  --phenoFile ../2022-06-15_big_regenie_phenoFile.BroadID.tab \
  --bsize 1000 \
  --bt \
  --vc-tests skato \
  --firth \
  --approx \
  --pred 2022_10_1227498exomes_fit_binary_out_pred.list \
  --aaf-bins 0.01,0.001 \
  --out 2022_10_1227498exomes_GNH_binarytrait_burden \
  --set-list ../LoF_regenie_setlist_27498-samples_VQSR_PASS.txt \
  --anno-file ../LoF_regenie_annotation_27498-samples_VQSR_PASS.txt \
  --mask-def ../LoF_regenie_annotation_27498-samples_VQSR_PASS-mask-def.txt \
  --extract-setlist LDLR(ENSG00000130164)

head ../LoF_regenie_annotation_27498-samples_VQSR_PASS.txt
1:69516:G:A OR4F5(ENSG00000186092) stop
1:930158:CG:C SAMD11(ENSG00000187634) frameshift
1:930270:CCA:C SAMD11(ENSG00000187634) frameshift
1:931073:C:T SAMD11(ENSG00000187634) stop
1:935887:C:T SAMD11(ENSG00000187634) stop

(selected lines from the pvar chr1..22XY_hard_filters_CR.70_27498finalsampleQC_VQSR.FILTERPASSonly.withVEP.ID.pvar)

CHROM POS ID REF ALT QUAL FILTER INFO

1 16619 chr1_16619_C_CGCT C CGCT 11754 PASS AC=1;etc....
1 69516 chr1_69516_G_A G A 2789 PASS AC=2;etc...

dvh13 commented 2 years ago

Is this "1:69516:G:A" in the --anno-file matched to CHROM:POS:REF:ALT from the pvar? Or matched to ID in the pvar?

dvh13 commented 2 years ago

OK fixed it by trial and error

The variants as listed in the --anno-file and --set-list files have to be in the same format as the ID field in the vcf/pvar.

Presumably UKBB has IDs in 1:1234567:A:T type format, whereas we went for gnomAD type format.

It was not clear in the documentation that the variants needed to match the ID field.
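
For anyone hitting the same error, one way to confirm the mismatch is to compare the first column of the --anno-file against the ID column of the .pvar. The Python below is only a rough sketch and not part of regenie: the function names are made up for illustration, it assumes the .pvar has a #CHROM header line, and the sample lines are the ones quoted above.

```python
def read_pvar_ids(lines):
    """Collect the ID column of a .pvar (assumes a '#CHROM ...' header line)."""
    ids = set()
    id_col = None
    for line in lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta lines and blank lines
        fields = line.split()
        if line.startswith("#CHROM"):
            id_col = fields.index("ID")
            continue
        if id_col is None:
            raise ValueError("expected a #CHROM header line before variant records")
        ids.add(fields[id_col])
    return ids


def read_anno_ids(lines):
    """The variant ID is the first whitespace-separated column of the annotation file."""
    return [line.split()[0] for line in lines if line.strip()]


def missing_ids(anno_lines, pvar_lines):
    """Annotation IDs that do not appear verbatim in the pvar ID column."""
    known = read_pvar_ids(pvar_lines)
    return [v for v in read_anno_ids(anno_lines) if v not in known]


# Sample records taken from this thread.
pvar = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO",
    "1 69516 chr1_69516_G_A G A 2789 PASS AC=2",
]
anno = ["1:69516:G:A OR4F5(ENSG00000186092) stop"]

print(missing_ids(anno, pvar))  # ['1:69516:G:A']
```

With real files, pass open file handles instead of the lists. Any ID printed here is one that would not be found in the genotype file.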

regards, david

Antonio-Nappi commented 1 year ago

I have the same issue as you, but I didn't understand how you fixed it. My anno file looks like this:

22:15528182:C:T OR11H1  synonymous
22:15528188:C:T OR11H1  synonymous
22:15528194:G:A OR11H1  missense

while my pvar file is like this one:

OR11H1  22 15528182 22:15528182:C:T,22:15528188:C:T,22:15528194:G:A,22:15528206:G:A,22:15528267:A:G,22:15528300:A:G,22:15528306:T:A,22:15528309:G:A,22:15528316:C:G,22:15528318:A:G,22:15528326:G:A,22:15528340:C:T,22:15528345:G:T,22:15528351:T:C,22:15528363:C:T,22:15528393:G:A,22:15528413:G:A,22:15528417:T:C,22:15528583:G:A,22:15528584:C:G,22:15528584:C:T,22:15528585:C:T,22:15528586:G:A,22:15528586:G:C,22:15528593:G:T,22:15528594:C:T,22:15528603:A:T,22:15528613:C:T,22:15528617:G:A,22:15528617:G:C,22:15528619:A:G,22:15528624:T:C,22:15528628:C:A,22:15528628:C:T,22:15528633:C:G,22:15528634:T:C,22:15528640:T:A,22:15528640:T:C,22:15528643:T:C,22:15528644:G:A,22:15528650:G:A,22:15528650:G:T,22:15528652:T:G,22:15528656:T:C,22:15528666:T:C,22:15528666:T:G,22:15528668:G:A,22:15528671:C:T,22:15528687:C:T,22:15528689:C:T,22:15528692:C:G,22:15528698:G:A,22:15528702:C:T,22:15528703:C:T,22:15528704:C:T,22:15528708:T:C,22:15528712:G:A,22:15528712:G:T,22:15528713:C:T,22:15528714:C:T,22:15528720:A:G,22:15528723:A:G,22:15528723:A:T,22:15528731:T:A,22:15528732:G:T,22:15528739:G:A,22:15528740:T:G,22:15528741:G:A,22:15528743:C:A,22:15528744:C:G,22:15528747:G:A,22:15528748:G:T,22:15528749:G:A,22:15528753:C:T,22:15528754:G:A,22:15528756:T:C,22:15528760:C:T,22:15528761:A:G,22:15528761:A:T,22:15528762:T:C,22:15528767:T:C,22:15528768:T:G,22:15528769:G:T,22:15528777:G:GC,22:15528778:C:T,22:15528779:C:T,22:15528784:G:T,22:15528787:T:A,22:15528788:C:T,22:15528789:C:G,22:15528799:G:T,22:15528800:C:T,22:15528807:C:T,22:15528822:A:T,22:15528836:C:T,22:15528839:C:G,22:15528845:T:C,22:15528846:A:G,22:15528859:C:A,22:15528859:C:T,22:15528860:T:C,22:15528866:C:T,22:15528867:C:T,22:15528873:G:A,22:15528877:T:C,22:15528882:G:T,22:15528886:T:C,22:15528887:G:A,22:15528888:C:A,22:15528888:C:T,22:15528892:C:T,22:15528893:A:C,22:15528895:G:A,22:15528907:A:G,22:15528911:G:A,22:15528912:G:A,22:15528919:C:T,22:15528921:A:T,22:15528922:C:A,22:15528922:C:T,22:15528923:C:A,22:15528925:G:C,22:15528925:G:T,22:15528926
:T:A,22:15528929:G:A,22:15528929:G:T,22:15528939:G:A,22:15528942:G:A,22:15528944:G:T,22:15528946:T:A,22:15528950:A:T,22:15528951:C:G,22:15528956:C:T,22:15528958:A:G,22:15528959:T:C,22:15528960:A:G,22:15528962:C:T,22:15528964:C:T,22:15528966:C:T,22:15528970:T:C,22:15528972:G:A,22:15528977:G:A,22:15528985:G:A,22:15528991:G:A,22:15528995:C:T,22:15528996:G:A,22:15529009:G:A,22:15529011:A:T,22:15529013:G:A,22:15529063:C:T,22:15529065:C:T,22:15529068:A:G,22:15529069:T:C,22:15529071:T:G,22:15529072:A:G,22:15529074:A:C,22:15529076:C:T,22:15529082:G:A,22:15529093:T:A,22:15529097:G:A,22:15529105:T:G,22:15529109:G:A,22:15529116:C:T,22:15529117:T:C,22:15529119:G:A,22:15529127:C:G
POTEH   22 15690081 22:15690081:G:A,22:15690091:C:T,22:15690102:C:T,22:15690108:G:A,22:15690110:C:T,22:15690111:T:C,22:15690113:C:T,22:15690117:G:A,22:15690117:G:C,22:15690122:G:A,22:15690130:T:G,22:15690142:G:A,22:15690143:C:A,22:15690149:G:T,22:15690150:G:A,22:15690151:G:A,22:15690154:A:G,22:15690162:C:A,22:15690173:C:T,22:15690174:G:A,22:15690175:C:T,22:15690182:CA:C,22:15690183:A:AG,22:15690187:G:A,22:15690188:G:A,22:15690194:C:A,22:15690203:C:T,22:15690206:G:T,22:15690369:A:G,22:15690377:G:A,22:15690498:C:A,22:15690499:A:G,22:15690509:C:A,22:15690509:C:T,22:15690512:C:A,22:15690512:C:G,22:15690513:T:A,22:15690521:G:T,22:15690523:G:C,22:15690525:G:A,22:15690526:GCA:G,22:15690540:G:A,22:15690541:G:T,22:15690542:C:T,22:15690543:C:A,22:15690546:T:G,22:15690547:G:C,22:15690549:G:A,22:15690557:C:G,22:15690557:C:T,22:15690558:G:A,22:15690560:C:T,22:15690561:G:A,22:15690567:G:A,22:15690568:C:A,22:15690572:C:G,22:15690581:G:T,22:15690583:G:A,22:15690590:C:T,22:15690595:G:A,22:15690597:C:G,22:15690600:G:T,22:15690604:A:T,22:15690609:G:T,22:15690623:A:G,22:15690630:T:TG,22:15690632:G:A,22:15690634:G:C,22:15690637:G:A,22:15690642:G:A,22:15690645:C:T,22:15690662:C:T,22:15690663:G:T,22:15690669:C:T,22:15690671:C:A,22:15690675:G:A,22:15690679:C:T,22:15690690:A:G,22:15690708:A:G,22:15690711:T:A,22:15700076:A:T,22:15708035:T:C,22:15708043:G:A,22:15708049:G:T,22:15708057:A:G,22:15708066:A:T,22:15708080:G:A,22:15708084:G:C,22:15708088:G:A,22:15710930:C:T,22:15710931:G:A,22:15710950:G:A
CCT8L2  22 16590877 22:16590877:C:CTAGT,22:16590896:A:AT,22:16590897:T:A,22:16590898:T:C,22:16590900:T:G,22:16590901:T:G,22:16590911:A:T,22:16590914:G:A,22:16590915:G:A,22:16590915:G:C,22:16590916:A:T,22:16590919:T:G,22:16590921:G:A,22:16590924:GTTTC:G,22:16590932:G:C,22:16590940:A:G,22:16590941:G:A,22:16590942:A:C,22:16590943:G:A,22:16590952:C:G,22:16590958:C:G,22:16590969:T:A,22:16590971:G:A,22:16590977:T:C,22:16590981:T:C,22:16590986:A:G,22:16590990:C:T,22:16590991:G:A,22:16590991:G:T,22:16591005:TCA:T,22:16591008:C:T,22:16591020:C:G,22:16591023:C:T,22:16591025:T:A,22:16591026:C:A,22:16591029:C:G,22:16591037:C:A,22:16591037:C:T,22:16591038:G:A,22:16591053:T:A,22:16591059:T:C,22:16591063:G:T,22:16591067:T:C,22:16591068:C:T,22:16591073:ACC:A,22:16591079:T:C,22:16591083:G:A,22:16591093:T:C,22:16591102:T:A,22:16591104:C:A,22:16591111:C:G,22:16591122:G:A,22:16591123:G:A,22:16591126:G:T,22:16591129:C:T,22:16591132:A:T,22:16591133:C:G,22:16591138:G:T,22:16591144:T:C,22:16591145:C:T,22:16591150:C:A,22:16591151:A:G,22:16591159:C:T,22:16591161:TCAC:T,22:16591164:C:T,22:16591165:G:A,22:16591186:A:G,22:16591191:C:T,22:16591196:A:C,22:16591213:C:T,22:16591217:A:G,22:16591219:G:A,22:16591231:T:C,22:16591249:A:C,22:16591255:C:T,22:16591271:C:T,22:16591275:T:C,22:16591277:T:C,22:16591283:A:G,22:16591284:G:C,22:16591302:T:C,22:16591303:T:C,22:16591309:G:A,22:16591312:C:T,22:16591331:C:T,22:16591343:C:T,22:16591349:T:C,22:16591357:G:A,22:16591362:C:T,22:16591364:A:C,22:16591367:C:T,22:16591368:C:T,22:16591369:G:A,22:16591375:G:C,22:16591376:A:G,22:16591377:C:T,22:16591378:G:A,22:16591383:G:A,22:16591385:T:A,22:16591394:C:T,22:16591395:G:A,22:16591395:G:T,22:16591402:C:G,22:16591403:T:C,22:16591403:T:G,22:16591409:G:A,22:16591412:G:A,22:16591415:C:G,22:16591420:G:A,22:16591422:G:C,22:16591423:A:G,22:16591430:G:A,22:16591442:G:A,22:16591457:C:T,22:16591461:C:A,22:16591465:T:C,22:16591476:A:AT,22:16591478:C:T,22:16591481:T:C,22:16591481:T:G,22:16591483:T:C,22:16591485:C:T,22:16591491
:C:T,22:16591494:G:T,22:16591496:C:G,22:16591497:T:G,22:16591498:G:A,22:16591503:C:T,22:16591510:G:A,22:16591516:G:T,22:16591520:G:C,22:16591521:G:C,22:16591521:G:T,22:16591526:T:C,22:16591539:G:C,22:16591541:C:T,22:16591542:G:A,22:16591551:G:A,22:16591553:G:A,22:16591554:G:A,22:16591554:G:C,22:16591554:G:T,22:16591556:G:A,22:16591562:A:G,22:16591563:A:G,22:16591583:A:T,22:16591585:C:T,22:16591591:C:T,22:16591592:C:T,22:16591595:G:T,22:16591597:C:T,22:16591600:A:G,22:16591602:C:G,22:16591602:C:T,22:16591603:T:G,22:16591611:C:T,22:16591613:A:G,22:16591615:G:A,22:16591622:T:C,22:16591625:T:G,22:16591626:T:G,22:16591630:C:T,22:16591631:G:A,22:16591636:T:C,22:16591637:G:A,22:16591638:T:C,22:16591639:G:A,22:16591643:G:A,22:16591647:C:T,22:16591648:C:G,22:16591648:C:T,22:16591650:C:T,22:16591651:G:A,22:16591653:C:T,22:16591654:G:A,22:16591654:G:C,22:16591659:C:A,22:16591661:C:T,22:16591662:C:G,22:16591665:A:G,22:16591674:C:T,22:16591688:G:A,22:16591708:T:G,22:16591710:G:A,22:16591711:C:T,22:16591714:T:C,22:16591728:C:T,22:16591729:G:A,22:16591731:T:G,22:16591734:C:T,22:16591739:C:A,22:16591743:A:G,22:16591749:C:T,22:16591750:T:C,22:16591761:G:T,22:16591766:G:A,22:16591768:A:G,22:16591772:C:T,22:16591773:G:A,22:16591777:C:T,22:16591778:G:A,22:16591788:C:T,22:16591793:G:A,22:16591794:G:A,22:16591797:G:A,22:16591797:G:T,22:16591799:G:A,22:16591803:G:A,22:16591803:G:C,22:16591811:G:A,22:16591816:A:G,22:16591818:C:A,22:16591824:G:A,22:16591831:C:T,22:16591842:T:C,22:16591852:G:A,22:16591857:T:C,22:16591861:C:A,22:16591888:C:A,22:16591891:C:T,22:16591892:G:A,22:16591899:G:A,22:16591901:C:T,22:16591915:T:G,22:16591918:C:T,22:16591921:C:G,22:16591922:C:T,22:16591923:C:G,22:16591923:C:T,22:16591924:G:A,22:16591929:G:A,22:16591930:C:A,22:16591930:C:T,22:16591931:G:A,22:16591932:C:T,22:16591933:G:A,22:16591933:G:T,22:16591934:C:T,22:16591940:C:T,22:16591945:A:C,22:16591946:C:T,22:16591947:G:A,22:16591950:C:A,22:16591953:G:C,22:16591955:T:A,22:16591957:G:C,22:16591966:G:A,22:16591966
:G:T,22:16591981:A:G,22:16591983:C:T,22:16591985:C:T,22:16591987:G:A,22:16591988:C:T,22:16591990:G:T,22:16591993:G:A,22:16591994:T:C,22:16591997:G:GCCA,22:16592004:G:A,22:16592009:G:T,22:16592016:G:A,22:16592016:G:T,22:16592021:A:G,22:16592032:G:A,22:16592035:G:A,22:16592040:T:A,22:16592042:T:C,22:16592049:C:G,22:16592049:C:T,22:16592054:T:A,22:16592060:G:A,22:16592067:A:G,22:16592084:C:T,22:16592090:G:T,22:16592097:T:A,22:16592097:T:C,22:16592101:C:G,22:16592105:G:A,22:16592105:G:T,22:16592108:G:A,22:16592112:G:A,22:16592120:A:C,22:16592124:C:A,22:16592125:C:A,22:16592131:A:G,22:16592137:C:T,22:16592138:G:A,22:16592138:G:C,22:16592140:G:T,22:16592142:C:T,22:16592143:G:A,22:16592146:G:A,22:16592153:C:A,22:16592153:C:T,22:16592154:G:A,22:16592161:C:A,22:16592161:C:T,22:16592162:G:A,22:16592162:G:C,22:16592162:G:T,22:16592165:C:A,22:16592165:C:T,22:16592166:G:A,22:16592168:G:A,22:16592172:G:A,22:16592174:C:A,22:16592175:C:T,22:16592187:G:A,22:16592187:G:T,22:16592198:T:C,22:16592202:C:T,22:16592207:A:G,22:16592215:C:G,22:16592216:G:A,22:16592220:G:A,22:16592220:G:C,22:16592225:A:G,22:16592229:C:T,22:16592230:G:A,22:16592241:C:T,22:16592242:G:A,22:16592243:T:C,22:16592256:C:T,22:16592260:C:T,22:16592262:G:A,22:16592264:G:A,22:16592264:G:T,22:16592266:T:C,22:16592268:G:A,22:16592276:TC:T,22:16592279:C:T,22:16592280:G:A,22:16592290:T:C,22:16592290:TG:T,22:16592298:G:T,22:16592306:A:G,22:16592308:C:T,22:16592322:G:C,22:16592331:T:C,22:16592338:C:T,22:16592341:C:A,22:16592341:C:G,22:16592341:C:T,22:16592342:G:A,22:16592344:GC:G,22:16592350:T:C,22:16592352:T:G,22:16592384:C:T,22:16592385:G:A,22:16592386:G:A,22:16592388:C:T,22:16592390:T:G,22:16592399:T:G,22:16592408:C:T,22:16592409:G:A,22:16592416:A:T,22:16592420:G:T,22:16592431:G:A,22:16592440:C:T,22:16592450:A:G,22:16592451:G:T,22:16592455:G:T,22:16592459:G:A,22:16592462:T:G,22:16592474:G:T,22:16592475:G:C,22:16592477:C:T,22:16592481:T:C,22:16592482:C:T,22:16592483:G:A,22:16592490:C:T,22:16592493:T:C,22:16592497:G:T,22:16
592501:A:G,22:16592505:C:A,22:16592510:C:T,22:16592511:G:A,22:16592513:T:A,22:16592516:G:T,22:16592520:G:A,22:16592523:C:T,22:16592536:G:T,22:16592543:C:A,22:16592543:C:T,22:16592548:C:T,22:16592550:T:C

My genotype file is the one provided by UKBB. How should I fix the issue?

joellembatchou commented 1 year ago

Can you check that the variant ID format in your input genotype file matches that in the annotation & setlist files?
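
A rough way to run that check, sketched in Python (the helper name `id_shape` is hypothetical and not a regenie feature): reduce each ID to its "shape" by collapsing digits and allele letters, then compare the shapes seen in each file.

```python
import re
from collections import Counter


def id_shape(variant_id):
    """Collapse digit runs to 'N' and A/C/G/T runs to 'A', keeping the separators."""
    shape = re.sub(r"[0-9]+", "N", variant_id)
    return re.sub(r"[ACGT]+", "A", shape)


def shapes(ids):
    """Count how often each ID shape occurs in a collection of IDs."""
    return Counter(id_shape(v) for v in ids)


# IDs in the two styles that appear in this thread.
print(shapes(["22:15528182:C:T", "22:15528188:C:T"]))  # Counter({'N:N:A:A': 2})
print(shapes(["22:16498458_G_A"]))                     # Counter({'N:N_A_A': 1})
```

If the shapes differ between the genotype file and the annotation or set-list files, the IDs cannot match.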

Antonio-Nappi commented 1 year ago

@joellembatchou the pvar file that I use as input has the following structure:

#CHROM  POS ID  REF ALT
22  15479504    22:16498458_G_A G   A
22  15482129    22:16495833_C_A C   A
22  15489260    22:16488702_G_C G   C
22  15489327    22:16488635_C_A C   A
22  15513688    22:16464274_A_C A   C
22  15920590    22:16057417_T_C T   C
22  15923554    22:16054454_C_T C   T
22  15924146    22:16053862_C_T C   T
22  15926767    22:16051249_T_C T   C

whereas the set-list that I built is shown in my previous comment (I mislabeled it as the pvar). So the IDs are the same and only the separator is different; maybe this is the error?

dvh13 commented 1 year ago

That's the error: you can't have two different formats, 22:15528182:C:T and 22:16498458_G_A. Just replace all the : with _ in all the files and you'll be fine (i.e. gnomAD format). bw

Antonio-Nappi commented 1 year ago

so I should replace the last two : with _?

dvh13 commented 1 year ago

as long as they are all the exact same format in all files, you will be fine

recommendation to use 22_16498458_G_A which is 'standard' gnomAD format
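
A minimal Python sketch of that replacement, in case it helps. The function names are invented for this example. It assumes whitespace-separated files, the variant ID in column 1 of the annotation file, the comma-separated variant list in column 4 of the set-list, and the ID in column 3 of the pvar.

```python
def normalize_id(variant_id):
    """Use '_' as the only separator, e.g. 22:16498458_G_A -> 22_16498458_G_A."""
    return variant_id.replace(":", "_")


def normalize_anno_line(line):
    """Annotation file: variant ID is the first column."""
    fields = line.split()
    fields[0] = normalize_id(fields[0])
    return "\t".join(fields)


def normalize_setlist_line(line):
    """Set-list file: set name, chromosome, position, comma-separated variant IDs."""
    name, chrom, pos, variants = line.split()
    variants = ",".join(normalize_id(v) for v in variants.split(","))
    return "\t".join([name, chrom, pos, variants])


def normalize_pvar_line(line):
    """pvar: header/meta lines pass through; ID assumed to be the third column."""
    if line.startswith("#"):
        return line.rstrip("\n")
    fields = line.split()
    fields[2] = normalize_id(fields[2])
    return "\t".join(fields)


print(normalize_anno_line("22:15528182:C:T OR11H1  synonymous"))
print(normalize_setlist_line("OR11H1  22 15528182 22:15528182:C:T,22:15528188:C:T"))
print(normalize_pvar_line("22  15479504    22:16498458_G_A G   A"))
```

Only the ID fields are touched, so gene names and other columns stay as they are. Write the results to new files rather than overwriting the originals.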