immunogenomics / HLA-TAPAS

HLA-TAPAS pipeline for HLA association and fine-mapping studies

Error: No variants remaining after --exclude. #9

Closed masen1991 closed 6 months ago

masen1991 commented 2 years ago

Hi @WansonChoi,

I am trying to use the hg38 version of the KGP samples to build a reference panel, but I get the following at step [5] Encoding SNP positions: "Error: No variants remaining after --exclude." Everything after this step then goes wrong.

Do you have any idea what causes this? When I use SNP2HLA-MakeReference (the old version) with SNPs in hg18, it works fine.

WansonChoi commented 2 years ago

@masen0407

Hi, thank you for your report.

I'm sorry, but it's hard for me to figure out the exact cause with only this information.

The error message implies that it failed around step 5 of MakeReference_v2, but earlier steps could have failed too (and that would have caused the PLINK run to fail).

Can you share the PLINK log file that failed last and the HLA-TAPAS main log file?
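As a rough aid for narrowing down which intermediate step failed first, the saved PLINK logs can be scanned for error lines. The sketch below is not part of HLA-TAPAS; the function names are hypothetical, and it only assumes that a failed PLINK run writes a line beginning with "Error:" to its log, as in the message quoted in this issue.

```python
from pathlib import Path


def error_lines(log_text):
    """Return the lines of a log that start with 'Error:' (leading spaces ignored)."""
    return [line.strip()
            for line in log_text.splitlines()
            if line.lstrip().startswith("Error:")]


def find_plink_errors(log_dir, pattern="*.log"):
    """Map each log file name in log_dir to its error lines, skipping clean logs.

    Hypothetical helper: log_dir is wherever --save-intermediates left the logs.
    """
    found = {}
    for log_path in sorted(Path(log_dir).glob(pattern)):
        hits = error_lines(log_path.read_text(errors="replace"))
        if hits:
            found[log_path.name] = hits
    return found
```

Calling `find_plink_errors` on the output directory lists every log that contains an error, which shows whether a step before step 5 had already failed.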

masen1991 commented 2 years ago

@WansonChoi Sorry for the missing info. I used --save-intermediates and have uploaded all the log files and the main log. The command I used was:

python -m MakeReference \
    --variants DIR/KGP_EUR/hg38 \
    --chped DIR/KGP_EUR.chped \
    --hg 38 \
    --out DIR/test_hla_tapas/KGP_EUR/hg38 \
    --dict-AA MakeReference/data/hg38/HLA_DICTIONARY_AA.hg38.imgt3320 \
    --dict-SNPS MakeReference/data/hg38/HLA_DICTIONARY_SNPS.hg38.imgt3320 \
    --phasing \
    --mem 100g \
    --nthreads 72 \
    --save-intermediates

Uploaded log files:

main_log.txt
hg38.SNPS.TMP.log
hg38.SNPS.FOUNDERS.log
hg38.SNPS.CODED.log
hg38.MERGED.FOUNDERS.log
hg38.MERGED.FOUNDERS.FRQ.log
hg38.log
hg38.HLA.log
hg38.HLA.FOUNDERS.log
hg38.FRQ.log
hg38.FOUNDERS.QC.log
hg38.FOUNDERS.log
hg38.FOUNDERS.hardy.log
hg38.FOUNDERS.freq.log
hg38.AA.TMP.log
hg38.AA.FOUNDERS.log
hg38.AA.CODED.log

If you need them, I can also share the *.bed files used to make the reference and the chped file.

WansonChoi commented 2 years ago

@masen0407

If you can, please share the genotype data(*.{bed,bim,fam}) and chped file, too. Then, I can investigate the cause more closely.

masen1991 commented 2 years ago

@WansonChoi I have put all the files in a zip: KGP_EUR.zip

masen1991 commented 2 years ago

@WansonChoi Do you have any results yet?

WansonChoi commented 2 years ago

@masen0407

Hi, thank you for waiting for the response.

The cause was that your chped file consists of 2-field HLA types. The dictionaries that you used are for 4-field HLA types, which can't be used for your case.
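A quick way to see which resolution a chped file uses before choosing dictionaries is to count the colon-separated fields of each allele. The following is a minimal sketch, not HLA-TAPAS code: the function names are hypothetical, and it assumes a whitespace-delimited chped with six leading pedigree columns, allele names such as A*01:01, and "0" for missing alleles.

```python
from collections import Counter


def count_fields(allele):
    """Number of colon-separated fields in an allele name; 0 if missing."""
    if allele in ("0", "", "NA"):  # assumed missing-value codes
        return 0
    name = allele.split("*", 1)[-1]  # drop a gene prefix such as "A*"
    return len(name.split(":"))


def summarize_chped(lines, n_meta_cols=6):
    """Count how many alleles occur at each field resolution.

    n_meta_cols is the assumed number of leading pedigree columns
    (family ID, individual ID, parents, sex, phenotype).
    """
    counts = Counter()
    for line in lines:
        for allele in line.split()[n_meta_cols:]:
            n = count_fields(allele)
            if n:
                counts[n] += 1
    return counts
```

If every counted allele falls under key 2, the file holds 2-field types and would need 2-field dictionaries; a result dominated by key 4 would match the 4-field dictionaries.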

I guess you want to generate a reference panel with (only) 2-field HLA types. For this, I have attached alternative dictionaries for 2-field HLA types. Please pass the prefix of these dictionaries to the '--dict-AA' and '--dict-SNPS' arguments and try again.

For example,

"""
$ python -m MakeReference \
    --variants KGP_EUR/KGP_EURhg38 \
    --chped KGP_EUR/KGP_EUR.chped \
    --hg 38 \
    --out test/test \
    --dict-AA HLA_DICTIONARY_AA.hg38.imgt3320.2field \
    --dict-SNPS HLA_DICTIONARY_SNPS.hg38.imgt3320.2field \
    --phasing \
    --mem 8g \
    --nthreads 4 \
    --save-intermediates
"""

HLA_DICTIONARY_AAorSNPS.hg38.imgt3320.2field.zip

masen1991 commented 2 years ago

@WansonChoi Thank you very much. Could you also attach hg18/hg19 versions of the alternative dictionaries for 2-field HLA types?

I also saw this in the CookHLA documentation: "Also, Since most SNP2HLA-formatted reference panels are distributed in hg18, the Human Genome version of the copied target data will be lifted-down to hg18 automatically if its version is not hg18." So can a reference panel built with HLA-TAPAS MakeReference be used with CookHLA? Or do I need to lift the SNPs over to hg18 first and then use HLA-TAPAS to make the panel?

Also, how can I convert the BEAGLE v4 .bgl.phased.vcf.gz output to the BEAGLE v3 .bgl.phased file type so that it can be used by CookHLA? HLA-TAPAS is an updated version of SNP2HLA, but many tools that take HLA reference data, such as CookHLA, need the .bgl.phased and .bim formats. Do you have any suggestions on how to make this file type conversion smooth and convenient?
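For illustration only, the general idea behind converting a phased VCF into a Beagle v3 style text file can be sketched as below. This is not the maintainer's answer and has not been checked against CookHLA: the function name is hypothetical, and the output layout ("P pedigree" and "I id" header rows followed by one "M" row per marker with two alleles per sample) is an assumption about what SNP2HLA-style .bgl.phased files look like. Because a VCF carries no family IDs, the sample ID is reused in both header rows.

```python
def vcf_to_bgl_lines(vcf_lines):
    """Convert phased VCF text lines into Beagle v3 style rows (assumed layout)."""
    out = []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            # Each sample contributes two haplotype columns.
            doubled = [s for s in cols[9:] for _ in range(2)]
            out.append(" ".join(["P", "pedigree"] + doubled))
            out.append(" ".join(["I", "id"] + doubled))
            continue
        marker_id, ref, alts = cols[2], cols[3], cols[4].split(",")
        alleles = [ref] + alts
        gt_index = cols[8].split(":").index("GT")
        haps = []
        for sample_field in cols[9:]:
            gt = sample_field.split(":")[gt_index]
            if "|" not in gt:
                raise ValueError(f"unphased genotype at {marker_id}: {gt}")
            # Missing calls (".") are not handled; int() raises on them.
            haps.extend(alleles[int(i)] for i in gt.split("|"))
        out.append(" ".join(["M", marker_id] + haps))
    return out
```

A .vcf.gz file can be read with `gzip.open(path, "rt")` and passed in line by line. Whether the resulting file is accepted by a given tool depends on that tool's expectations for marker names, allele coding, and genome build, so the output should be compared against an existing panel before use.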