zhengxwen / HIBAG

R package – HLA Genotype Imputation with Attribute Bagging (development version only)
https://hibag.s3.amazonaws.com/index.html

"there are 0 individuals in common" and "IDs in PLINK bed are not unique!" #10

Open sidtjn opened 4 years ago

sidtjn commented 4 years ago

Hi, I have been trying to predict HLA allele types using HIBAG on two different datasets, one with all SNPs and the other with WGS data.

With the SNP dataset, I could not get the function hlaCompareAllele to work. This is how I called it:

> rv_ct0_sea730k <- hlaCompareAllele(true_b, hla_b_sea730k, call.threshold = 0)
Calling 'hlaCompareAllele': there are 0 individuals in common.

> rv_ct5_sea730k <- hlaCompareAllele(true_b, hla_b_sea730k, call.threshold = 0.5)
Calling 'hlaCompareAllele': there are 0 individuals in common.

I also tried training a model on the data:

> sea730k_model <- hlaParallelAttrBagging(10, true_b, train.geno_sea730k, nclassifier = 100)
Error in .DynamicClusterCall(cl, fun = function(job, hla, snp, mtry, prune,  : 
  One node produced an error: There is no common sample between 'hla' and 'snp'.

With the WGS dataset, I also could not get hlaBED2Geno to work.

> geno_dusun <- hlaBED2Geno("BNF_HLA.bed","BNF_HLA.bim","BNF_HLA.fam", assembly = "hg38")
Open "BNF_HLA.bed" in the SNP-major mode.
Error in hlaBED2Geno("BNF_HLA.bed", "BNF_HLA.bim", "BNF_HLA.fam", assembly = "hg38") : 
  IDs in PLINK bed are not unique!

The WGS dataset was converted from vcf to plink format using the plink tool.

For both the WGS and SNP datasets, can I resolve this by converting the data to a particular format?

zhengxwen commented 4 years ago

You should check the sample IDs before you run any HIBAG function. See train.geno_sea730k$sample.id and true_b$value$sample.id.
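
A rough sketch of that check in base R, for illustration only. The two lists below are hypothetical stand-ins that mimic just the `sample.id` fields named above, the `FAM1_` prefix is an invented cause of the mismatch, and the `.fam` lines are made-up toy data (a PLINK `.fam` file has six columns: family ID, individual ID, paternal ID, maternal ID, sex, phenotype). How HIBAG itself decides which IDs count as duplicates is not shown in this thread, so the last part only reports where duplicates occur.

```r
# Hypothetical stand-ins for the real objects in this thread; only the
# sample.id fields are mimicked, everything else is omitted.
true_b <- list(value = list(sample.id = c("S1", "S2", "S3")))
train.geno_sea730k <- list(sample.id = c("FAM1_S1", "FAM1_S2", "FAM1_S3"))

hla_ids <- as.character(true_b$value$sample.id)
snp_ids <- as.character(train.geno_sea730k$sample.id)

# Look at both ID vectors side by side, then count the overlap.
print(head(hla_ids))
print(head(snp_ids))
common <- intersect(hla_ids, snp_ids)
length(common)  # 0 here, which matches "0 individuals in common"

# Assumed cause in this toy data: a "FAM1_" prefix on the SNP side.
# Strip it and check the overlap again.
fixed_ids <- sub("^FAM1_", "", snp_ids)
common_fixed <- intersect(hla_ids, fixed_ids)

# Toy .fam content (made up): the individual ID "S1" appears twice.
fam_lines <- c("F1 S1 0 0 1 -9",
               "F1 S2 0 0 2 -9",
               "F2 S1 0 0 1 -9")
fam <- read.table(text = fam_lines, header = FALSE,
                  colClasses = "character",
                  col.names = c("FID", "IID", "PID", "MID", "SEX", "PHENO"))

# Individual IDs that occur more than once.
dup_iid <- unique(fam$IID[duplicated(fam$IID)])

# Number of duplicated family-ID/individual-ID pairs (0 means none).
n_dup_pairs <- sum(duplicated(paste(fam$FID, fam$IID, sep = "-")))
```

With real data, replace the stand-in lists with the actual objects and read the real file with `read.table("BNF_HLA.fam", header = FALSE, colClasses = "character")`. If the overlap is still empty after inspecting both ID vectors, the IDs themselves need to be renamed on one side so that they match exactly.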