Closed Santy-8128 closed 5 years ago
In your example, I don't see any problem:
library(HIBAG)

# Predicted genotypes at HLA-A for three samples
pred_val <- list(locus="A", value=data.frame(
    sample.id=c("HG01890", "HG01894", "HG01896"),
    allele1=c("01:01", "23:01", "23:01"),
    allele2=c("30:01", "23:01", "23:01"),
    stringsAsFactors=FALSE)
)
class(pred_val) <- "hlaAlleleClass"

# True genotypes: same as predicted, except the second allele of the third sample
true_val <- pred_val
true_val$value[3, 3] <- "23:17"

hlaCompareAllele(true_val, pred_val)
hlaCompareAllele() outputs:
$overall
  total.num.ind crt.num.ind crt.num.haplo   acc.ind acc.haplo call.threshold
1             3           2             5 0.6666667 0.8333333              0
  n.call call.rate
1      3         1

$confusion
       True
Predict 01:01 23:01 23:17 30:01
  01:01     1     0     0     0
  23:01     0     3     1     0
  23:17     0     0     0     0
  30:01     0     0     0     1
  ...       0     0     0     0
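Note that acc.haplo = 0.8333333 (5 of the 6 haplotypes are correct), so the sample carrying 23:17 is counted and the accuracy is calculated correctly.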
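For reference, the two accuracy figures can be reproduced by hand in base R. This is only a minimal sketch of the arithmetic, not HIBAG's internal code; `count_matches` is a hypothetical helper introduced here for illustration, and the data are the three samples from the example above.

```r
# Toy genotypes copied from the example above (true vs. predicted).
pred <- data.frame(
  sample.id = c("HG01890", "HG01894", "HG01896"),
  allele1   = c("01:01", "23:01", "23:01"),
  allele2   = c("30:01", "23:01", "23:01"),
  stringsAsFactors = FALSE
)
truth <- pred
truth$allele2[3] <- "23:17"

# Hypothetical helper: number of matching alleles (0, 1 or 2) per individual,
# ignoring the order of the two alleles within a genotype.
count_matches <- function(t1, t2, p1, p2) {
  same_order    <- (t1 == p1) + (t2 == p2)
  swapped_order <- (t1 == p2) + (t2 == p1)
  pmax(same_order, swapped_order)
}

n_match <- count_matches(truth$allele1, truth$allele2,
                         pred$allele1, pred$allele2)

# Haplotype-level accuracy: correct alleles over all alleles (2 per sample).
acc_haplo <- sum(n_match) / (2 * nrow(truth))
# Individual-level accuracy: share of samples with both alleles correct.
acc_ind <- mean(n_match == 2)

print(c(acc.ind = acc_ind, acc.haplo = acc_haplo))
```

With these three samples, five of six alleles match and two of three individuals are fully correct, which agrees with the acc.haplo and acc.ind values in the output above.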
By the way, according to the HLA P-group nomenclature, A*23:17 and A*23:01 are in the same P-coded group: https://raw.githubusercontent.com/ANHIG/IMGTHLA/Latest/wmda/hla_nom_p.txt Older HLA typing techniques might not be able to differentiate A*23:17 from A*23:01.
Sorry, I haven't had time to get back to this. I will look at it soon and either get back to you or close the issue. Apologies for the delay.
It seems that HIBAG does NOT include samples in the accuracy calculation when they carry alleles that are NOT found in the training model. As an example, I have a training dataset with NO copies of the A*23:17 allele, but my test dataset has many copies of it. I see that every sample with at least one copy of A*23:17 has been removed from the accuracy calculation. I am not sure whether this is intended or whether I am missing something.
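To narrow this down, it may help to list which test samples carry an allele that never appears in the training set. The following is a minimal base R sketch with hypothetical toy data (`train_alleles` and `test` are made up for illustration); it does not call HIBAG and says nothing about how HIBAG itself treats these samples.

```r
# Hypothetical toy data: alleles seen in training, and true test genotypes.
train_alleles <- c("01:01", "23:01", "30:01")
test <- data.frame(
  sample.id = c("HG01890", "HG01894", "HG01896"),
  allele1   = c("01:01", "23:01", "23:01"),
  allele2   = c("30:01", "23:01", "23:17"),
  stringsAsFactors = FALSE
)

# Alleles present in the test set but never seen in training.
unseen <- setdiff(unique(c(test$allele1, test$allele2)), train_alleles)

# Samples carrying at least one unseen allele.
affected <- test$sample.id[test$allele1 %in% unseen | test$allele2 %in% unseen]

print(unseen)
print(affected)
```

Comparing `length(affected)` with the difference between the number of test samples and total.num.ind in the hlaCompareAllele() output would show whether exactly these samples are the ones being dropped.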