bschilder closed this issue 2 months ago.
So after rereading Lazarin 2014, my understanding of Tiers is a bit different. Basically, clinical characteristics can be assigned tiers (1-4). The tiers are then mapped onto severity categories (Mild, Moderate, Severe, Profound) like so:
So perhaps it would make more sense to map our phenotypes onto these severity categories instead of the tiers.
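For concreteness, the tier-counts-to-class decision rule that the mapping function further down this thread implements can be sketched as a small standalone helper (the function name and scalar-count signature here are hypothetical; the real implementation operates on a row of the annotation data.table):

```r
# Hypothetical helper: assign a severity class from per-tier counts of
# characteristics that meet the often/always threshold.
assign_class <- function(n_tier1, n_tier2, n_tier3){
  if(n_tier1 > 1){
    "profound"                              # more than one tier-1 characteristic
  } else if(n_tier1 > 0 || (n_tier2 + n_tier3) > 3){
    "severe"                                # any tier-1, or many tier-2/3
  } else if(n_tier3 > 0){
    "moderate"                              # at least one tier-3 characteristic
  } else {
    "mild"                                  # nothing above tier 4
  }
}
assign_class(n_tier1 = 2, n_tier2 = 0, n_tier3 = 1) # "profound"
```

Note that tier 4 (reduced_fertility) does not enter the decision rule; it only contributes to the within-class score.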
Lazarin 2014 also struggled with the same ambiguity we're facing regarding the role of available treatments:
Availability of treatment is not a measure of the severity of an untreated disease. However, it was rated as highly important (more so than any sensory deficit); thus, while it is not sensible to include it in an assessment of untreated severity, it is reasonable to consider it in conjunction with severity when considering disease inclusion criteria. Unfortunately, the survey's design makes it difficult to interpret responses to this characteristic: it is not clear whether respondents believed that the presence or absence of treatment was of importance.
One thing we do improve upon over Lazarin is the issue of "expressivity". We basically capture a rough approximation of this with the never/rarely/often/always classifications.
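A minimal sketch of how those frequency labels can be encoded numerically; the 0-3 coding is inferred from the `inclusion_values=c(2,3) # i.e. often, always` convention in the function below (the helper names here are made up for illustration):

```r
# Encode expressivity labels as an ordered factor, then as integer codes 0-3.
freq_levels <- c("never", "rarely", "often", "always")
encode_freq <- function(x){
  as.integer(factor(x, levels = freq_levels, ordered = TRUE)) - 1L
}
encode_freq(c("never", "often", "always")) # 0 2 3

# Inclusion threshold: often/always, i.e. codes 2 and 3.
meets_threshold <- function(x) encode_freq(x) %in% c(2L, 3L)
meets_threshold(c("rarely", "always")) # FALSE TRUE
```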
Mapping our metrics onto Tiers is a bit challenging since they're quite different (from Table 1 in Lazarin 2014):
Here's my closest approximation. Notable issues:
tiers_dict <- list(
## Tier 1
death=1,
intellectual_disability=1,
# congenital_onset=1,
## Tier 2
impaired_mobility=2,
physical_malformations=2,
## Tier 3
blindness=3,
sensory_impairments=3,
immunodeficiency=3,
cancer=3,
## Tier 4
reduced_fertility=4
)
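One convenience when working with this structure is inverting it to group characteristics by tier, which is essentially what the mapping function later in this thread does internally via its `lapply` over tiers:

```r
tiers_dict <- list(
  death = 1, intellectual_disability = 1,
  impaired_mobility = 2, physical_malformations = 2,
  blindness = 3, sensory_impairments = 3, immunodeficiency = 3, cancer = 3,
  reduced_fertility = 4
)
# Invert: tier number -> character vector of characteristic names.
by_tier <- split(names(tiers_dict), unlist(tiers_dict))
by_tier[["1"]] # "death" "intellectual_disability"
```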
Good spot; I hadn't noted that flow chart before.
Makes sense to use their system (maybe with ranking within each class, e.g. within Profound, based on how many other tier-1 or tier-2 characteristics there are).
I agree with this:
tiers_dict <- list(
death=1,
intellectual_disability=1,
impaired_mobility=2,
physical_malformations=2,
blindness=3,
sensory_impairments=3,
immunodeficiency=3,
cancer=3,
reduced_fertility=4
)
Severity class can be Mild, Moderate, Severe, or Profound. I've also generated a severity class score, which is just the proportion of metrics that meet our threshold of often/always. This provides a way to rank phenotypes within each severity class as well.
res_coded <- HPOExplorer::gpt_annot_codify()
map_severity_class <- function(r,
                               tiers_dict = list(
                                 ## Tier 1
                                 death=1,
                                 intellectual_disability=1,
                                 # congenital_onset=1,
                                 ## Tier 2
                                 impaired_mobility=2,
                                 physical_malformations=2,
                                 ## Tier 3
                                 blindness=3,
                                 sensory_impairments=3,
                                 immunodeficiency=3,
                                 cancer=3,
                                 ## Tier 4
                                 reduced_fertility=4
                               ),
                               inclusion_values=c(2,3), # i.e. often, always
                               return_score=FALSE){
  tiers <- unique(unlist(tiers_dict))
  ## Per tier: which characteristics meet the often/always threshold,
  ## and what proportion of that tier's characteristics do.
  tier_scores <- lapply(stats::setNames(tiers, paste0("tier",tiers)),
                        function(x){
                          tx <- tiers_dict[unname(unlist(tiers_dict)==x)]
                          counts <- r[,sapply(.SD, function(v){v %in% inclusion_values}),
                                      .SDcols = names(tx)]
                          list(
                            counts=counts,
                            proportion=sum(counts)/length(tx)
                          )
                        })
  mean_proportion <- sapply(tier_scores, function(x) x$proportion) |> mean()
  ## Assign the class from tier counts; keep the score as the value.
  assigned_class <- if(sum(tier_scores$tier1$counts)>1){
    c("profound"=mean_proportion)
  } else if (sum(tier_scores$tier1$counts)>0 ||
             sum(c(tier_scores$tier2$counts, tier_scores$tier3$counts))>3){
    c("severe"=mean_proportion)
  } else if(sum(tier_scores$tier3$counts)>0){
    c("moderate"=mean_proportion)
  } else {
    c("mild"=mean_proportion)
  }
  if(return_score){
    return(assigned_class)
  } else {
    return(names(assigned_class))
  }
}
res_coded$annot_coded[,severity_class:=map_severity_class(.SD), by=.I]
res_coded$annot_coded[,severity_class_score:=map_severity_class(.SD, return_score = TRUE), by=.I]
I checked that there's a correspondence between our severity scores and the severity classes assigned in this way, and indeed there is:
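One hedged way to sketch that check (the column names and all numbers below are toy assumptions, not the real `res_coded$annot_coded` values): treat the classes as an ordered factor and confirm the mean continuous score increases monotonically across them.

```r
# Toy illustration of the correspondence check; scores are fabricated.
classes <- c("mild", "moderate", "severe", "profound")
dat <- data.frame(
  severity_class = factor(rep(classes, each = 3), levels = classes, ordered = TRUE),
  severity_score = c(.05,.10,.15,  .25,.30,.35,  .50,.60,.70,  .85,.90,.95)
)
# Mean continuous score per discrete class; should increase monotonically
# if the two severity measures agree.
class_means <- tapply(dat$severity_score, dat$severity_class, mean)
all(diff(class_means) > 0) # TRUE on these toy values
```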
Now described in Results and Methods under the new section "Severity classes".
Added the violin plot to the supp as well.
@NathanSkene suggested we should use the Lazarin 2014 Tier system. But I pointed out that the reason we switched to a continuous severity score is that it provides a quantitative way of sorting the phenotypes. Also, we don't exactly recapitulate the Lazarin criteria with the GPT annotations; it's more that we were inspired by Lazarin 2014 to generate our own, somewhat similar criteria.
One middle ground might be to create a rule-based function that attempts to approximate the Lazarin 2014 Tiers. It won't be exactly the same, but it might be useful for grouping our phenotypes into discrete severity categories.