neurogenomics / gpt_hpo_annotations

https://neurogenomics.github.io/gpt_hpo_annotations/

Assign categorical Lazarin 2014 Tiers #4

Closed bschilder closed 2 months ago

bschilder commented 3 months ago

@NathanSkene suggested we should use the Lazarin 2014 Tier system. But I pointed out that we switched to a continuous severity score because it provides a quantitative way of sorting the phenotypes. Also, we don't exactly recapitulate the Lazarin criteria with the GPT annotations; it's more that we were inspired by Lazarin 2014 to generate our own, somewhat similar, criteria.

One middle ground might be to create a rule-based function that approximates the Lazarin 2014 Tiers. It won't be exactly the same, but it could be useful for grouping our phenotypes into discrete severity categories.

bschilder commented 2 months ago

So after rereading Lazarin 2014, my understanding of the Tiers is a bit different. Basically, each clinical characteristic can be assigned a tier (1-4). The tiers are then mapped onto severity categories (Mild, Moderate, Severe, Profound) like so:

[Figure: flow chart from Lazarin 2014 mapping tiered clinical characteristics to severity categories]

So perhaps it would make more sense to map our phenotypes onto these severity categories instead of the tiers.
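As a rough sketch, the flow chart's logic can be written as a simple rule over the number of characteristics at each tier. The thresholds below are illustrative: they follow the map_severity_class() approximation further down this thread, not the exact Lazarin 2014 flow chart.

## Illustrative rule-based mapping from per-tier counts to a severity class.
## Thresholds mirror the map_severity_class() approximation later in this thread.
assign_severity_class <- function(n_tier1, n_tier2, n_tier3){
  if (n_tier1 > 1) {
    "profound"
  } else if (n_tier1 > 0 || (n_tier2 + n_tier3) > 3) {
    "severe"
  } else if (n_tier3 > 0) {
    "moderate"
  } else {
    "mild"
  }
}
assign_severity_class(n_tier1 = 0, n_tier2 = 1, n_tier3 = 1) # "moderate"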

bschilder commented 2 months ago

Lazarin 2014 also struggled with the same ambiguity we're facing regarding the role of available treatments:

Availability of treatment is not a measure of the severity of an untreated disease. However, it was rated as highly important (more so than any sensory deficit); thus, while it is not sensible to include it in an assessment of untreated severity, it is reasonable to consider it in conjunction with severity when considering disease inclusion criteria. Unfortunately, the survey's design makes it difficult to interpret responses to this characteristic: it is not clear whether respondents believed that the presence or absence of treatment was of importance.

One thing we do improve on relative to Lazarin is the issue of "expressivity": we capture a rough approximation of it with the never/rarely/often/always classifications.
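For reference, the frequency classifications are treated as a 0-3 numeric coding. The often=2/always=3 part matches the inclusion_values comment in the code further down; never=0/rarely=1 is an inferred assumption.

## Assumed numeric coding of the frequency classifications (never/rarely inferred).
freq_levels <- c(never = 0, rarely = 1, often = 2, always = 3)
freq_levels[freq_levels >= 2] # the levels counted as "present" (often, always)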

bschilder commented 2 months ago

Mapping our metrics onto the Tiers is a bit challenging since they're quite different (from Table 1 in Lazarin 2014):

[Figure: Table 1 from Lazarin 2014]

Here's my closest approximation. Notable issues:
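The proposed metric-to-tier mapping (also quoted in the reply below):

tiers_dict <- list(
  ## Tier 1
  death=1,
  intellectual_disability=1,
  ## Tier 2
  impaired_mobility=2,
  physical_malformations=2,
  ## Tier 3
  blindness=3,
  sensory_impairments=3,
  immunodeficiency=3,
  cancer=3,
  ## Tier 4
  reduced_fertility=4
)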

NathanSkene commented 2 months ago

Good spot, hadn’t noted that flow chart before.

Makes sense to use their system (maybe with ranking within, e.g., Profound, based on how many other Tier 1 or 2 characteristics there are).

I agree with this:

tiers_dict <- list(
  ## Tier 1
  death=1,
  intellectual_disability=1,
  ## Tier 2
  impaired_mobility=2,
  physical_malformations=2,
  ## Tier 3
  blindness=3,
  sensory_impairments=3,
  immunodeficiency=3,
  cancer=3,
  ## Tier 4
  reduced_fertility=4
)

bschilder commented 2 months ago

Severity class can be Mild, Moderate, Severe, or Profound. I've also generated a severity class score, which is just the proportion of metrics that meet our threshold of often/always. This provides a way to rank phenotypes within each severity class as well.

res_coded <- HPOExplorer::gpt_annot_codify()

map_severity_class <- function(r,
                               tiers_dict = list(
                                ## Tier 1
                                death=1, 
                                intellectual_disability=1,
                                # congenital_onset=1,
                                ## Tier 2
                                impaired_mobility=2, 
                                physical_malformations=2,
                                ## Tier 3
                                blindness=3,  
                                sensory_impairments=3,
                                immunodeficiency=3, 
                                cancer=3, 
                                ## Tier 4
                                reduced_fertility=4
                               ),
                               inclusion_values=c(2,3), # i.e. often, always
                               return_score=FALSE){
  tiers <- unique(unlist(tiers_dict))
  tier_scores <- lapply(stats::setNames(tiers,paste0("tier",tiers)),
                        function(x){
    tx <- tiers_dict[unname(unlist(tiers_dict)==x)]
    counts <- r[,sapply(.SD, function(v){v %in% inclusion_values}), 
               .SDcols = names(tx)]
    list(
      counts=counts,
      proportion=sum(counts)/length(tx)
    )
  })
  mean_proportion <- sapply(tier_scores, function(x)x$proportion)|>mean()
  assigned_class <- if(sum(tier_scores$tier1$counts)>1){
    c("profound"=mean_proportion)
  } else if (sum(tier_scores$tier1$counts)>0 ||
             sum(c(tier_scores$tier2$counts,tier_scores$tier3$counts))>3){
    c("severe"=mean_proportion)
  } else if(sum(tier_scores$tier3$counts)>0){
    c("moderate"=mean_proportion)
  } else{
    c("mild"=mean_proportion)
  }  
  if(return_score){
    return(assigned_class)
  } else{
    return(names(assigned_class))
  }
}

res_coded$annot_coded[,severity_class:=map_severity_class(.SD), by=.I]
res_coded$annot_coded[,severity_class_score:=map_severity_class(.SD, return_score = TRUE), by=.I]
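A minimal sketch of the within-class ranking this enables (hedged: assumes annot_coded contains an hpo_name column; swap in the actual identifier column if it differs):

## Rank phenotypes within each severity class by their class score.
ranked <- data.table::copy(res_coded$annot_coded)
ranked[, severity_class := factor(severity_class,
                                  levels = c("mild","moderate","severe","profound"))]
data.table::setorder(ranked, -severity_class, -severity_class_score)
ranked[, rank_within_class := seq_len(.N), by = severity_class]
## Assumes an hpo_name column exists; adjust to the real identifier column.
head(ranked[severity_class == "profound",
            .(hpo_name, severity_class_score, rank_within_class)])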

I checked that there's a correspondence between our severity scores and the severity classes assigned in this way, and indeed there is:

[Figure: violin plot of continuous severity scores grouped by assigned severity class]
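A minimal sketch of how that check could be plotted (hedged: assumes the continuous severity score is available in the same table under a column named severity_score_gpt; the actual name/location in the gpt_annot_codify() output may differ):

## Compare the continuous severity score against the assigned severity classes.
library(ggplot2)
dat <- res_coded$annot_coded
dat$severity_class <- factor(dat$severity_class,
                             levels = c("mild","moderate","severe","profound"))
ggplot(dat, aes(x = severity_class, y = severity_score_gpt)) +
  geom_violin() +
  labs(x = "Severity class", y = "Continuous severity score")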

bschilder commented 2 months ago

Now described in Results and Methods under the new section "Severity classes".

Added the violin plot to the supp as well.