phenoscape / rphenoscape

R package to make phenotypic traits from the Phenoscape Knowledgebase available from within R.
https://rphenoscape.phenoscape.org/

Allow supplying opposite qualities for use in mutual exclusion #240

Closed johnbradley closed 1 year ago

johnbradley commented 2 years ago

Enhance the mutual exclusion functionality to allow supplying opposite qualities. This data should be used in determining the exclusivity type (strong_compatibility, weak_compatibility, inconclusive_evidence, weak_exclusivity, strong_exclusivity).

This will build on the work done for #237.

johnbradley commented 2 years ago

For example, "femur decreased length" and "femur elongated" are mutually exclusive. With the current code, these two are determined to have weak_exclusivity:

femur <- get_phenotypes(entity="femur")
femur_decreased_length <- femur[femur$label == "femur decreased length", 'id']
femur_elongated <- femur[femur$label == "femur elongated", 'id']
mutually_exclusive(c(femur_elongated, femur_decreased_length), progress_bar = FALSE)$dataframe$mutual_exclusivity
[1] weak_exclusivity

If we know the qualities "decreased length" and "elongated" are opposites, this should return strong_exclusivity for the two phenotypes (and for any other pair of phenotypes "X decreased length" and "X elongated", where X is an anatomical element, e.g., a Uberon term).

hlapp commented 2 years ago

@wdahdul @pmabee @uyedaj I am tagging you here so you can review and comment.

hlapp commented 2 years ago

@johnbradley I slightly edited the last sentence in your example.

johnbradley commented 2 years ago

To supply the opposite qualities to the mutually_exclusive() function, how about a data frame with two columns (quality.a and quality.b)? This way a user could pass in a list of opposite qualities.

As a simple example, elongated is the opposite of decreased length, so we could create a data frame like so:

elongated_iri <- "http://purl.obolibrary.org/obo/PATO_0001154"
decreased_length_iri <- "http://purl.obolibrary.org/obo/PATO_0000574"
quality_opposites <- data.frame(
    quality.a = c(elongated_iri),
    quality.b = c(decreased_length_iri)
)

This data frame would be passed to mutually_exclusive() like so:

> result <- mutually_exclusive(phenotypes_to_compare, quality_opposites = quality_opposites)
> result$dataframe$mutual_exclusivity
[1] strong_exclusivity
5 Levels: strong_compatibility < weak_compatibility < inconclusive_evidence < ... < strong_exclusivity

If you had additional opposite qualities, you could just add more rows to the quality_opposites data frame:

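# chronic_iri, acute_iri, aerobic_iri, and anaerobic_iri are assumed to be
# defined as PATO IRIs in the same way as elongated_iri above.
quality_opposites <- data.frame(
    quality.a = c(elongated_iri,        chronic_iri, aerobic_iri),
    quality.b = c(decreased_length_iri, acute_iri,   anaerobic_iri)
)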
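
As a minimal sketch of how such a table might be consulted, each row could be treated as an unordered pair, so that it matches regardless of which quality is listed in quality.a and which in quality.b. The helper name is_opposite_pair is hypothetical and not part of rphenoscape; how mutually_exclusive() would actually use the table is still open here.

```r
# Hypothetical helper (not part of rphenoscape): TRUE if the two quality IRIs
# appear together in any row of the opposites table, in either order.
is_opposite_pair <- function(quality_1, quality_2, quality_opposites) {
  forward  <- quality_opposites$quality.a == quality_1 &
              quality_opposites$quality.b == quality_2
  backward <- quality_opposites$quality.a == quality_2 &
              quality_opposites$quality.b == quality_1
  any(forward | backward)
}

elongated_iri <- "http://purl.obolibrary.org/obo/PATO_0001154"
decreased_length_iri <- "http://purl.obolibrary.org/obo/PATO_0000574"
quality_opposites <- data.frame(
  quality.a = c(elongated_iri),
  quality.b = c(decreased_length_iri),
  stringsAsFactors = FALSE
)

# The pair is listed as (elongated, decreased length) but queried in reverse.
reversed_match <- is_opposite_pair(decreased_length_iri, elongated_iri,
                                   quality_opposites)
# A quality is not its own opposite unless the table says so.
self_match <- is_opposite_pair(elongated_iri, elongated_iri, quality_opposites)
```

With a symmetric lookup like this, the user would not need to list each pair twice.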

hlapp commented 2 years ago

Yes, I agree in general. More specifically, we should require only that these two columns be present, not, for example, that they be the only columns. (Someone maintaining such a table by hand would likely want to include labels as additional columns so they can more easily remember what's in the table. We shouldn't force them to massage the table each time before using it as input here. In the same vein, we should auto-trim leading and trailing whitespace from the IRIs, because if someone keeps this table in Excel, extraneous spaces will creep in sooner or later and appear in the CSV export.)
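
The input handling described above could look roughly like the following sketch. The function name normalize_quality_opposites and the label.a/label.b columns are hypothetical, chosen only for illustration; the sketch assumes extra columns are simply ignored and that the two required columns are named quality.a and quality.b as proposed earlier in this thread.

```r
# Hypothetical helper (not part of rphenoscape): validate and clean a
# user-supplied table of opposite qualities.
normalize_quality_opposites <- function(quality_opposites) {
  required <- c("quality.a", "quality.b")
  missing_cols <- setdiff(required, names(quality_opposites))
  if (length(missing_cols) > 0) {
    stop("quality_opposites is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  # Keep only the required columns; extra columns such as labels are ignored.
  result <- quality_opposites[, required, drop = FALSE]
  # as.character() guards against factor columns; trimws() strips leading
  # and trailing whitespace.
  result$quality.a <- trimws(as.character(result$quality.a))
  result$quality.b <- trimws(as.character(result$quality.b))
  result
}

# A hand-maintained table with label columns and stray spaces, as might come
# out of a spreadsheet CSV export.
hand_maintained <- data.frame(
  label.a   = c("elongated"),
  quality.a = c(" http://purl.obolibrary.org/obo/PATO_0001154"),
  label.b   = c("decreased length"),
  quality.b = c("http://purl.obolibrary.org/obo/PATO_0000574  "),
  stringsAsFactors = FALSE
)
cleaned <- normalize_quality_opposites(hand_maintained)
```

A table read with read.csv() could be passed through the same helper before being handed to mutually_exclusive().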