krassowski / complex-upset

A library for creating complex UpSet plots with ggplot2 geoms
MIT License

Changing the y-axis scale for intersection_size from ComplexUpset package #189

Status: Open. Opened by arunimgarg 1 year ago.

arunimgarg commented 1 year ago

Objective

I want to normalize the intersection_size data to the range 0 to 1. I have created four different upset plots using the ComplexUpset package in R. The four plots have very different intersection sizes, since the data frames range from 300 to 12000 rows. I was hoping to put all of them on the same y-axis scale for clarity and ease of discussion.

I have attached 2 of the 4 upset plots that I need to compare (I redacted the labels because I'm working on a project on a VM of a protected institution). As can be seen, the y-axes of the plots are on different scales.

After reading the UpSet and ComplexUpset documentation, I see that the intersections are calculated internally and cannot really be extracted. However, I see that you can still manipulate the intersection sizes, for example:

'Intersection size'=intersection_size(text_mapping=aes(label=paste0(round(
            !!get_size_mode('exclusive_intersection')/!!get_size_mode('inclusive_union') * 100
        ), '%')))

but I couldn't do a normalization like this:

'Intersection size'=intersection_size(text_mapping=aes(label=paste0(round(
            !!get_size_mode('exclusive_intersection')/max(!!get_size_mode('inclusive_union'))
        ))))
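Independently of ComplexUpset, `round()` with its default `digits = 0` collapses every ratio between 0 and 1 to either 0 or 1, so labels built this way cannot show a normalized scale. (As far as I can tell, `text_mapping` also only affects the bar labels, not the bar heights, so it would not rescale the y-axis in any case.) A minimal base-R illustration with made-up sizes:

```r
# Hypothetical intersection sizes, not taken from the plots above
sizes <- c(3000, 1200, 450)

# round() with the default digits = 0 turns every ratio in [0, 1] into 0 or 1
coarse <- round(sizes / max(sizes))

# Keeping two decimals preserves the relative heights
fine <- round(sizes / max(sizes), 2)

print(coarse)
print(fine)
```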

I saw @krassowski's solution to the question How to assign logarithmic scale to “Intersection size” using ComplexUpset library?, and I'm hoping to do something similar with geom_bar, but normalizing instead of applying a log scale.

Screenshot or illustration

[Two screenshots of the upset plots; images not included]

Context

ComplexUpset version: 1.3.3

arunimgarg commented 1 year ago

I have done the following to normalize the intersection size (y = y / max(y)):

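library(ggplot2)
library(ComplexUpset)

# Example data from the ComplexUpset documentation
movies = as.data.frame(ggplot2movies::movies)
genres = colnames(movies)[18:24]
movies[genres] = movies[genres] == 1

# Presence column for exclusive intersections (unexported helper, hence :::)
presence = ComplexUpset:::get_mode_presence('exclusive_intersection')

# Collapse the per-movie rows into one row per intersection with its size
summarise_values = function(df){
    aggregate(
        as.formula(paste0(presence, '~intersection')),
        df,
        FUN = sum
    )
}

upset(
    movies,
    genres,
    base_annotations=list(
        'Normalized intersection size'=(
            ggplot()
            + geom_bar(
                data=summarise_values,
                stat='identity',
                # y / max(y): the largest intersection is scaled to 1
                aes(y=!!presence / max(!!presence))
            )
        )
    ),
    width_ratio=0.1
)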
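To sanity-check the normalization logic outside of ComplexUpset, the sketch below reproduces what `summarise_values()` followed by `y / max(y)` computes, on a toy data frame. It assumes the layer data has one row per observation, with an `intersection` column and a presence column; the column name `in_exclusive_intersection` and the values are hypothetical stand-ins, not ComplexUpset's actual internals.

```r
# Toy stand-in for the per-observation layer data.
# Column names and values are assumptions for illustration only.
df <- data.frame(
  intersection = c("A", "A", "A", "A-B", "B"),
  in_exclusive_intersection = c(1, 1, 1, 1, 1)
)

# Same aggregation as summarise_values(): one row per intersection,
# with the presence indicator summed into an intersection size
sizes <- aggregate(in_exclusive_intersection ~ intersection, data = df, FUN = sum)

# y = y / max(y): the largest intersection becomes 1, the rest are fractions of it
sizes$normalized <- sizes$in_exclusive_intersection / max(sizes$in_exclusive_intersection)

print(sizes)
```

One thing to keep in mind when comparing the four plots: because each plot is divided by its own maximum, the tallest bar is always 1, so the shared axis shows each intersection relative to the largest intersection of the same plot, not absolute counts.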

As far as I can tell, the results make sense, but if anyone sees a logical mistake, let me know. Otherwise, we're good to close this issue.