opensafely-core / r-docker

Docker image for running R scripts in OpenSAFELY

ggsankey #118

Closed rc16 closed 1 year ago

rc16 commented 1 year ago

Hi,

I'd like to use the R package ggsankey. It is installed from GitHub via the remotes package, using remotes::install_github("davidsjoberg/ggsankey"). I can see remotes is already available in OpenSAFELY; could you add this package as well?

It needs patient-level data to make the graphs, which I don't think can be released from the server, so I think it would be better to create the graphs on the platform and then export them.

remlapmot commented 1 year ago

Just to comment/warn the team that if you install this package by running devtools::install_github()/remotes::install_github() with its default settings, i.e. simply as

remotes::install_github('davidsjoberg/ggsankey')

then that will update a lot of dependency packages in the container.

To install without updating the dependencies, set the upgrade argument to 'never' or FALSE:

remotes::install_github('davidsjoberg/ggsankey', upgrade = 'never')

wjchulme commented 1 year ago

ggsankey isn't on CRAN, and I don't think we currently support installation of non-CRAN packages.

However, you should be able to export sufficiently detailed summary data and then recreate the patient data offline. The original (fairly silly) example in the package vignette is:

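library(ggsankey)  # make_long(), geom_sankey()
library(ggplot2)   # ggplot(), aes()
library(dplyr)     # %>%

df <- mtcars %>%
  make_long(cyl, vs, am, gear, carb)

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node))) +
  geom_sankey()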

Here df is "patient level" but really it's just a funky representation of a series of cross-tabs, between var1 and var2, var2 and var3, var3 and var4, etc.
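To make the cross-tab point concrete, here is a minimal base R sketch that needs no ggsankey at all. It only assumes the built-in mtcars data; the names flows_cyl_vs and flows_vs_am are made up for illustration.

```r
# Each pair of adjacent variables in the Sankey diagram is just a cross-tab.
# Counts for the first two links of the vignette example (cyl -> vs, vs -> am):
flows_cyl_vs <- as.data.frame(table(cyl = mtcars$cyl, vs = mtcars$vs))
flows_vs_am  <- as.data.frame(table(vs = mtcars$vs, am = mtcars$am))

# One row per (node, next_node) combination; Freq is the number of cars
# flowing along that link
print(flows_cyl_vs)
print(flows_vs_am)

# Every car appears exactly once in each cross-tab
stopifnot(sum(flows_cyl_vs$Freq) == nrow(mtcars))
stopifnot(sum(flows_vs_am$Freq) == nrow(mtcars))
```

The Freq column carries the same information as repeating each (node, next_node) row once per car, which is why the counts alone are enough to redraw the plot.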

So to opensafelify this:


## output check and release this compact dataset from the server
df_compact <-
  mtcars %>%
  make_long(cyl, vs, am, gear, carb) %>%
  dplyr::group_by(x, node, next_x, next_node) %>%
  dplyr::summarise(n = dplyr::n(), .groups = "drop")

## restore the original "patient-level" representation on your local machine
df_uncompact <- 
  df_compact %>%
  tidyr::uncount(weights = n)

## plot as normal
ggplot(df_uncompact, aes(x = x,
                         next_x = next_x,
                         node = node,
                         next_node = next_node,
                         fill = factor(node))) +
  geom_sankey()
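One detail the snippet above leaves implicit is how the compact table gets off the server. Assuming it is released as a CSV file (an assumption, since the thread does not say), the order of the stages in x is not stored in the file and would need restoring locally; otherwise the stages would be ordered alphabetically. A rough base R sketch, using a small made-up table in place of the real df_compact:

```r
# Hypothetical compact table shaped like df_compact above (counts are made up)
stage_levels <- c("cyl", "vs", "am")
df_compact <- data.frame(
  x         = factor(c("cyl", "cyl", "vs", "vs"), levels = stage_levels),
  node      = c(4, 6, 0, 1),
  next_x    = factor(c("vs", "vs", "am", "am"), levels = stage_levels),
  next_node = c(1, 0, 1, 0),
  n         = c(10, 7, 12, 5)
)

# On the server: write the compact table to a file for output checking
path <- tempfile(fileext = ".csv")
write.csv(df_compact, path, row.names = FALSE)

# Locally: read it back and restore the stage order, which CSV does not store
df_released <- read.csv(path, stringsAsFactors = FALSE)
df_released$x      <- factor(df_released$x, levels = stage_levels)
df_released$next_x <- factor(df_released$next_x, levels = stage_levels)

# Base R equivalent of tidyr::uncount(weights = n): repeat each row n times
row_index <- rep(seq_len(nrow(df_released)), df_released$n)
df_uncompact <- df_released[row_index, c("x", "node", "next_x", "next_node")]
rownames(df_uncompact) <- NULL
```

Here stage_levels has to be known on the local machine; it is simply the order of the variables passed to make_long.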

You still need the make_long function from the ggsankey package to be available inside OpenSAFELY. But dependency-wise it's quite simple, so you should be able to paste it into a script and use it without needing anything else from the package.
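For anyone who wants a feel for what such a pasted-in helper has to do, below is a rough base R stand-in. It is not the ggsankey source: make_long_sketch is a hypothetical name, it takes the variable names as a character vector rather than unquoted, and it assumes the final stage gets NA for next_x and next_node. It only reproduces the four-column layout (x, node, next_x, next_node) used in this thread.

```r
# Hypothetical stand-in for ggsankey::make_long(), NOT the package's own code.
# Assumes all stage variables share a type (e.g. all numeric, as in mtcars).
make_long_sketch <- function(data, vars) {
  stopifnot(length(vars) >= 1, all(vars %in% names(data)))
  pieces <- lapply(seq_along(vars), function(i) {
    is_last <- i == length(vars)
    data.frame(
      x         = vars[i],
      node      = data[[vars[i]]],
      # Assumption: the final stage has no successor, so it gets NA
      next_x    = if (is_last) NA_character_ else vars[i + 1],
      next_node = if (is_last) rep(NA, nrow(data)) else data[[vars[i + 1]]],
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pieces)
  # Keep the stages in the order they were supplied
  out$x      <- factor(out$x, levels = vars)
  out$next_x <- factor(out$next_x, levels = vars)
  out
}

# One block of rows per stage, one row per car within each block
df <- make_long_sketch(mtcars, c("cyl", "vs", "am", "gear", "carb"))
head(df)
```

If the real function is pasted in instead, the rest of the workflow in this thread stays the same.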

inglesp commented 1 year ago

Thanks for looking at this @wjchulme -- does this mean we should close this issue?

(And thanks as ever for your input @remlapmot!)

rc16 commented 1 year ago

Thanks very much @wjchulme, I'll try this.

I'm happy if you would like to close the issue.