ohdsi-studies / PioneerWatchfulWaiting

This study is part of the joint PIONEER - EHDEN - OHDSI studyathon in March 2021, and aims to advance understanding of clinical management and outcomes of watchful waiting in prostate cancer.

Large file in memory #98

Open bdemeulder opened 2 years ago

bdemeulder commented 2 years ago

As per Martijn Schuemie's comment on shiny deploy: https://github.com/OHDSI/ShinyDeploy/issues/148#issuecomment-996619402

'Keeping 11GB in memory is not ok. Either use the file system or a database to hold the data, and only load what you're displaying, when you're displaying it.'

My understanding is that we are already using the file system to hold the data (by uploading the preMerge file to the S3 server); am I wrong?

What else could we do to reduce the memory usage?

schuemie commented 2 years ago

The premerge file is a single file with all data in it. When you load it, you load it all (into memory).
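For context, a `load()` call on an `.RData` premerge file restores every object in the file into the R session at once, which is why the whole dataset ends up in RAM. A one-line sketch (the file name here is a placeholder):

```r
# Loading the premerge file pulls all of its tables into memory in one step;
# the file name is a placeholder, not the app's actual path.
load("PreMerged.RData")
```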

The solution implemented in the CohortDiagnostics Shiny app is to use a database backend instead of the premerge file when the data become too large. Here's the code one can use to upload the data. There's an OHDSI Postgres server that the Shiny app can access.
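For reference, a minimal sketch of what such an upload could look like using the OHDSI DatabaseConnector package; the server address, credentials, schema, and table/column names below are placeholders, not the actual OHDSI configuration:

```r
# A sketch of uploading one premerged table to a Postgres backend with
# DatabaseConnector. Server, schema, and table names are placeholders.
library(DatabaseConnector)

connectionDetails <- createConnectionDetails(
  dbms = "postgresql",
  server = "shinydb.example.org/shinydb",   # placeholder server/database
  user = Sys.getenv("SHINYDB_USER"),
  password = Sys.getenv("SHINYDB_PASSWORD")
)
connection <- connect(connectionDetails)

# Write the table to the database instead of keeping it in the app's RAM:
insertTable(
  connection = connection,
  databaseSchema = "pioneer",               # placeholder schema
  tableName = "covariate_value",
  data = covariateValue,                    # a data frame from the premerge file
  dropTableIfExists = TRUE,
  createTable = TRUE
)
disconnect(connection)
```

The Shiny app can then fetch only the rows it needs at render time, e.g. with `DatabaseConnector::querySql()` and a `WHERE` clause on the cohort ID.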

If your Shiny app is custom developed, then another, more lightweight option might be to split the data up in some way (e.g. by cohort ID) into separate data files, and load the data on the fly as needed. An example of this pattern can be found here.
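To illustrate that file-splitting pattern, a minimal sketch assuming the large table is a data frame with a `cohortId` column; the table name, file layout, and input ID are hypothetical:

```r
# One-time preprocessing: split the big table into one .rds file per cohort.
# 'bigTable', its 'cohortId' column, and the file layout are assumptions.
for (id in unique(bigTable$cohortId)) {
  saveRDS(
    bigTable[bigTable$cohortId == id, ],
    file.path("data", sprintf("cohort_%s.rds", id))
  )
}

# Inside the Shiny server function: read only the cohort the user selected,
# so at most one cohort's data sits in memory at a time.
selectedData <- reactive({
  readRDS(file.path("data", sprintf("cohort_%s.rds", input$cohortId)))
})
```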

leeevans commented 2 years ago

@bdemeulder In one week, the OHDSI Shiny server infrastructure will be scaled back to its former configuration (to reduce costs). That means the PioneerWatchfulWaiting app will no longer run successfully with its current usage of 11 GB of RAM.

Please rewrite the PioneerWatchfulWaiting app to follow the recommendation from @schuemie so it can continue to be hosted successfully on data.ohdsi.org.