va-big-data-genomics / data-release-portal

Source for web portal describing data releases published by the Stanford Data-as-a-Service (DaaS) team.
MIT License

Design data release portal (website) #1

Open pbilling opened 1 year ago

pbilling commented 1 year ago

Overview

I want to create a website for publishing summary information about all our data releases. I want to have live statistics and figures that update as data or code changes. I also want to share the results and methods.

pbilling commented 1 year ago

Plan A

The reasoning for this plan:

  1. I want the site to update when the code is updated. Any change to the code in GitHub will trigger a new Cloud Build build.
  2. I want the site to update when the data is updated. With Cloud Run instances running inside my Google Cloud environment, I can pull images, data, etc. from Cloud Storage without exposing any project-specific information on GitHub. The instances will have project credentials, so they could also rerun queries or analyses if needed.
  3. I don't want to put any credentials or project information on GitHub.
  4. I want interactive data exploration and figures, which means Quarto and Plotly rather than PNGs that I could just link from a web page.
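
A minimal sketch of point 3, assuming project details are supplied as environment variables on the Cloud Run service rather than committed to the repository. The variable names `PORTAL_PROJECT_ID` and `PORTAL_BUCKET` and the `PortalSettings` class are hypothetical, chosen only for illustration; nothing in the repo defines them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PortalSettings:
    """Project-specific values that stay out of the GitHub repository."""
    project_id: str
    bucket: str


def load_settings(env=None):
    """Read settings from environment variables set on the running service.

    PORTAL_PROJECT_ID and PORTAL_BUCKET are hypothetical names (an assumption).
    """
    env = os.environ if env is None else env
    required = ("PORTAL_PROJECT_ID", "PORTAL_BUCKET")
    # Fail fast at startup instead of falling back to a value hard-coded in the repo.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return PortalSettings(
        project_id=env["PORTAL_PROJECT_ID"],
        bucket=env["PORTAL_BUCKET"],
    )
```

With something like this, the repository only ever contains the variable names, and the actual project ID and bucket live in the service configuration.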

pbilling commented 1 year ago

Plan B

pbilling commented 1 year ago

How to serve latest data + code?

I want to run the app in GCP so it can load the latest data to generate figures and metrics. With Cloud Build, each user will see a fresh server that pulls the latest data, so that is solved regardless of the website technology.
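
One way to read "a fresh server that pulls the latest data" is that each instance computes its metrics once, at startup, from whatever data it can load at that moment. The sketch below assumes the data arrives as CSV text with a `release` column; the function name, the column name, and the metrics are all hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def build_snapshot(csv_text, now=None):
    """Compute summary metrics once, when the instance starts.

    Assumes (hypothetically) CSV input with a 'release' column.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    releases = sorted({row["release"] for row in rows})
    loaded_at = now if now is not None else datetime.now(timezone.utc)
    return {
        "loaded_at": loaded_at.isoformat(),  # when this instance read the data
        "row_count": len(rows),
        "releases": releases,
    }
```

Every page served by that instance would then report the same `loaded_at` timestamp, which makes it easy to see how fresh the figures are.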

How to serve interactive visualizations?

I think everything is going to be rendered when the app instance is launched, so I don't need Shiny or anything similar. This blog post indicates you need Shiny to serve dynamic content, but I'm not sure what that means in this context. Regardless, I don't think we need it.

The value of using Quarto, then, is that you can see the actual code and queries used to generate each result. I could copy-paste code and queries into a web page, but then there is no functional connection to the results, and it takes significant effort to keep everything in sync.
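
The "functional connection" idea can be sketched outside Quarto too: if the displayed query text and the displayed result come from the same call, they cannot drift apart. This is only an illustration using an in-memory SQLite database with made-up demo rows; the table, the query, and the function names are hypothetical and are not the portal's real data or code.

```python
import sqlite3

# Hypothetical query standing in for whatever backs a metric on the portal.
SAMPLE_COUNT_QUERY = (
    "SELECT release, COUNT(*) AS samples "
    "FROM samples GROUP BY release ORDER BY release"
)


def render_result_with_source(conn, query):
    """Run the query and return text showing the query next to its rows."""
    cursor = conn.execute(query)
    headers = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    lines = ["Query:", "  " + query, "Result:", "  " + " | ".join(headers)]
    lines += ["  " + " | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


def demo_connection():
    """Build an in-memory database with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, release TEXT)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [(1, "r1"), (2, "r1"), (3, "r2")],
    )
    return conn
```

Calling `render_result_with_source(demo_connection(), SAMPLE_COUNT_QUERY)` returns one block of text containing both the query and its output, which is roughly what a rendered Quarto code cell gives you for free.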