Closed minrk closed 3 days ago
After discussing with @yuvipanda and @0mar, we are going to go with postgres on Amazon RDS for this. Since this can be encrypted at rest, we should not need the secondary encryption between the client and the database that the default sqlite storage delegate requires. @surfdoc @jpp9 can either of you confirm that?
@minrk It is probably a good idea to put a meeting together to discuss the new architecture and design for the MVP before you begin your work. It will be quite a bit different from the pre-MVP. Can you send me your availability for next week? I will set up a meeting.
I am available at our weekly meeting on Wednesday. As much as possible, I'd like to do design discussions in writing here or elsewhere so async folks can participate. Is that possible? I think there may be room to discuss this at this Friday's meeting, too.
No problem. I think it would be good to have a deeper technical discussion of the architecture and data flow, so I am working on setting that up for next week's software meeting. We can record it and make it available here, and then we can post the docs here as well to maintain the async communication. We can definitely walk through it at a high level during Friday's meeting as well.
Sounds great! Right now we're just setting up infrastructure we need to explore things, so not making any serious design decisions yet. Hopefully we'll be better prepared to have those conversations soon.
Closing this since the MVP architecture won't require us to operate a KV store, and we don't need to scale up the pre-MVP architecture in the meantime.
The CHCS storage delegate is currently backed by AWS Secrets Manager. While this works and was quick to get off the ground with nothing to deploy, it isn't really appropriate for a read/write key-value store, for a variety of reasons, not least of which is that the whole secret must be written at once.
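To illustrate why whole-value writes are awkward for a key-value workload, here is a minimal sketch. It does not call AWS at all: `WholeBlobStore` and `set_key` are hypothetical names, and the store is an in-memory stand-in for any backend that can only read or replace one complete value. The lost update it shows is one likely consequence of that constraint, not a reported bug.

```python
import json


class WholeBlobStore:
    """Hypothetical stand-in for a backend that stores one value, replaced as a whole."""

    def __init__(self):
        self._blob = "{}"

    def read(self):
        return self._blob

    def write(self, blob):
        self._blob = blob


def set_key(store, key, value):
    """Changing one key means reading, modifying, and rewriting every key."""
    data = json.loads(store.read())
    data[key] = value
    store.write(json.dumps(data))


store = WholeBlobStore()
set_key(store, "user-a", "token-a")

# Two writers read the same snapshot before either one writes back.
snapshot_1 = json.loads(store.read())
snapshot_2 = json.loads(store.read())
snapshot_1["user-b"] = "token-b"
snapshot_2["user-c"] = "token-c"
store.write(json.dumps(snapshot_1))
store.write(json.dumps(snapshot_2))  # replaces the whole value, dropping user-b

final = json.loads(store.read())
print(sorted(final))
```

A store with per-key writes avoids this, because each writer only touches its own key.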
We should move to a 'real' key-value store, but that requires picking and deploying one.
The key-value store is responsible for storing sensitive information, specifically:
Since we are on AWS, probably the simplest to deploy is DynamoDB. But if we want JupyterHealth to be portable, we should probably include deploying a generic KV store. Logical choices:
deploy simple kv store in helm (etcd, redis/valkey, etc.)
relational database like postgres
s3
Advantage of simple kv store: lighter, quicker, most portable
Advantage of relational database: ~identical to default SQLite storage delegate (but have to use a real database, since we can't mount sqlite for all users). Disadvantage: probably overkill to deploy given the task of simple key:value pairs
Advantage of s3: nothing to deploy, better persistence/resilience/access controls appropriate for the data. Disadvantage: portable across all major cloud providers, but not as portable as a database or kv server
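For the relational database option, a rough sketch of how little schema simple key:value pairs need. This is an assumption-heavy illustration, not the storage delegate's actual interface: the `KVStore` class, the `kv` table, and its column names are all hypothetical, and it uses Python's built-in `sqlite3` with an in-memory database only so it runs anywhere. Against postgres the same upsert statement should work, with `BYTEA` in place of `BLOB` and the driver's own placeholder style.

```python
import sqlite3


class KVStore:
    """Hypothetical key-value wrapper around a single relational table."""

    def __init__(self, conn):
        self.conn = conn
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS kv ("
                "key TEXT PRIMARY KEY, value BLOB NOT NULL)"
            )

    def set(self, key, value):
        # Upsert: insert the row, or overwrite the value if the key exists.
        with self.conn:
            self.conn.execute(
                "INSERT INTO kv (key, value) VALUES (?, ?) "
                "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
                (key, value),
            )

    def get(self, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def delete(self, key):
        with self.conn:
            self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))


kv = KVStore(sqlite3.connect(":memory:"))
kv.set("user-a", b"secret-1")
kv.set("user-b", b"secret-2")
kv.set("user-a", b"secret-3")  # overwrites only user-a's row
```

Each write touches a single row, so updating one user's value never rewrites another's.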