rarebreed / khadga

Web application to gather real time video and text data over WebRTC and websockets
Other
3 stars 0 forks source link

Replace mongodb with cloud managed postgresql #31

Open rarebreed opened 4 years ago

rarebreed commented 4 years ago

Was doing a bit of research and found out that mongodb has a SSPL license (Server Side Public License). This means that if you use their product in the cloud, your code has to be AGPL licensed. While I would like my code to be open sourced, I also want others to have the freedom to do what they like with it.

As a result, I'm going to use another database. Some other concerns I have about mongodb are that you are limited to 16MB documents, and mongodb has a relatively infamous security problem (perhaps somewhat unfairly deserved).

Rethinkdb also has two features that mongodb has that are missing that I would really like to use:

Reactive databases are the way to go. Instead of the client polling for new data, the client can ask to be told when data changes. This helps cut down on clients working with stale data. Also, for some of the things I have in mind, time series data will be very useful. For example, video images coming from a webcam for object tracking must be ordered and the time differential should be known in order to gauge the speed.

I dont know how fast rethinkdb writes the data (it's not a specialized time series database like influxdb), and it's only got millisecond accuracy, but I think that should be sufficient for me.

rarebreed commented 4 years ago

Been looking at google's firestore, and it looks like it might be doable. They give a certain amount free per month, and my needs will be very modest.

The biggest challenge is that they do not have a native Rust client. Two possibilities come to mind:

rarebreed commented 4 years ago

Also need to remove the Deployment, Service and PVC files for mongodb

rarebreed commented 4 years ago

I now have a new service called mimir running on GKE that runs on nodejs. The reason for this was 2-fold:

In theory, I could have done both in rust, but the libraries make this easier. For the latter part, we will use the nodejs firestore library (which is thankfully natively written in typescript) to do our CRUD operations on our database.

rarebreed commented 4 years ago

I'm also going to add a postgresql database for long term lookup. Firestore will only be used for real time syncing of data. For storing of user information, this will be done in postgresql. We will also include blog posts and chat messages here.

rarebreed commented 4 years ago

I think I'm going to remove firestore totally. I think that with a combination of GraphQL + postgres, I can get "reactive" updates too. Although postgres itself isn't reactive, graphql can be (thanks to subscriptions). Since all CRUD operations have to go through the graphql server, if one client updates some state/record, then we can notify any other subscribers that the data has changed.

I'm still torn on whether or not to pay for Google's managed SQL or to do it myself and bundle my own postgres server. I think the cost is roughly the same in the long run (even if I include it myself, I still have to have it run on a vcpu even if its shared). Plus, I dont have to worry about security updates, locking it down, etc. I do have to worry about setting up private IPs, but I would have had to create a new ClusterIP if I rolled my own. Plus, I dont need to use kubernetes secrets or anything like that. It probably is a bit cheaper to do it myself, but at least with the amount of traffic the site will be getting, the difference will be very small

rarebreed commented 4 years ago

I've been playing around with Diesel for sql support, but I think this might be overkill for now. For one, something's not quite right with Diesel when I run diesel migrations run. If I change entries in my up.sql, the right schema is not always generated. It also doesn't always delete the tables.

Also, support for dropping down into raw sql seems odd. There's a sql_query function, and only after a bit of searching did I realize that it could be used for inserts/updates/deletes as well as selects. Worse, there's almost zero documentation and examples on how to use raw sql.

I realize that diesel is an ORM, and is supposed to help you write typesafe queries, but I also think that ORM's magically hide too much detail, and should give you more freedom to write raw queries.

As such, I think I'm going to drop Diesel. It's supposedly a fair bit faster than rust_postgres, but I think the main strength of an ORM is:

I don't need the first point, and the latter point I think is debatable.