Open ezio-melotti opened 2 years ago
FWIW python-socketio supports different message queues.
[I found this old message that I forgot to submit...]
It seems that Cassandra can handle this on its own. Each Raspi can read data from its sensors and write them into a local Cassandra node that is already connected to the other DB nodes. Cassandra should then be able to synchronize the data among all nodes.
In order to save storage on the Raspis, it might be better to store only the data from the connected sensors, possibly for a limited amount of time (e.g. one week or one month). If this is possible, we would need two nodes on separate machines that keep the full history, and satellite nodes on the Raspis that store partial data.
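Cassandra can enforce that limited retention directly with a table-level TTL, so the satellite nodes expire old rows on their own. A minimal sketch of what the table definition might look like (the keyspace, table, and column names are assumptions, and the CQL is embedded in Python only to keep these examples in one language):

```python
# Sketch: per-Raspi retention via Cassandra's table-level default TTL.
# Rows older than RETENTION_SECONDS are expired automatically by Cassandra.
RETENTION_SECONDS = 7 * 24 * 3600  # one week

# Hypothetical keyspace/table/column names.
CREATE_READINGS = f"""
CREATE TABLE IF NOT EXISTS sensors.readings (
    sensor_id text,
    ts        timestamp,
    value     double,
    PRIMARY KEY (sensor_id, ts)
) WITH default_time_to_live = {RETENTION_SECONDS};
"""

print(RETENTION_SECONDS)  # → 604800
```

The two full-history nodes would use the same schema without the TTL (or with `default_time_to_live = 0`, which disables it).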
Due to how Cassandra's distributed model works, doing it this way would be something of an anti-pattern, but it could technically be done, depending on the number of nodes in the cluster.
If we do want that master-database approach, we can either have "multiple servers" that function as the nodes, with the Raspberry Pis just sending their data to the cluster, so we won't have to worry about managing data on them.
Or, as a final option, we could drop Cassandra and adopt a more traditional DB model, which shouldn't be too much work technically, since the required schema would be rather small.
This issue covers the architecture of the database, including the database type, the technologies we want to use, and how the data are sent to the DB.
I think there are at least two possible ways to implement the DB:
If the DB is implemented as a separate socketio client, it can request data from the socketio server in the same way the frontend does. The socketio server could have a specific event listener for the DB that provides data in a format more convenient for the DB. Whenever the socketio server receives data from the sensors, it sends them to the DB client via socketio, and the DB client writes them. The server should also be able to request data from the DB client whenever needed (e.g. when another client requests them). With this architecture it is also easy to connect multiple DBs, possibly on different machines and using different DB technologies, and to set up redundancy and backups.
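The first approach can be sketched roughly as follows. The event names (`sensor_data`, `query_data`) and the payload shape are assumptions, and a tiny in-memory dispatcher stands in for the real socketio wiring; with python-socketio the registration would be `@sio.on("sensor_data")` on a `socketio.Client` followed by `sio.connect(server_url)`:

```python
# Stand-in for a socketio client: one handler registered per event name.
# With python-socketio this would be `sio = socketio.Client()` and
# `@sio.on("sensor_data")` / `@sio.on("query_data")` instead.
handlers = {}

def on(event):
    def register(fn):
        handlers[event] = fn
        return fn
    return register

db = []  # a plain list stands in for the real database write

@on("sensor_data")              # server pushes each reading to the DB client
def store_reading(payload):
    db.append(payload)

@on("query_data")               # server asks the DB client for stored data
def query(criteria):
    return [r for r in db if r["sensor"] == criteria["sensor"]]

# Simulated traffic coming from the socketio server:
handlers["sensor_data"]({"sensor": "temp1", "value": 21.5})
handlers["sensor_data"]({"sensor": "temp2", "value": 19.0})
print(handlers["query_data"]({"sensor": "temp1"}))
# → [{'sensor': 'temp1', 'value': 21.5}]
```

Because the DB side is just another client, a second instance with a different backend (or on a different machine) could register the same handlers for redundancy.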
If the DB is integrated into the server, the server will automatically write the data it gets from the sensors into the DB, and it will also be able to retrieve data from the DB directly. Even though this guarantees better performance, it's less flexible than the first approach and creates a tight coupling between the socketio server and the DB.
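For contrast, a minimal sketch of the integrated approach, using sqlite3 purely for illustration (the function names and payload shape are assumptions; in the real server these would be socketio event handlers):

```python
import sqlite3

# Integrated approach: the server writes readings straight into its own DB
# and reads them back directly, with no separate DB client in between.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")

def on_sensor_data(payload):        # would be a socketio event handler
    conn.execute("INSERT INTO readings VALUES (?, ?)",
                 (payload["sensor"], payload["value"]))
    conn.commit()

def get_readings(sensor):           # server queries the DB directly
    return conn.execute(
        "SELECT value FROM readings WHERE sensor = ?", (sensor,)
    ).fetchall()

on_sensor_data({"sensor": "temp1", "value": 21.5})
print(get_readings("temp1"))   # → [(21.5,)]
```

The coupling is visible here: swapping the storage backend means changing the server itself, whereas in the first approach only the DB client changes.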
I'd suggest experimenting with the first approach first to see if it works reasonably well. If not, we can explore other solutions.