overthesun / simoc-sam

Live backend for SAM at Biosphere 2

Cassandra Sam #29

Open IMBCIT opened 2 years ago

IMBCIT commented 2 years ago

The most important files to ensure the database works are:

.env
cassandra_connection.py
cassandra_models.py
cassandra_tables.py

This approach uses the models to .create() entries in the database, so the CQL queries in cassandra_queries.py do not need to be imported and called. The other files do not strictly need to be committed, since their functionality can be integrated into existing workflows such as logging, config parsing, and data insertion. However, since they are contained inside the cassandra_database folder, keeping them for historical and testing purposes would be preferred.

There is the possibility of securing the database with authentication, but it currently uses the defaults, since it should be housed locally with no outside access.

The cass_install.sh script currently supports Debian-based distros and can target ARM on 32-bit and 64-bit OSes, as well as the more traditional x86_64 architecture. This will allow us to easily install Cassandra on all the architectures we are targeting, and it can be expanded to more based on the OS in question.
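For discussion, here is a rough Python sketch of the kind of architecture check described above. It is not taken from cass_install.sh: the `classify_arch` and `detect_target` names and the target labels (`arm32`, `arm64`, `x86_64`) are made up for illustration, and the mapping only covers machine strings commonly reported on the platforms mentioned.

```python
import platform

# Hypothetical mapping from machine strings (as reported by `uname -m` or
# platform.machine()) to install targets. The labels are illustrative only.
ARCH_TARGETS = {
    "armv6l": "arm32",
    "armv7l": "arm32",
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "x86_64",
    "amd64": "x86_64",
}


def classify_arch(machine: str) -> str:
    """Map a machine string to one of the illustrative install targets."""
    key = machine.strip().lower()
    if key not in ARCH_TARGETS:
        raise ValueError(f"unsupported architecture: {machine!r}")
    return ARCH_TARGETS[key]


def detect_target() -> str:
    """Classify the machine this interpreter is running on."""
    return classify_arch(platform.machine())
```

One caveat: the machine string reflects the kernel, so a 32-bit userland running on a 64-bit kernel may still report a 64-bit value. An installer that cares about the userland would need an extra check.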

One point of concern, stated in the README.md, pertains to configuring the node once the database is installed. After we have the main seed nodes up and running, we can easily add new nodes, but we will have to set some values detailed in the doc to ensure that each node can be added to the cluster and begin the replication process correctly. We can come up with base cassandra.yaml and cassandra-rackdc.properties files and either write some shell scripts to ensure that the data is populated as needed or look into Ansible for deploying and configuring Raspberry Pis over SSH or locally. This could also be somewhat alleviated by using Docker to configure these files for us and run Cassandra in a container, but running locally would be preferred to ensure maximum I/O performance.
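To make the "base files plus per-node values" idea concrete, here is a minimal sketch that renders the per-node settings from templates. The `NodeSettings` and `render_node_config` names, the IP addresses, and the choice of `GossipingPropertyFileSnitch` are assumptions for illustration. The output is only a fragment of the keys a new node typically needs, not a complete cassandra.yaml, and the README remains the reference for which values actually have to be set.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, Tuple


@dataclass(frozen=True)
class NodeSettings:
    """Per-node values (hypothetical names) to fill into the base templates."""
    cluster_name: str
    seeds: Tuple[str, ...]
    listen_address: str
    datacenter: str
    rack: str


# Fragment of cassandra.yaml with only the keys that vary per node or cluster.
YAML_TEMPLATE = Template("""\
cluster_name: '$cluster_name'
listen_address: $listen_address
rpc_address: $listen_address
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "$seeds"
""")

RACKDC_TEMPLATE = Template("dc=$datacenter\nrack=$rack\n")


def render_node_config(node: NodeSettings) -> Dict[str, str]:
    """Return the rendered file contents keyed by file name."""
    yaml_text = YAML_TEMPLATE.substitute(
        cluster_name=node.cluster_name,
        listen_address=node.listen_address,
        seeds=",".join(node.seeds),
    )
    rackdc_text = RACKDC_TEMPLATE.substitute(
        datacenter=node.datacenter,
        rack=node.rack,
    )
    return {
        "cassandra.yaml": yaml_text,
        "cassandra-rackdc.properties": rackdc_text,
    }
```

A shell script or an Ansible template could do the same substitution. The point is only that the seed list, listen address, datacenter, and rack are the values that change from node to node.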

granawkins commented 2 years ago

A thought for discussion:

Do we want to use a consistent data structure across SAM and SIMOC? In the SIMOC simulation, chart data is stored in (basically) data.agent_name.flows.currency_name. Does that mean sensor data should be in data.sensorid.flows.currency_name? If yes, should the database reflect this structure?
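As a starting point for that discussion, here is a small sketch of what grouping flat sensor readings into a data.sensorid.flows.currency_name shape could look like. The `nest_readings` name, the sensor ids, and the choice to store each currency as a list of values are assumptions. The leaf type SIMOC actually uses would need to be checked against the simulation output.

```python
from typing import Dict, Iterable, List, Tuple

# One flat reading: (sensor_id, currency_name, value). Hypothetical layout.
Reading = Tuple[str, str, float]
Nested = Dict[str, Dict[str, Dict[str, List[float]]]]


def nest_readings(readings: Iterable[Reading]) -> Nested:
    """Group flat readings as data[sensor_id]["flows"][currency_name]."""
    data: Nested = {}
    for sensor_id, currency_name, value in readings:
        # Create the per-sensor entry on first sight, then append in order.
        flows = data.setdefault(sensor_id, {"flows": {}})["flows"]
        flows.setdefault(currency_name, []).append(value)
    return data


# Illustrative input: sensor ids and values are made up.
example = nest_readings([
    ("sensor-1", "co2", 415.0),
    ("sensor-1", "co2", 417.5),
    ("sensor-2", "co2", 420.0),
])
```

With this shape, `example["sensor-1"]["flows"]["co2"]` holds the readings for that sensor in arrival order, mirroring the data.agent_name.flows.currency_name access pattern.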

IMBCIT commented 2 years ago

I did not do much investigation into the specifics of the data for SIMOC, but it would make sense to store it in a structure similar to the one used in production. We can discuss this further on the call or offline to best represent the data uniformly.

I will take some time this week to dig into the chart data to come up with a better structure.

kstaats commented 2 years ago

Yes. Let's discuss.

(In reply to Grant's comment of 7/3/22 16:58, quoted in full above.)