waggle-sensor / beehive-server

Waggle cloud software for aggregation, storage and analysis of sensor data from Waggle nodes.
13 stars 17 forks source link

Unify data model #53

Open seanshahkarami opened 6 years ago

seanshahkarami commented 6 years ago

I've been thinking some more about how we can combine a bit of what we have now into a simpler pipeline. A very reasonable approach would be to transition to a cassandra table with columns: ((nodeid , date), topic, timestamp), body

(Yes, topic is kind of just a semantic change from plugin. It'd basically be used to store any routing key, for example, coresense:3 or metric.).

Now, we could put a single "data" exchange in beehive accepting all messages like this. If it's a direct exchange, we can then do a simple "opt-in" for each topic we want to store in that database.

The other nice thing about this layout is it supports splitting messages by topic from the database. Generally, you always end up having to handle each topic case-wise, so having the database support this would be great. At the moment, we can't do that without manually filtering. This should also allow better time slicing within a single day.

This is also general enough that we don't need any special code handling things at the front - we just grab data, maybe add a received_at timestamp, and shove it in the database from later processing. This eliminates the need to do any data format changing since all that has to be handled on a case-by-case basis anyway but ensures the storage (and backup) problem is handled uniformly.

Another way to think of this is as simply as a permanent message log which can be replayed for later processing. The nice thing is, this can be designed as a configurable service in the sense that a binding for each topic can be adding to any exchange and you'll automatically start getting backups.

seanshahkarami commented 6 years ago

I setup up a mock-up of this on beehive. On a test node, I have metrics being sent this way and having them stored in this table. I think this solves the "common intake" problem. Since all data is likely to be handled on a case-by-case basis anyway, I don't think there's anything to do for this step other than transition the other plugin data into the location too.