Unify data model - Githubissues

I've been thinking some more about how we can combine a bit of what we have now into a simpler pipeline. A very reasonable approach would be to transition to a cassandra table with columns: ((nodeid , date), topic, timestamp), body

(Yes, topic is kind of just a semantic change from plugin. It'd basically be used to store any routing key, for example, coresense:3 or metric.).

Now, we could put a single "data" exchange in beehive accepting all messages like this. If it's a direct exchange, we can then do a simple "opt-in" for each topic we want to store in that database.

The other nice thing about this layout is it supports splitting messages by topic from the database. Generally, you always end up having to handle each topic case-wise, so having the database support this would be great. At the moment, we can't do that without manually filtering. This should also allow better time slicing within a single day.

This is also general enough that we don't need any special code handling things at the front - we just grab data, maybe add a received_at timestamp, and shove it in the database from later processing. This eliminates the need to do any data format changing since all that has to be handled on a case-by-case basis anyway but ensures the storage (and backup) problem is handled uniformly.

Another way to think of this is as simply as a permanent message log which can be replayed for later processing. The nice thing is, this can be designed as a configurable service in the sense that a binding for each topic can be adding to any exchange and you'll automatically start getting backups.

waggle-sensor / beehive-server

Unify data model #53