Closed seanshahkarami closed 6 years ago
Since I was a little concerned about making sure we're able to do this for backups ASAP, I went ahead and added two tools to:
https://github.com/waggle-sensor/beehive-server/tree/master/data-exporter
- `export`: exports all the datasets from a specific node
- `exportall`: exports all the datasets
These export datasets to CSV files in `data/*node_id*/*date*.csv`.
(These will end up as one tool... I just wrote `export` as a quick prototype.)
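For reference, the per-node export logic can be sketched roughly like this. The row schema here is hypothetical, and the real tools read rows from Cassandra rather than taking them as an argument; this just shows the `data/*node_id*/*date*.csv` layout:

```python
import csv
import os
from collections import defaultdict

def export_node(node_id, rows, out_dir="data"):
    """Write a node's rows to per-date CSV files: out_dir/node_id/date.csv.

    Each row is assumed to be a (date, timestamp, sensor, value) tuple;
    the real exporter would stream these rows out of Cassandra instead.
    """
    by_date = defaultdict(list)
    for row in rows:
        by_date[row[0]].append(row)

    node_dir = os.path.join(out_dir, node_id)
    os.makedirs(node_dir, exist_ok=True)

    for date, date_rows in by_date.items():
        path = os.path.join(node_dir, date + ".csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["date", "timestamp", "sensor", "value"])
            writer.writerows(date_rows)
    return sorted(by_date)
```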
That was faster than I expected! Doing a full export this way took about 15 minutes. We just need a good place to keep the data. It's just under 5 GB uncompressed, so space isn't really an issue.
As another data point, exporting all of the new Panasonic node's data took about 7 seconds.
I think the simplest way to do this, without having to build or significantly change any other layers on beehive, is to expose Cassandra locally within beehive and add an "exporter" role that can only run SELECT on specified data tables. (I think the last part is important even just to prevent us from making a mistake. You don't want an exporter to accidentally destroy a table!)
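A minimal sketch of that role setup in CQL. The keyspace, table name, and password are placeholders, not our actual schema:

```sql
-- Login-only role for exports (password is a placeholder)
CREATE ROLE exporter WITH PASSWORD = 'change-me' AND LOGIN = true;

-- Read-only access to just the specified data tables; with no
-- MODIFY or DROP permissions granted, the role can't destroy anything
GRANT SELECT ON TABLE waggle.sensor_data TO exporter;
```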
This would allow us to write a couple of special-purpose tools with good performance to do things like bulk backups and exports.
This could even be scheduled to batch, compress, and store the data daily on a mass data store like S3.
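The daily batch step could be as simple as tarring up each day's CSVs before shipping them off. A rough sketch, assuming the `data/*node_id*/*date*.csv` layout above and leaving the actual S3 upload (e.g. via boto3) as a separate step:

```python
import os
import tarfile

def archive_day(data_dir, date, out_dir="archives"):
    """Bundle every node's CSV for a given date into one compressed tarball.

    Uploading the result to a store like S3 would be a separate step.
    """
    os.makedirs(out_dir, exist_ok=True)
    archive_path = os.path.join(out_dir, date + ".tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        for node_id in sorted(os.listdir(data_dir)):
            csv_path = os.path.join(data_dir, node_id, date + ".csv")
            if os.path.exists(csv_path):
                # Store as node_id/date.csv inside the archive.
                tar.add(csv_path, arcname=os.path.join(node_id, date + ".csv"))
    return archive_path
```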