Closed seanshahkarami closed 6 years ago
We may also want to consider compressing the data as part of the process to further reduce storage space and the size of the data we serve.
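As a quick sketch of the compression step (assuming gzip, which is just one reasonable choice since most HTTP clients accept it and the files could be served as-is with `Content-Encoding: gzip`):

```python
import gzip

def compress_dataset(data: bytes) -> bytes:
    # Highest compression level trades a little CPU at build time for
    # smaller files; fine for a periodic batch rebuild.
    return gzip.compress(data, compresslevel=9)

def decompress_dataset(blob: bytes) -> bytes:
    return gzip.decompress(blob)
```

Repetitive CSV-style sensor exports tend to compress very well, so the space savings here could be substantial.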
Among other things, we may want to add an auxiliary table to Cassandra tracking the last time a dataset was updated. This could make syncing and rebuilds much more efficient.
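The sync logic that table would enable might look something like this (hypothetical sketch; plain dicts stand in for the Cassandra table and the build tracker):

```python
from datetime import datetime

def datasets_needing_rebuild(last_updated: dict, last_built: dict) -> list:
    """Return dataset IDs whose data changed since their last build.

    last_updated maps dataset ID -> time of the most recent write (what
    the proposed auxiliary table would track); last_built maps dataset
    ID -> time its static copy was last generated.
    """
    stale = []
    for dataset_id, updated in last_updated.items():
        built = last_built.get(dataset_id)
        if built is None or updated > built:
            stale.append(dataset_id)
    return stale
```

Only the stale datasets get rebuilt, so a sync pass that touches nothing new becomes nearly free.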
I had a chance to finish a prototype after work this evening. I'm pretty happy with the performance and think it's worth moving this forward if everyone's on board.
One possible improvement is to build a static version of beehive which is regenerated on a schedule. This would dramatically improve page serving performance across the board and give us some room to add sanitization to the datasets until we've cleaned up the inconsistencies.
This also has the side effect of completely eliminating direct database access for datasets from the outside world, and so could head off any security mistakes that show up. (Though this really shouldn't be a problem...)
I think this is still worth prototyping, even though we now have nginx performing caching and have moved off the development server. As an example, the build-index tool in the data-exporter generates a "friendly" summary of all the datasets to make sure things look reasonable.
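For a rough idea of the kind of "friendly" overview such a summary tool might produce (this is an illustrative format, not the build-index tool's actual output):

```python
def summarize_datasets(datasets: dict) -> str:
    """Render a short, human-readable overview of all datasets so a
    reviewer can eyeball whether things look reasonable."""
    lines = []
    for name in sorted(datasets):
        records = datasets[name]
        lines.append(f"{name}: {len(records)} records")
    return "\n".join(lines)
```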