waggle-sensor / beehive-server

Waggle cloud software for aggregation, storage and analysis of sensor data from Waggle nodes.

Review ETL processes for sanitization, robustness and correctness #45

Open seanshahkarami opened 6 years ago

seanshahkarami commented 6 years ago

We should do a simple review of the main processes involved in loading data into the databases and processing it. Some examples of what we're looking for:

  1. Do they apply sanitization? For example, ensure consistent node_ids, encoding, naming, etc.
  2. Do they handle invalid data correctly? At least one process just drops bad blobs on failure. We'd probably like to flag that data and put it into an error queue (or similar) for later inspection.
  3. Are they tolerant of database and broker delays, timeouts, etc.? This means things like not crashing immediately if the database is busy and ensuring messages are acknowledged properly.
  4. Are they relatively efficient in their implementation?
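
To make items 1 and 2 concrete, here is a rough sketch of what "sanitize, and flag instead of drop" could look like. It is not taken from any existing beehive worker. The names `normalize_node_id`, `process_message`, `store` and `error_queue` are hypothetical, and the node_id format (up to 16 hex characters, lowercased and zero-padded to 16) is an assumption made for illustration only.

```python
import re

# Assumption: node_ids are hex strings of at most 16 characters.
NODE_ID_PATTERN = re.compile(r"[0-9a-f]{1,16}")


def normalize_node_id(raw):
    """Return a canonical node_id (lowercase, zero-padded) or raise ValueError."""
    if isinstance(raw, bytes):
        # UnicodeDecodeError is a subclass of ValueError, so callers
        # only need to catch ValueError.
        raw = raw.decode("ascii")
    if not isinstance(raw, str):
        raise ValueError(f"node_id must be str or bytes, got {type(raw).__name__}")
    node_id = raw.strip().lower()
    if not NODE_ID_PATTERN.fullmatch(node_id):
        raise ValueError(f"invalid node_id: {raw!r}")
    return node_id.rjust(16, "0")


def process_message(message, store, error_queue):
    """Sanitize one message; flag bad input for inspection instead of dropping it.

    `store` and `error_queue` are plain lists standing in for the database
    and a broker error queue (both hypothetical here).
    Returns True if the message was stored, False if it was flagged.
    """
    try:
        node_id = normalize_node_id(message.get("node_id"))
        payload = message["payload"]
    except (ValueError, KeyError) as exc:
        # Keep the original message and the reason so it can be inspected later.
        error_queue.append({"message": message, "error": str(exc)})
        return False
    store.append({"node_id": node_id, "payload": payload})
    return True
```

In a real worker, the message would presumably be acknowledged only after either the database write or the error-queue publish has succeeded, so nothing is lost if the process dies in between.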

This is worth looking at and getting correct now, as these will be part of our architecture regardless of how we redesign beehive.

seanshahkarami commented 6 years ago

All the workers now have proper connection retries when starting, so that should cut down on workers crashing immediately and restarting when the message broker is down.
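
For reference, the general shape of a startup retry is something like the sketch below. This is an illustration only, not the code in the workers: `connect_with_retry`, its parameters, and the choice of exponential backoff are all assumptions, and `connect` stands in for whatever function opens the broker or database connection.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `connect()` until it succeeds or `max_attempts` is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between
    attempts, capped at `max_delay`. The last failure is re-raised.
    `sleep` is injectable so the function can be tested without waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller (or supervisor) see it
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)
```

Capping the delay keeps a worker from waiting arbitrarily long after an extended outage, and re-raising after the last attempt still lets a process supervisor restart the worker if the broker stays down.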