sustainable-computing-io / susql-operator

a Kubernetes operator that aggregates energy and CO2 emission data for tagged resources
http://susql.org
Apache License 2.0

Double counting on controller restart #3

Open tardieu opened 1 year ago

tardieu commented 1 year ago

The controller keeps track, in main memory, of how much energy per pod has already been accounted for. This table is lost when the controller restarts, so every pod's energy is counted a second time: the total energy consumption is incorrectly doubled on restart whenever the label group is not set up to reset counts on restart.
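A minimal sketch of the failure mode, not the operator's actual code. It assumes (hypothetically) that per-pod readings are cumulative counters, that the label group total is stored outside the controller and so survives a restart, and that the controller adds only the not-yet-accounted delta on each reconcile. The names `Controller`, `accounted`, and `reconcile` are made up for illustration.

```python
class Controller:
    """Toy stand-in for the controller: the per-pod table lives in memory only."""

    def __init__(self):
        # pod name -> energy (joules) already added to the group total
        self.accounted = {}

    def reconcile(self, total, readings):
        """Add the not-yet-accounted energy of each pod to the running total."""
        for pod, cumulative in readings.items():
            delta = cumulative - self.accounted.get(pod, 0.0)
            total += delta
            self.accounted[pod] = cumulative
        return total


# Cumulative per-pod readings (assumed values, in joules).
readings = {"pod-a": 100.0, "pod-b": 50.0}

controller = Controller()
total = controller.reconcile(0.0, readings)    # first pass accounts for 150 J
total = controller.reconcile(total, readings)  # no new energy, total unchanged

# Simulated restart: the total survives, the in-memory table does not.
restarted = Controller()
total_after_restart = restarted.reconcile(total, readings)

print(total, total_after_restart)
```

With an empty table after the restart, every pod's full cumulative reading is treated as new energy, so the total goes from 150 J to 300 J even though nothing was consumed in between.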

trent-s commented 7 months ago

I am seeing this also. Even worse, occasionally values go down after restart.

mamy-CS commented 6 months ago

Brainstorm dump:

  1. Store the energy aggregation data in persistent storage.
  2. Modify the Controller Logic: When the controller reconciles pods and updates their energy consumption, also update the corresponding records in the persistent storage. When the controller starts, retrieve the energy consumption data from the persistent storage and use it to initialize the in-memory counts.
  3. Handle Controller Initialization: During controller startup, check if there is existing data in the persistent storage. If data exists, load it into the controller's memory. If no data exists (e.g., first run), initialize the memory counts to zero.
  4. Error Handling and Data Consistency: Implement error handling mechanisms to handle failures during storage operations. Ensure data consistency between in-memory counts and persistent storage. For example, periodically synchronize in-memory counts with the persistent storage to handle cases where the controller crashes or is restarted unexpectedly.
  5. Testing: Thoroughly test the implementation to ensure correctness and reliability. Test scenarios such as controller restarts, pod terminations, and data consistency checks.
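
A rough sketch of steps 1 to 3 above, under stated assumptions rather than as a proposed implementation. It assumes a JSON file as the persistent storage (a real deployment might prefer a ConfigMap, a CR status field, or a database) and the same delta-based accounting as in the issue description. `PersistentLedger` and `ledger.json` are hypothetical names.

```python
import json
import os
import tempfile


class PersistentLedger:
    """Per-pod accounted energy, reloaded from disk on startup."""

    def __init__(self, path):
        self.path = path
        self.accounted = self._load()

    def _load(self):
        # First run: no file yet, so start with an empty table (counts at zero).
        try:
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def _save(self):
        # Write to a temporary file, then rename it over the old one, so a
        # crash mid-write does not leave a truncated ledger behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.accounted, f)
        os.replace(tmp, self.path)

    def reconcile(self, total, readings):
        """Add only the not-yet-accounted energy, then persist the table."""
        for pod, cumulative in readings.items():
            total += cumulative - self.accounted.get(pod, 0.0)
            self.accounted[pod] = cumulative
        self._save()
        return total


# Restart scenario from step 5: two controller instances share one ledger file.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "ledger.json")
    readings = {"pod-a": 100.0, "pod-b": 50.0}

    total = PersistentLedger(path).reconcile(0.0, readings)

    # A new instance reloads the ledger, so nothing is counted twice.
    total_after_restart = PersistentLedger(path).reconcile(total, readings)

print(total, total_after_restart)
```

One caveat for step 4: in this sketch the ledger and the total are persisted separately, so a crash after the ledger is saved but before the total is stored would lose that delta. Updating both in a single atomic write would be one way to close that gap.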

trent-s commented 6 months ago

Excellent!