project-codeflare / appwrapper

AppWrapper controller for Kueue
https://project-codeflare.github.io/appwrapper/
Apache License 2.0

Misleading node name in the "Updated lending limits" log message #253

Open tardieu opened 5 days ago

tardieu commented 5 days ago

The "Updated lending limits" log message is emitted when the etcd update of the slack queue succeeds. If the update fails because of an update conflict, we retry it when reconciling the next queued node event, which is often about a different node. As a result, the node name embedded in the "Updated lending limits" message is misleading: it suggests the limit was updated because of that specific node, when there is no such relationship.

We need to log the cause of lending limit changes in a way that identifies the responsible node. The existing "Updated node health information" log message only accounts for autopilot label changes.
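One possible way to do this, as a rough sketch rather than a description of the current controller: remember which nodes changed the desired limits since the last successful slack queue update, and report all of them when the update finally goes through. The `pendingCauses` type, its methods, and the reason strings below are hypothetical names for illustration only; nothing here is taken from the appwrapper code base.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pendingCauses (hypothetical) records which nodes changed the desired lending
// limits since the last successful update of the slack queue.
type pendingCauses struct {
	nodes map[string]string // node name -> short reason for the change
}

func newPendingCauses() *pendingCauses {
	return &pendingCauses{nodes: map[string]string{}}
}

// record notes that an event for the given node changed the desired limits.
// It is meant to be called whether or not the subsequent update succeeds.
func (p *pendingCauses) record(node, reason string) {
	p.nodes[node] = reason
}

// flush returns a sorted, comma-separated summary of all recorded causes and
// clears them. It is meant to be called only after the update has succeeded,
// so causes from failed attempts carry over to the retry.
func (p *pendingCauses) flush() string {
	names := make([]string, 0, len(p.nodes))
	for name := range p.nodes {
		names = append(names, name)
	}
	sort.Strings(names)

	parts := make([]string, 0, len(names))
	for _, name := range names {
		parts = append(parts, fmt.Sprintf("%s(%s)", name, p.nodes[name]))
	}
	p.nodes = map[string]string{}
	return strings.Join(parts, ", ")
}
```

With this shape, an update that fails on a conflict leaves the recorded causes in place, so the eventual "Updated lending limits" message could list every node that contributed instead of whichever node triggered the retry.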

tardieu commented 5 days ago

We should also include the lending limit values before and after the update (or the deltas) in these messages, rather than only the final state.
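A minimal sketch of what such a before/after summary could look like, assuming limits are tracked as plain integers per resource name. The `limitChanges` function is a hypothetical helper, and treating a missing resource as 0 is a simplification made here for illustration (real limits would more likely be Kubernetes quantities, and an absent limit may not mean zero).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// limitChanges (hypothetical) describes how lending limits moved between two
// snapshots, one entry per resource whose value changed. Resources missing
// from a snapshot are treated as 0, which is an assumption of this sketch.
func limitChanges(before, after map[string]int64) string {
	// Collect the union of resource names from both snapshots.
	seen := map[string]struct{}{}
	for name := range before {
		seen[name] = struct{}{}
	}
	for name := range after {
		seen[name] = struct{}{}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order keeps log lines comparable

	var parts []string
	for _, name := range names {
		b, a := before[name], after[name]
		if a == b {
			continue // unchanged resources are omitted
		}
		parts = append(parts, fmt.Sprintf("%s: %d -> %d (%+d)", name, b, a, a-b))
	}
	return strings.Join(parts, "; ")
}
```

The resulting string could be attached to the log message as a single field, so a reader sees both the old and new values and the signed difference for each resource that changed.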