raintank / worldping-api

Worldping Backend Service

ensure list of probe_sessions is ordered when assigning checks #96

Closed woodsaj closed 4 years ago

woodsaj commented 4 years ago

AMS probes recently had an issue where two connected probes were running the same checks instead of each being allocated a different batch.

The likely cause is the DB returning the list of probe_sessions in a different order for each probe. Results are typically ordered by the row's primary key, but it looks like that didn't happen here. To be sure, we should always sort the list of probe_sessions ourselves.
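A minimal sketch of sorting the list ourselves before assigning checks. `ProbeSession` and `sortSessionsById` are hypothetical names used for illustration, not the actual types in pkg/api/sockets/probe.go:

```go
package main

import (
	"fmt"
	"sort"
)

// ProbeSession is a hypothetical stand-in for the real session record;
// only the fields needed for the illustration are included.
type ProbeSession struct {
	Id      int64
	ProbeId int64
	Updated int64
}

// sortSessionsById sorts the slice in place by Id, so every caller sees
// the same order regardless of the order the DB returned the rows in.
func sortSessionsById(sessions []ProbeSession) {
	sort.Slice(sessions, func(i, j int) bool {
		return sessions[i].Id < sessions[j].Id
	})
}

func main() {
	// Rows as they might come back from the DB in arbitrary order.
	sessions := []ProbeSession{{Id: 7}, {Id: 3}, {Id: 5}}
	sortSessionsById(sessions)
	for _, s := range sessions {
		fmt.Println(s.Id)
	}
}
```

Sorting on the primary key gives a total order that does not change while the same sessions stay connected, which is what the batch allocation needs.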

https://github.com/raintank/worldping-api/blob/master/pkg/api/sockets/probe.go#L161

woodsaj commented 4 years ago

Ahh, the actual problem is that the query that fetches the probe_sessions orders by "updated": https://github.com/raintank/worldping-api/blob/8509d9f6f7e4bef279adfe6d0c6944aae5a7d8ca/pkg/services/sqlstore/probe.go#L639

This results in a race condition, e.g.:

probeA -> sends heartbeat and 'updated' is set to 1
probeA starts refresh, active sessions are returned as [probeA, probeB]
probeB -> sends heartbeat and 'updated' is set to 10
probeB starts refresh, active sessions are returned as [probeB, probeA]

Each probe sees itself first in the list, so both end up running the same checks.

The fix here is to update the query to order by 'id' instead.
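The race can be reproduced in a small simulation. This is only a sketch: `Session`, `orderByUpdatedDesc`, `orderById`, `position` and `simulate` are hypothetical names, and it assumes that each probe picks its batch of checks from its position in the returned list and that the query returns the most recently updated session first, as in the example above.

```go
package main

import (
	"fmt"
	"sort"
)

// Session is a hypothetical, simplified probe session row.
type Session struct {
	Id      int64
	Name    string
	Updated int64
}

// orderByUpdatedDesc mimics the problematic query: most recently updated first.
func orderByUpdatedDesc(in []Session) []Session {
	out := append([]Session(nil), in...) // copy so the "table" is not reordered
	sort.Slice(out, func(i, j int) bool { return out[i].Updated > out[j].Updated })
	return out
}

// orderById mimics the fixed query: stable order by primary key.
func orderById(in []Session) []Session {
	out := append([]Session(nil), in...)
	sort.Slice(out, func(i, j int) bool { return out[i].Id < out[j].Id })
	return out
}

// position returns the index of the session with the given id, or -1.
func position(sessions []Session, id int64) int {
	for i, s := range sessions {
		if s.Id == id {
			return i
		}
	}
	return -1
}

// simulate replays the heartbeat/refresh sequence from the issue and returns
// the list position each probe sees for itself during its own refresh.
func simulate(order func([]Session) []Session) (posA, posB int) {
	table := []Session{{Id: 1, Name: "probeA"}, {Id: 2, Name: "probeB"}}

	table[0].Updated = 1 // probeA sends heartbeat
	posA = position(order(table), 1)

	table[1].Updated = 10 // probeB sends heartbeat
	posB = position(order(table), 2)
	return posA, posB
}

func main() {
	a, b := simulate(orderByUpdatedDesc)
	fmt.Printf("order by updated: probeA=%d probeB=%d\n", a, b)
	a, b = simulate(orderById)
	fmt.Printf("order by id:      probeA=%d probeB=%d\n", a, b)
}
```

With the 'updated' ordering both probes see themselves at position 0 and would take the same batch; with the 'id' ordering probeA is at position 0 and probeB at position 1.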