Closed woodsaj closed 4 years ago
Ahh, the actual problem is that the query that fetches the probe_sessions orders by "updated": https://github.com/raintank/worldping-api/blob/8509d9f6f7e4bef279adfe6d0c6944aae5a7d8ca/pkg/services/sqlstore/probe.go#L639
This results in a race condition, e.g.:

- probeA sends a heartbeat and 'updated' is set to 1
- probeA starts a refresh; active sessions are returned as [probeA, probeB]
- probeB sends a heartbeat and 'updated' is set to 10
- probeB starts a refresh; active sessions are returned as [probeB, probeA]

The fix here is to update the query to order by 'id' instead, so that every probe sees the sessions in the same order.
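
To make the race concrete, here is a minimal Go sketch that replays the sequence above in memory. It is not the worldping-api code: the `session` type, `positions`, and the two comparison functions are hypothetical, and it assumes the query returns the most recently updated session first, which matches the orderings in the example.

```go
package main

import (
	"fmt"
	"sort"
)

// session is a hypothetical stand-in for a probe_session row.
type session struct {
	ID      int64
	Name    string
	Updated int64
}

// indexOf returns the position of the named session in list, or -1.
func indexOf(list []session, name string) int {
	for i, s := range list {
		if s.Name == name {
			return i
		}
	}
	return -1
}

// orderedBy returns a sorted copy of rows, mimicking an ORDER BY clause.
func orderedBy(rows []session, less func(a, b session) bool) []session {
	out := append([]session(nil), rows...)
	sort.SliceStable(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}

// byUpdatedDesc assumes newest-first ordering on the 'updated' column.
func byUpdatedDesc(a, b session) bool { return a.Updated > b.Updated }

// byID orders on the primary key, which heartbeats never change.
func byID(a, b session) bool { return a.ID < b.ID }

// positions replays the two heartbeats and refreshes, returning the
// list index each probe sees for itself at the time of its own refresh.
func positions(less func(a, b session) bool) (posA, posB int) {
	rows := []session{
		{ID: 1, Name: "probeA"},
		{ID: 2, Name: "probeB"},
	}
	rows[0].Updated = 1 // probeA heartbeat
	posA = indexOf(orderedBy(rows, less), "probeA")
	rows[1].Updated = 10 // probeB heartbeat
	posB = indexOf(orderedBy(rows, less), "probeB")
	return posA, posB
}

func main() {
	a, b := positions(byUpdatedDesc)
	fmt.Println("order by updated:", a, b) // both probes see themselves at the same index
	a, b = positions(byID)
	fmt.Println("order by id:", a, b) // each probe sees a distinct index
}
```

With the 'updated' ordering both probes find themselves at index 0, whereas ordering by 'id' gives them distinct, stable indexes.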
AMS probes recently had an issue where two connected probes were running the same checks instead of each being allocated a different batch.
The likely cause of this is the DB returning the list of probe_sessions in a different order. Typically results are ordered by row primary key, but it looks like that didn't happen. To be sure, we should always sort the list of probe_sessions ourselves.
https://github.com/raintank/worldping-api/blob/master/pkg/api/sockets/probe.go#L161
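
As a rough illustration of sorting the list ourselves, the sketch below sorts sessions by ID before working out which checks belong to a given session. Everything here is hypothetical: `ProbeSession`, `sortSessions`, and `batchFor` are made-up names, and the modulo-based split is only a stand-in for whatever allocation the linked code actually performs.

```go
package main

import (
	"fmt"
	"sort"
)

// ProbeSession is a hypothetical, trimmed-down session record.
type ProbeSession struct {
	ID int64
}

// sortSessions orders sessions by ID in place, so the result does not
// depend on the order in which the DB returned the rows.
func sortSessions(sessions []ProbeSession) {
	sort.Slice(sessions, func(i, j int) bool {
		return sessions[i].ID < sessions[j].ID
	})
}

// batchFor returns the checks assigned to session selfID, giving check i
// to the session at index i % len(sessions) in the sorted list.
// It returns nil if selfID is not among the sessions.
func batchFor(sessions []ProbeSession, selfID int64, checks []int64) []int64 {
	// Work on a copy so the caller's slice is left untouched.
	sorted := append([]ProbeSession(nil), sessions...)
	sortSessions(sorted)

	pos := -1
	for i, s := range sorted {
		if s.ID == selfID {
			pos = i
			break
		}
	}
	if pos < 0 {
		return nil
	}

	var batch []int64
	for i, c := range checks {
		if i%len(sorted) == pos {
			batch = append(batch, c)
		}
	}
	return batch
}

func main() {
	checks := []int64{101, 102, 103, 104}
	// The two probes receive the same sessions in different orders,
	// yet still end up with disjoint batches.
	fmt.Println(batchFor([]ProbeSession{{ID: 7}, {ID: 3}}, 3, checks))
	fmt.Println(batchFor([]ProbeSession{{ID: 3}, {ID: 7}}, 7, checks))
}
```

Sorting on the application side is cheap for a handful of sessions and protects against any future query change that drops or alters the ORDER BY.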