m-lab / traceroute-caller

A sidecar service which runs traceroute after a connection closes
Apache License 2.0

scamper pid leak #81

Open yachang opened 3 years ago

yachang commented 3 years ago

A pid leak was detected on mlab3.fln01:

https://github.com/m-lab/ops-tracker/issues/1204

Investigation of the log shows that the pid leak was caused by scamper after scamper-daemon failed:

Screen Shot 2020-11-02 at 3 08 51 PM

yachang commented 3 years ago

In the screenshot above, the green line marks the point where the scamper daemon died. scamper was then brought up, and the pid leak started.

After about 11 hours, the pid leak caused everything to crash.

Until we nail down the pid leak in scamper, we will replace the flag value

"scamper-daemon-with-scamper-backup"

with

"scamper-daemon"

A k8s-support PR will follow.

yachang commented 3 years ago

https://github.com/m-lab/k8s-support/pull/509/files

stephen-soltesz commented 3 years ago

The PID count over the same time period during which "scamper" was active in the image above:

Screen Shot 2020-11-02 at 4 29 15 PM
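
For anyone wanting to watch for a similar leak directly on a node, a process count can be approximated by counting the numeric entries under /proc. The following is only a minimal sketch, assuming a Linux host with /proc mounted. The names `isPID` and `countPIDs` are hypothetical and not part of traceroute-caller, and this counts processes rather than threads, so it may not match the metric graphed above exactly.

```go
package main

import (
	"fmt"
	"os"
)

// isPID reports whether name consists only of decimal digits, which is how
// Linux names the per-process directories under /proc.
func isPID(name string) bool {
	if name == "" {
		return false
	}
	for _, r := range name {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// countPIDs returns how many of the given directory entry names look like PIDs.
// (Hypothetical helper for illustration only.)
func countPIDs(names []string) int {
	n := 0
	for _, name := range names {
		if isPID(name) {
			n++
		}
	}
	return n
}

func main() {
	// Assumption: /proc is readable from where this runs (host or container).
	entries, err := os.ReadDir("/proc")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc:", err)
		os.Exit(1)
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	fmt.Println("processes:", countPIDs(names))
}
```

Running this periodically and comparing the output over time should show a steadily rising count if child processes are accumulating.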