m-lab / traceroute-caller

A sidecar service which runs traceroute after a connection closes
Apache License 2.0
18 stars 5 forks source link

Use the shx package instead of the pipe package #107

Closed SaiedKazemi closed 3 years ago

SaiedKazemi commented 3 years ago

The pipe package (gopkg.in/m-lab/pipe.v3) doesn't work as expected because not only it doesn't support contexts its RunTimeout() method does not correctly timeout. As a result, the latency numbers cannot be trusted.

This commit replaces the pipe package with the shx package in the ScamperDaemon.trace(). Once we have validated the new code works as expected, the Scamper.trace() will also use the shx package.

Because there are multiple traces being performed in parallel, the new code improves log messages by including the context identifier so individual contexts can be easily identified.

Tested the changes locally using docker-compose.


This change is Reviewable

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 258


Changes Missing Coverage Covered Lines Changed/Added Lines %
tracer/scamper.go 68 71 95.77%
<!-- Total: 74 77 96.1% -->
Files with Coverage Reduction New Missed Lines %
tracer/scamper.go 4 96.26%
<!-- Total: 4 -->
Totals Coverage Status
Change from base Build 233: -0.8%
Covered Lines: 586
Relevant Lines: 593

💛 - Coveralls
SaiedKazemi commented 3 years ago

tracer/scamper.go, line 229 at r1 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
This doesn't need to be declared here. Local within the case statements is fine.

Yeah, I did the updates differently initially but then moved it inside the case statements. Removed the declaration.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 219 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
Add "-w", "2" which seems to make a big difference.

It's actually "-W" (capital W) which is specifies in 1/100ths of seconds how long to wait between probes. I've added support to specify this via a flag.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 219 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
We'll try dropping the -O ptr.

I've added a flag to include/exclude "-O ptr" in tracelb command.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 219 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
We should also test in sandbox with 1h timeout, 10 minute IPCacheTimeout, and 1 minute IPCache...Period. If these work well, then we should merge and deploy to staging.

I modified docker-compose.yaml to use the above values.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 219 at r3 (raw file):

Previously, SaiedKazemi (Saied Kazemi) wrote…
It's actually "-W" (capital W) which is specifies in 1/100ths of seconds how long to wait between probes. I've added support to specify this via a flag.

Done.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 219 at r3 (raw file):

Previously, SaiedKazemi (Saied Kazemi) wrote…
I've added a flag to include/exclude "-O ptr" in tracelb command.

Done.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 219 at r3 (raw file):

Previously, SaiedKazemi (Saied Kazemi) wrote…
I modified docker-compose.yaml to use the above values.

Done.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 233 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
I'm trying this in gfr-params, and will push it to host pods to test.

Thank you @stephen-soltesz for your suggestion. I have simplified the code.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 233 at r3 (raw file):

Previously, SaiedKazemi (Saied Kazemi) wrote…
Thank you @stephen-soltesz for your suggestion. I have simplified the code.

Done.

SaiedKazemi commented 3 years ago

caller.go, line 55 at r4 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
Reverse this to scamperWithPTR, and default to true. Otherwise double negatives = confusing

Done.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 116 at r4 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
Please choose one or the other of these, to reduce spam. I prefer the second one, as it is more helpful for debugging.

I kept the first one ("Starting a trace ...") for backward compatibility, but I will merge them into one and make it consistent with other messages (i.e., Trace {started,timed out,failed,succeeded} in context...).

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 132 at r4 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…
nit: do not mix `fmt` and `log` output.

That was an oversight. Thanks for catching it.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 134 at r4 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
Suggest you delete this log, to reduce spam.

This short message helps parsers to easily determine status of traces explicitly as opposed to implicitly assuming success in the absence of any error messages.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 230 at r4 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
I was just about to write this as a comment. 8-)

Will do this in the next round of code clean ups and improvements.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 270 at r4 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…
Is this what you want?

By "this" I assume you mean `return "", err'. As the comment says, test code expects null output when a trace times out. I will change the test code to accept partial output because we want to save whatever output we had before the trace timed out.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 285 at r4 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…
Previously, non-nil errors would not write files. Now there are non-timeout errors that will still write the content of buff to a file. This is new behavior. Is this intended?

Yes, the new behavior is intended. If a traceroute times out on, say, hop number 12, the data that we have for the first 11 hops is still useful. That said, I need to verify that scamper does provide partial output when a traceroute times out.

SaiedKazemi commented 3 years ago

tracer/scamper_test.go, line 243 at r4 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…
Prefer `errors.Is()` over string comparisons when possible.

Done.

SaiedKazemi commented 3 years ago

caller.go, line 55 at r4 (raw file):

Previously, SaiedKazemi (Saied Kazemi) wrote…
Done.

Done.

SaiedKazemi commented 3 years ago

tracer/scamper.go, line 285 at r4 (raw file):

Previously, SaiedKazemi (Saied Kazemi) wrote…
Yes, the new behavior is intended. If a traceroute times out on, say, hop number 12, the data that we have for the first 11 hops is still useful. That said, I need to verify that scamper does provide partial output when a traceroute times out.

Done.

SaiedKazemi commented 3 years ago

@gfr10598, please merge if you and @stephen-soltesz are OK. Obviously, we will continue to iteratively make further improvements.