Open jinjingforever opened 8 years ago
I think the issue is that bytes.Buffer is not thread-safe. IIRC Cos used the same bytes.Buffer for both stdout and stderr, which is prone to raciness (I fixed a similar issue in another unit test that was racy last week).
I checked in a fix for this a week or so ago. If stdout/stderr are the same a thread safe wrapper is used. Of course there may be other bugs.. I'll look thus pm and tomorrow. Thanks for finding a local repro!
Cheers, Cos.
On Sat, Dec 5, 2015, 11:13 AM bogdan caprita notifications@github.com wrote:
I think the issue is that bytes.Buffer is not thread-safe. IIRC Cos used the same bytes.Buffer for both stdout and stderr, which is prone to raciness (I fixed a similar issue in another unit test that was racy last week).
— Reply to this email directly or view it on GitHub https://github.com/vanadium/issues/issues/989#issuecomment-162238045.
Here are all the instances I could find under x/devtools where we use the same bytes.Buffer for both stdout and stderr. As you can see, the problem is pervasive:
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=180 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=402 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=875 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=956 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=1264 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=1290 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=1404 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=1425 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=1687 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/go.go&l=180
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/oncall/collect.go&l=594 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/oncall/collect.go&l=633
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/vcloud/vcloud.go&l=302 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/vcloud/vcloud.go&l=310
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/generate_test.go&l=161
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/presubmit/test.go&l=230
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/presubmit/query_test.go&l=296
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/vmon/servicecounters.go&l=66
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/rpc.go&l=239 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/rpc.go&l=369
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/vmon/serviceqps.go&l=87
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/oncall/serve.go&l=130 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/vkiosk/serve.go&l=166
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/internal/golib/go.go&l=176
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/vmon/gceinstance.go&l=155 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/vmon/gceinstance.go&l=228
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/test_test.go&l=39 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/test_test.go&l=66 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/test_test.go&l=93
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-go/go_test.go&l=87
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/cleanup_profiles.go&l=32 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-test/internal/test/cleanup_profiles.go&l=63
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/internal/goutil/util.go&l=39
https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-v23-profile/java/java.go&l=176 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-v23-profile/java/java.go&l=200 https://cs.corp.google.com/#vanadium/release/go/src/v.io/x/devtools/jiri-v23-profile/java/java.go&l=225
There are a couple options to fix this throughout:
I think option 2 is more practical and robust, since even if we fix all current occurrences of bytes.Buffer, new ones may be introduced in the future. But then again, it may be insufficient if there are more than one parallel execution threads using the same buffer in e.g. two different Capture calls that happen in parallel.
On Sat, Dec 5, 2015 at 7:59 PM, bogdan caprita notifications@github.com wrote:
There are a couple options to fix this throughout:
1.
replace all occurrences of bytes.Buffer by a thread-safe alternative, and do this across the board indiscriminately. It's otherwise hard to figure out which are cases where we pass the same to both stdout and stderr. Note that it's not always obvious -- sometimes, stdout and stderr are ostensibly different objects, but they actually wrap the same underlying buffer e.g. different MultiWriters writing to the same bytes.Buffer. 2.
add logic to the chokepoints in the system (e.g. Capture() or the context) to wrap whatever the caller passed using a thread-safe Writer pair (wrap stdout and stderr Writers in our own Writers that are aware of each other and serialize output to the underlying Writers). Again, we need to do this indiscriminately, not just when stderr != stdout since that may still end up writing to the same underlying writer.
I've done 2 for when stdout == stderr, it's easy enough to use the wrapper in all cases, but I don't see how it can help if there is other code that uses the same io.writer directly? Maybe the code I added for when stdout==stderr is buggy. I don't see how wrapping each when stdout != stderr will help there?
I think option 2 is more practical and robust, since even if we fix all current occurrences of bytes.Buffer, new ones may be introduced in the future. But then again, it may be insufficient if there are more than one parallel execution threads using the same buffer in e.g. two different Capture calls that happen in parallel.
- all of the cases I've seen the buffer is used in the immediately surrounding context: eg:
out bytes.Buffer Capture(&out,&out). use out.String()
so I'm not so worried about the concurrent case and I'm pretty sure this is not the cause of our flakes.
Cheers, Cos.
— Reply to this email directly or view it on GitHub https://github.com/vanadium/issues/issues/989#issuecomment-162268988.
There are at least three issues:
var out bytes.Buffer stdout := io.MultiWriter(&out, jirix.Stdout()) stderr := io.MultiWriter(&out, jirix.Stdout()) s.Capture(stdout, stderr).Run....
The stdout == stderr condition in not going to be met here. Hence, we need to always wrap the writers in a way that makes access thread-safe, not only when stdout == stderr. For example, you can have inside initAndDefer,
stdout, stderr = serializeWriters(stdout, stderr)
where serializeWriters could be something like this http://play.golang.org/p/hKOmbyjssb
.3. even with the above fix to (2), there may be cases where we use the same bytes.Buffer accross calls to Run/Last. This could happen when we set up the same buffer to be both stdout and stderr for two different sequences running in parallel. You may be right that this case may not pop up in our current set of flakiness, so it's more of a hypothetical situation. The reason I'm bringing it up is because I saw examples where we set both stdout and stderr in the jiri context to the same bytes.Buffer. Is it possible for this context to then be used for concurrent sequences?
On Sun, Dec 6, 2015 at 9:32 AM, bogdan caprita notifications@github.com wrote:
There are at least three issues:
1.
the code you added in https://vanadium-review.googlesource.com/#/c/17961/9/runutil/sequence.go doesn't actually serialize access to stdout/stderr, since you're creating a different lockedWriter for each (so the two lockedWriters are unaware of each other and hence use independent locks): if stdout == stderr { stdout = &lockedWriter{f: stdout} stderr = &lockedWriter{f: stderr} } what I think you need instead is: if stdout == stderr { stdout = &lockedWriter{f: stdout} stderr = stdout }
do'h!
1.
Even with the above fix to (1), you still need to cover the case where stdout != stderr but both wrap the same underlying bytes.Buffer. E.g., there are a few example where I saw the following construct:
var out bytes.Buffer stdout := io.MultiWriter(&out, jirix.Stdout()) stderr := io.MultiWriter(&out, jirix.Stdout()) s.Capture(stdout, stderr).Run....
The stdout == stderr condition in not going to be met here. Hence, we need to always wrap the writers in a way that makes access thread-safe, not only when stdout == stderr. For example, you can have inside initAndDefer,
stdout, stderr = serializeWriters(stdout, stderr)
where serializeWriters could be something like this http://play.golang.org/p/hKOmbyjssb
yes, I agree. I'll get a CL together shortly. I got caught upgrading to el capitan yesterday...
- even with the above fix to (2), there may be cases where we use the same bytes.Buffer accross calls to Run/Last. This could happen when we set up the same buffer to be both stdout and stderr for two different sequences running in parallel. You may be right that this case may not pop up in our current set of flakiness, so it's more of a hypothetical situation. The reason I'm bringing it up is because I saw examples where we set both stdout and stderr in the jiri context to the same bytes.Buffer. Is it possible for this context to then be used for concurrent sequences?
yes, I agree that it can happen but sequences are not intended to be used concurrently so I feel that's more of an error. The one case that worries me is defers, since there you have to know to call NewSeq, it's no worse than the old code, but it's still annoying. Similarly, the idiom
if err:=s.WriteFile(); err != nil { }
compiles, but doesn't work as expected since err is initialized to a *Sequence, but I feel this is a more a 'design of go' consequence. Still I'd love to find a cleaner solution.
Cheers, Cos.
— Reply to this email directly or view it on GitHub https://github.com/vanadium/issues/issues/989#issuecomment-162330772.
ok, CL out. I'll TBR if it passes presubmits. I think the MultiWriter use must be on the code path that only gets executed on jenkins which is why the presubmits/go race etc don't find the race.
Cheers, Cos.
On Sun, Dec 6, 2015 at 9:41 AM, Cosmos Nicolaou cnicolaou@google.com wrote:
On Sun, Dec 6, 2015 at 9:32 AM, bogdan caprita notifications@github.com wrote:
There are at least three issues:
1.
the code you added in https://vanadium-review.googlesource.com/#/c/17961/9/runutil/sequence.go doesn't actually serialize access to stdout/stderr, since you're creating a different lockedWriter for each (so the two lockedWriters are unaware of each other and hence use independent locks): if stdout == stderr { stdout = &lockedWriter{f: stdout} stderr = &lockedWriter{f: stderr} } what I think you need instead is: if stdout == stderr { stdout = &lockedWriter{f: stdout} stderr = stdout }
do'h!
1.
Even with the above fix to (1), you still need to cover the case where stdout != stderr but both wrap the same underlying bytes.Buffer. E.g., there are a few example where I saw the following construct:
var out bytes.Buffer stdout := io.MultiWriter(&out, jirix.Stdout()) stderr := io.MultiWriter(&out, jirix.Stdout()) s.Capture(stdout, stderr).Run....
The stdout == stderr condition in not going to be met here. Hence, we need to always wrap the writers in a way that makes access thread-safe, not only when stdout == stderr. For example, you can have inside initAndDefer,
stdout, stderr = serializeWriters(stdout, stderr)
where serializeWriters could be something like this http://play.golang.org/p/hKOmbyjssb
yes, I agree. I'll get a CL together shortly. I got caught upgrading to el capitan yesterday...
- even with the above fix to (2), there may be cases where we use the same bytes.Buffer accross calls to Run/Last. This could happen when we set up the same buffer to be both stdout and stderr for two different sequences running in parallel. You may be right that this case may not pop up in our current set of flakiness, so it's more of a hypothetical situation. The reason I'm bringing it up is because I saw examples where we set both stdout and stderr in the jiri context to the same bytes.Buffer. Is it possible for this context to then be used for concurrent sequences?
yes, I agree that it can happen but sequences are not intended to be used concurrently so I feel that's more of an error. The one case that worries me is defers, since there you have to know to call NewSeq, it's no worse than the old code, but it's still annoying. Similarly, the idiom
if err:=s.WriteFile(); err != nil { }
compiles, but doesn't work as expected since err is initialized to a *Sequence, but I feel this is a more a 'design of go' consequence. Still I'd love to find a cleaner solution.
Cheers, Cos.
— Reply to this email directly or view it on GitHub https://github.com/vanadium/issues/issues/989#issuecomment-162330772.
I investigated a little bit more about #985. I ran the following command multiple times locally:
And the v.io/x/ref/lib/security/securityflag test failed about 1 in 10 runs, and the outputs of the test captured by this call are also different from run to run. For example:
Pass 1
Pass 2
(The stderr text is "---- Shell Cleanup ----\n---- Shell Cleanup Complete ----". The rest is stdout. Notice that their order is random in these two cases. BTW the order here is the correct one.)
Fail 1
(The "--- PASS" line is missing, which makes go2xunit think the test failed)
Fail 2
(The "--- PASS" and "PASS" line are both missing)
Based on my observation above, the "initAndDefer" method in runutil/sequence.go might be buggy when stdout and stderr are the same. Assigning to Cos for now.
Thanks!
@bjornick