swimos / swim

Full stack application platform for building stateful microservices, streaming APIs, and real-time UIs
https://www.swimos.org
Apache License 2.0
489 stars 39 forks source link

Join Value Lane didUpdate/didReceive firing multiple times #16

Open SirCipher opened 4 years ago

SirCipher commented 4 years ago

The JoinValueLaneSpec#testLinkToJoinValueLane test in swim-system-java is currently failing inconsistently due to an issue with the didUpdate/didReceive phase being dispatched multiple times. After an unpredictable number of test runs, the test fails due to the latches not counting down enough times.

Output snippet when the test is passing:

join link didUpdate key: "x"; newValue: "x0"; oldValue: ""
join link didReceive body: @update(key:x) x0
join link didUpdate key: "y"; newValue: "y0"; oldValue: ""
join link didReceive body: @update(key:y) y0

join link didUpdate key: "x"; newValue: "x0"; oldValue: ""
join link didReceive body: @update(key:x) x0
join link didUpdate key: "y"; newValue: "y0"; oldValue: ""
join link didReceive body: @update(key:y) y0

When it is failing:

join link didUpdate key: "y"; newValue: "y0"; oldValue: ""
join link didReceive body: @update(key:y) y0
join link didUpdate key: "x"; newValue: "x0"; oldValue: ""
join link didReceive body: @update(key:x) x0

This issue does not present itself every time the test is run. To reproduce it, run the tests back-to-back.

horned-sphere commented 4 years ago

I've taken a look at the behaviour of this and it is very confusing.

My initial suspicion was that we were not observing all of the expected events as the sets were ocurring before the downlinks had synced. I put in a some extra latches to ensure that all of the downlinks were synced before the rest of the test completes. This makes it get past the wait for the update count much more reliably (though oddly not 100% so it might just be incidental).

However, even when it receives the number of updates that it expectes, the test sometimes fails as either "x" or "y' maps to an empty string rather than the expected "x0" or "y0" despite the correct number of updates being seen.

I'm going to look at it some more but it is making me scratch my head so far.

SirCipher commented 4 years ago

I'm not sure if this is same behaviour is also present in the JoinMapLanes also. Merging the client and plane on the join-map-lanes branch appears to result in the same issue.

horned-sphere commented 4 years ago

Having written a hacked version of that test that records hat it sees I have produced a number of traces of pathalogical behaviour. A few representative samples are below (the numbers are System.nanoTime):

[128146031846595 y: => y0] [128119712331393 y: => null, 128119714465462 y: => null, 128119715042245 y: => y0] [128148055109348 y: => null, 128148056272128 y: => null, 128148057802323 y: => null, 128148058802988 y: => y0] [128170262028364 x: => null, 128170262701504 x: => null]

The issue seems to be that the initial value is received any number of times rather than exactly once. Therefore, waiting for exactly to updates isn't adequate.

The test could be fixed by awaiting an update with the final expected value rather than just having the latch. However, I'm not sure if this is something we should expect to have to take into account.

SirCipher commented 4 years ago

Following a discussion with @horned-sphere, I have disabled this unit test in dae5f56e495fc533d62987b12fc7d42d15d33b71

SirCipher commented 4 years ago

This same behaviour is observed on the Join Map Lane, and Map Lanes. This causes the latch assertions for the phases to be unreliable. For instance, these assertions pass:

assertEquals(joinDidReceive.getCount(), 0);
assertEquals(joinDidUpdate.getCount(), 0);

But after a number of invocations of a unit test this then fails:

assertEquals(join.size(), 4);

But the previous assertions have passed due to the phases being fired multiple times.