Closed bwehrle closed 5 years ago
@bwehrle Thanks for reporting. A quick look at your test indicates that you are creating multiple Order
s all with the same id. That would seem to have a big part in this. Also, I am not certain what Completes<Void>
is meant to accomplish, because the protocol could be either void
or Completes<OrderInfo>
. If the latter you could use the apply(..., andThen)
parameter as follows, and your client would see the correct value:
apply(OrderEvents.Created.with(state.orderId, userId, quantityByProduct), () -> state.createOrderInfo());
Hi @VaughnVernon , The Void was just a way to let the client wait on the application of the event or to know when it was done. It is (or should be) superfluous. The OrderResource class is using the following code to generate the Id of the order.
I printed out the Ids and they are all distinct. The UserId is the same; that addition generates a new State when the event OrderEvents.Created is applied. The new State takes the existing id, and includes the UserId and the order items. I don't see the issue you are describing.
I agree I could do as you say (returning the state as a Completable
final Address orderAddress = addressFactory.uniquePrefixedWith("order-");
...
Order orderActor = stage.actorFor(
Order.class,
Definition.has(OrderEntity.class,
Definition.parameters(orderAddress.idString())),
orderAddress);
@bwehrle Thanks for your patience. I do see a failure very reliably at every fifth test run (!?!), but I don't think the cause is what you expected me to see. It's really strange that it doesn't exhibit the failure on any other run. That's just crazy :)
Where I see the consistent failure is in the task count assertion:
Assert.assertEquals(TASK_COUNT, fList.stream().filter(OrderResourceShould::getOrHandleWithFalse).count());
This seems like a very unreliable assertion, asking the JDK's ExecutorService
to acknowledge an absolute count of running tasks at the time your loop finishes. This is testing that your loop ran 10 times, but it is safe to assume that it did. The assertion failure you are seeing may be masking this actual cause.
One possibility is that your ExecutorService
assertion on count()
could literally be succeeding, but the ordering of execution that the CPU chooses could prevent your thread from seeing true
because your thread's view of that data has not yet been refreshed (flushed) from CPU cache to main memory. I have seen this problem exhibited a lot in test code when there is not a fence around the state in question. Your thread should not be accessing the state until it has a lock
on the data that you and the other threads agree on. This can be accomplished with a synchronized (lock) { access resource }
around the data in questions, but everyone must use it. For example, make your getOrHandleWithFalse()
increment a count that is surrounded by a mutual lock (as above) and your collection of data and then your assertion on that data must be ordered around that. Otherwise you are going to see inconsistent CPU-caused memory updates like this in many tests.
Note that this is not the case when using Completes<T>
because the value from the one thread is being written into memory AND delivered directly to a registered Completes<T>
function, such as andThen(value)
. In other words, the andThen(value)
is now running on the thread that delivered the completed value. In vlingo-actors
this is one of the CompletesEventually
actors that are available in the pool.
As a followup I worked on your test code for a while and did made changes that will help you to see the potential for concurrent data access among multiple threads that won't see the same data in the sequence that appears to be reliable. It's mind boggling until your understand that modern CPUs execute instructions out of order from the actual code. It's part of CPU optimizations, which doesn't make multi-threading any easier.
So in your future tests try what I have said and demonstrated in your code. I am almost certain your problem will go away, as they did with my modifications.
Hi @VaughnVernon , What is weird for me is that I could reproduce this problem using a single thread when running my test from time to time. The single threaded case was very basic - so I made it run more frequently to get more executions and hopefully reproduce the issue more frequently.
My test seems to have a classic case of unsynchronized data access. Thanks for pointing out the issue!
Even with one test thread there is at least one other thread at play to deliver the current message to the Actor
. The access of data touched by both threads must be synchronized or enough time must pass for the reading thread to see updates in main memory. Again, these two statements can happen out of order:
atomicValue.set(value);
until.happened();
Per the Java threading model, Java running on modern Intel cores the until.happened()
can clear the CountDownLatch
before the atomicValue
is actually set. You may read the above sequence of code statements and say "there is no way that the until
can clear before the atomicValue
is set" and you'd be wrong :) That's why you need the following:
synchronized (lock) {
atomicValue.set(value);
until.happened();
}
and:
synchronized (lock) {
return atomicValue.get();
}
Of course the atomicValue
being set inside is already volatile
but the problem is sequence of execution, not data on the bus not reaching main memory yet. The atomicValue
will absolutely be set in main memory as soon as the set instruction is executed. It's just that you can't predict when it will be executed, either before or after until.happened()
, and you need the synchronized (lock)
around both threads' access to the atomicValue
to force Java to ensure that the effects of out-of-order execution aren't seen by reading thread.
See the following. In fact the Wikipedia article is clearer on the subject of sequence of execution.
https://dzone.com/articles/memory-barriersfences https://en.wikipedia.org/wiki/Java_memory_model
"The major caveat of this is that as-if-serial semantics do not prevent different threads from having different views of the data."
HTH.
@bwehrle I want to mark this closed. There are two improvements: (1) Completes<T>
has been fixed to reactive to errors with latent error handler registry inside BasicCompletes<T>
. (2) The addition of AccessSafely
supports thread-safe access of in-test state. It would help if you could confirm this.
I have confirmed fixes and since @bwehrle has not yet confirmed I am closing.
Hi @VaughnVernon, I re-ran these tests inside my example using 0.8.3 and could no longer repro the issue. Thanks for fixing this issue!
I have a OrderEntity that when constructed represents an order without any items. It then has an apply(CreatedEvent) which loads the items. Afterwards, the client checks to see what's in the order. In some cases, the client query sees the entity with an empty order, which according to how stowage should work, should not be the case.
Repro:
This same test cases, when run from IntelliJ fails almost every time. It is checking that each created order has the products that were put in with the created event when run in multiple threads. I was able to make this fail in a single thread as well but it happened less frequently or only from IntelliJ.
I have: