Open dsandubb opened 6 years ago
Hi Denisa, I'm moving right now and don't have a lot of bandwidth for OSS stuff, but this souuunds like it's something to do with the way the new queue checkers expand drain operations? It's been several years since I was in this code so I don't recall exactly how that works :(
Oh, you're right. I looked into it for a bit. Looks like the issue comes from this line, as mapcat
would like to map that on an array, to my understanding.
After running the tests with that line changed to [(:value op)]
, I get something like:
:total-queue
{:valid? false,
:lost
#{1425 1391 1405 1441 1414 1406 1408 1435 1396 1393 1394 1402 1410
1452 1431 1400 1403 1384 1386 1423 1389 1407 1412 1420 1421 1387
1418 2976 1397 1382 1446 1415 1449 1434},
[...]
:unexpected #{:exhausted :exhausted :exhausted :exhausted},
[...]}
(:value op)
holds :exhausted
, but I'm not exactly sure whether or not that's the point of the code, because to my understanding, a drain operation should, at some point, lead to the queue being exhausted.
Right... Is there a chance you're trying to upgrade those tests to work with a newer version of Jepsen? The API around queue checking changed a while ago to simplify draining queues, and that would create the kinds of compatibility issues you're finding right now. If I recall, drains used to be implemented by having the client do a bunch of dequeues and directly modifying the test history, adding in new synthetic dequeue operations. It'd then return :exhausted as the value for the dequeue op, which was ignored by the checker. The new checker (I think this is in the docstring maybe?) doesn't make the client do all that work--it can return a list of dequeued elements as the value of the dequeue operation directly.
For an example of how drains work now, see https://github.com/jepsen-io/jepsen/blob/master/hazelcast/src/jepsen/hazelcast.clj#L230-L234. Contrast that with https://github.com/jepsen-io/jepsen/blob/master/rabbitmq/src/jepsen/rabbitmq.clj#L171-L180, which has to generate a sequence of synthetic dequeue invocations & completions, and call conj-op! to insert them into the history directly. If you're upgrading to a recent Jepsen, I think you can cut all that code out and replace it with a drain like hazelcast's test does.
When running the RabbitMQ queue test, the total-queue checker gets an java.lang.IllegalArgumentException.
Using a previous version of the checker (commit eb796cfcc204c592545965968bd28ad1e6b2eff0) works perfectly, however. Could you please look into that? Thanks a lot.