knossos.competition checker is blocked by linear/start-analysis

ttyusupov commented 6 years ago

As I understand the idea of knossos.competition checker is to run linear and wgl checkers in parallel. But actually linear/start-analysis is blocking calling thread on knossos.model.memo/memo function call, which seems to be doing almost the whole analysis. As a result mem-watch and reporter in knossos.search/run are also not working, because the linear/start-analysis needs to complete before mem-watch and reporter started.

aphyr commented 6 years ago

memo doesn't do any analysis and is supposed to be relatively cheap. Is there a chance you have a non-degenerate state space here? I haven't written this yet because it's not something I've needed so far, but if memo encounters a large enough state space, it should abort and return the original model instead--it only exists to optimize degenerate searches.

ttyusupov commented 6 years ago

I've added some debug logs into knossos and the most time is spent inside memo/fixed-point:

14:27:29.030 [main] INFO  jepsen.core - Analyzing...
14:27:29.044 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - memo start
14:27:29.048 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - fixed-point start
14:27:29.050 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  1
14:27:29.054 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  30
14:27:29.066 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  438
14:27:29.221 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  4196
14:27:30.061 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  26918
14:27:36.325 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  116085
14:27:59.650 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  339321
14:29:13.445 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  681965
14:31:57.589 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  977970
14:35:57.012 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  1097059
14:40:33.580 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - expand-model transitions:  31  models:  1108763
14:44:32.187 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - fixed-point done
14:48:26.264 [clojure-agent-send-off-pool-4] INFO  knossos.model.memo - memo done
14:48:26.271 [Thread-12] INFO  knossos.model.memo - memo start
14:48:26.273 [Thread-12] INFO  knossos.model.memo - fixed-point start
14:48:26.273 [Thread-12] INFO  knossos.model.memo - expand-model transitions:  31  models:  1
14:48:26.273 [Thread-12] INFO  knossos.model.memo - expand-model transitions:  31  models:  30
14:48:26.276 [Thread-12] INFO  knossos.model.memo - expand-model transitions:  31  models:  438
14:48:26.280 [knossos reporter] INFO  knossos.search - running reporter
14:48:26.406 [Thread-12] INFO  knossos.model.memo - expand-model transitions:  31  models:  4196
14:48:26.844 [main] INFO  jepsen.core - Analysis complete

Thread-12 is WGL analysis from competition search which is only started after memo in linear/start-analysis is completed.

The model is multi register and history contains 51 successful ops without any failures/infos (except nemeses infos). I am using ops of two types - "read the whole register" and "write the same random value (0-9) to two random keys (0-9)".

Also looks like memo/models function builds a set of reachable models using only one CPU core and applying every transition to every model and does not take into account transition time and also whether transition actually can modify the model (I understand knossos doesn't know about whether operation can modify the state, but it can be introduced).

ttyusupov commented 6 years ago

BTW, does it makes sense to build model wrapper in case canonical-history returns the same history as original? Also, if size of original and canonical histories is the same - will there be any benefit to use canonical instead?

aphyr commented 6 years ago

Okay sooooo to answer my earlier question, you've got a state space of roughly 10^10. Memo is meant for degenerate state spaces of roughly 10^2. You may find the docstrings at the top of the file helpful.As I mentioned earlier, the appropriate fix is to add bounds to memo so that it aborts if states * transitions becomes larger than some threshold, and return the original model and history instead. And a little escape hatch for unwrapping non-wrapped models.--KyleOn Jul 19, 2018 07:31, Timur Yusupov notifications@github.com wrote:BTW, does it makes sense to build model wrapper in case canonical-history returns the same history as original? Also, if size of original and canonical histories is the same - will there be any benefit to use canonical instead?

—You are receiving this because you commented.Reply to this email directly, view it on GitHub, or mute the thread.

ttyusupov commented 6 years ago

For the purpose of our testing, I've just added a one-minute timeout to memo for now in our knossos fork. But as I see memo can be optimized to only try to apply transitions which correspond to non-failed operations. Also can linear checker launch memo inside analysis thread like WGL is doing?

aphyr commented 6 years ago

In addition to the patches in 0.3.3 which aborts memoization for large state spaces, yes, we may be able to pass on testing failed operations--I'm not entiiiiirely sure about that, but it sounds reasonable!

jepsen-io / knossos

knossos.competition checker is blocked by linear/start-analysis #25