Parallelism in SEMPR

Intro

As discussed before, we want some kind of parallelism in sempr. The big questions are:

What do we want to compute in parallel?
How do we achieve that? What does a strategy to implement it look like, what compromises do we have to make?
- How to keep our data consistent?
- How to make everything thread-safe?
- How to avoid/detect/resolve deadlocks?

I will and can not answer everything here. What I can do is propose one idea, one strategy, and discuss its drawbacks, compromises, potential problems, and ask further questions you all may think about between two bites at lunch. :wink:

What to parallelize?

The obvious parts of sempr to run in parallel are the different processing modules. Especially when working with geometries those may take some time to get processed. There are two sub-tasks to consider here: Processing events and answering queries. Lets look at events first: Given one arbitrary event, not every module is triggered by it. Some simply ignore it while others may start some heavy computing. That said, we need more than just parallelism. We need the modules to work in parallel and asynchronous: Every module has its own event-queue, and some are faster processing it than others.

How to parallelize? Ideas, problems, compromises.

Lets assume every processing module has its own thread and event-queue. It grabs events from the queue and processes them as fast as it can.

What if one processing module should process an event before another module?

Don't use the same event. If you have a clear dependency, let the first module emit different event that the second module listens for.

So much for the easy part.

Our events are stupid.

Yes they are. The default set of events (EntityEvent<T>) has a type flag that tells us if the entity it refers to was created, loaded, removed or changed -- and it keeps a direct pointer to the entity. We do not compute/store differences, there is no "Person-Changed-Name"-event, just "Person-Changed" with a pointer to the entity.

Note: If you want delta-updates/events, feel free to propose a strategy to implement them. I'd even be happy about a history with old copies of entities or something like that. But I don't think that is something easy to implement or practical to use. Not for the general use-case that sempr is intended for. So I've to work with what I got right now. :wink:

One consequence is the following: If foo->changed() is called twice, there will be two events fired, but both of them will point to the same object, in the same state, regardless of the changes that were made in-between. So what to do? Triggering the computation twice seems a bit wasteful, but actually we don't have a real choice. Any other event fired between those two changed-events may result in a different processing.

The question is: Can/should we prevent that? Should we prevent an entity from being modified while it sits in an event-queue, make it read-only for that time? My heart says "yes", but my head explodes at the thought of the severe deadlock-potential introduced by this. So my first guess is: Don't. Just let it happen. We have other problems.

At least there should be locks on the entities themselves, right? Reader-Writer-locks? With usually only the one who creates an entity being the writer, and most (all) of the modules being readers, there shouldn't be much of a problem. Just make sure to release the locks when e.g. entering the waiting-for-a-query-to-be-answered-mode, to avoid deadlocks there.

What about queries? When to answer them? And how?

One main part of sempr is (or rather, shall be) the capability of forward- and backward-reasoning/processing. While the events represent the forward-part (process as soon as something happens), the queries resemble the backward-reasoning (only process if we get ask for it). Also, it would be nice if anyone could ask queries at any time: Aside from the user, processing modules may ask the core to answer some query while processing an event or another query. We want this connection between the different processes, combine semantic, geometric and domain specific reasoning. But when introducing parallel execution of modules, this gets really tricky.

The main idea

... I'd like to introduce is the following:

Whenever a processing module asks a query, the modules that may answer it must have a state of knowledge that is at least as new as the state of the querying module.

In other words: If module A is currently processing event 7 and asks a query that module B may be able to answer [1], B must be working on event 7 or higher. This is a compromise: We ensure that the answering modules have already seen the events that led to the query, but at the same time allow results that are too new. Of course, this could in some cases lead to unexpected behaviour. But the alternative would be to force the modules to be at the exact same point in the event-queue -- we'd be working in lockstep again, losing quite much of the parallelism.

This brings us to the question: How to ask a query, and what to do in the mean time? Since a module may have to catch up with the event processing first before answering a query I propose to insert queries into the event-queues, too. Using priority-queues this should be fairly easy: Give the query a priority that matches the state of the event-queue of the querying module and it will be processed if the answering module has already surpassed that point, or it will catch up first. I guess that something like std::condition_variable will be useful to implement the signalling between the threads.

BEWARE THE DEADLOCKS

If the answering module has to catch up, it may ask a query while processing an event. There are two ways of dealing with this:

Make sure that there are no loops. The query must not be handed to the first module again, or else we have a deadlock, as it is currently waiting for a query to be answered, too.
Loops may be useful in some cases. Imagine a semantic reasoner that asks for triples before inferring something. A geometric processing module might want to add something to it, but first needs to check some semantic stuff... -- boom. If we enable a module to answer queries while waiting for a query to be answered itself, we should be fairly safe.

I'd really like to have a model of dependencies, but even if I got one I cannot tell how to work with it. What should I do? Detect possible loops and throw an error, even if all might go well? I think method 2. is the one I'll go for... for now.

The last part was mostly written with the mindset of queries to process events, but another thing to think about is queries to answer queries: If A asks a query and B tries to answer, B must be ahead of A (or at the same state). If B now asks a query itself in order to answer As query, we could be in trouble: If A is a candidate to answer it (well, why not? "To answer your question, please first tell me....") there could be a deadlock. But: Following the rule stated above, as the original question was issued by A, there is no need for a to be at the state of B to answer a question of B that is only used to answer As question (I hope I haven't lost you there :smile:). It is sufficient that anyone who attempts to answer it is at the same (or higher) state as the original questioner.

Summary

Parallel, asynchronous execution of modules
Insert queries into the queues depending on the queue-state of the original questioner
Accept that modules that answer my question may be ahead of me and thus include information I haven't seen myself (as a module)
Be happy that modules which want to answer a question I (a module) ask first catch up to my state of processing events

Comment

Yes, this is complex. And whenever something is as complex as this I ask myself: "What am I missing, can this be done any simpler, easier?" -- But at least this is a first approach that does a bit more than just "let it go and hope for the best". :wink: And if/when I start implementing this, one of the first things will be a visualization of the event-queues and the query-waiting-states.... :sweat_smile:

As always, feel free to comment, criticize, propose different approaches, .... :slightly_smiling_face: Any input to this topic is welcome!

[1] What I left out at this point: Somewhere, somehow we need to model which modules want to process which queries. Else every query will make a module wait for all other modules to catch up, even for the slow-poke-freaky-special-stuff that doesn't answer anything but just, idk, publishes RViz-markers or something like that, and is completely uncritical for the further processing pipeline. Also it would be nice to model the event input- and output and do some optimizations on that.

sempr-tk / sempr

Parallelism. #18