axch opened this issue 8 years ago
Even jobs that run many particles as completely independent chains should benefit, by parallelizing the replication of the initial state across workers. That replication contributed measurably to the latency of my large challenge problem runs, e.g. the Gibbs sampling solution to the homophily problem.
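To make the point concrete: instead of the master copying the initial state once per particle (in series) and shipping each copy out, it can ship the single initial state once per worker and let each worker replicate it locally, in parallel. A minimal sketch of that idea — all names are illustrative, `copy.deepcopy` stands in for whatever trace-copying Venture actually does, and threads stand in for worker processes to keep the sketch self-contained:

```python
import copy
import threading
import multiprocessing as mp

def worker_init(conn):
    """Worker side: receive the initial state once, then replicate it
    locally. The replication cost is paid on the worker, in parallel
    with its peers, instead of serially at the master."""
    state = conn.recv()
    my_particle = copy.deepcopy(state)  # stand-in for Venture's trace copy
    conn.send(my_particle)

def broadcast_initial_state(state, n_workers):
    """Master side: one send per worker; no copies performed here."""
    conns, threads = [], []
    for _ in range(n_workers):
        parent, child = mp.Pipe()
        t = threading.Thread(target=worker_init, args=(child,))
        t.start()
        parent.send(state)
        conns.append(parent)
        threads.append(t)
    out = [c.recv() for c in conns]
    for t in threads:
        t.join()
    return out
```

The master's cost drops from N serial copies plus N sends to N sends of one object; the copies happen concurrently on the workers.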
The current architecture of resampling when running multiprocess is to collect all the particles at the master, perform the resampling copies there, and distribute the results back out to the workers.
Our current regime is that particles are in general quite large (being full execution traces of the model program) and quite expensive to transmit and copy.
In this regime, this architecture is undesirable, because the master is a bottleneck: it performs all the copies (perforce in series), and it carries all the communication necessary to collect the particles.
This architecture is also unnecessary, because the weights alone carry the semantic content of resampling; the particles themselves need not be centralized. We should probably move to an architecture like this:

- each worker sends only its particle's weight(s) to the master;
- the master computes the resampling decisions — which particle each output slot should be a copy of — and sends each worker its instructions;
- the particles themselves are copied and transferred directly between workers, without passing through the master.
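Under such a scheme, the master-side step reduces to computing a resampling plan from the scalar weights alone: which existing particle each output slot should be a copy of. Only this plan (small integers) need travel back out; the heavy particles move between workers. A minimal sketch, assuming simple multinomial resampling — the function name and the choice of multinomial over, say, systematic resampling are illustrative, not necessarily Venture's actual scheme:

```python
import random

def resampling_plan(weights, rng=None):
    """Choose a source-particle index for each output slot, in proportion
    to the given weights (multinomial resampling). Only scalar weights
    are needed; the (large) particles never reach the master."""
    rng = rng or random.Random(0)
    total = sum(weights)
    plan = []
    for _ in weights:
        u = rng.random() * total
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if u < acc:
                plan.append(i)
                break
        else:
            # Guard against floating-point shortfall in the running sum.
            plan.append(len(weights) - 1)
    return plan
```

Given the plan, each worker learns which peer's particle it should adopt; workers whose slot maps to themselves keep their particle and transmit nothing.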
Effects:
Possible intermediate state: proceed as above, but leave the master to actually perform the cross-worker communication, thereby retaining the current structure in which the master just holds pipes to all the workers and they need not talk to each other. This intermediate state is probably still an improvement over the status quo.
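That intermediate state can be sketched as follows: the master pulls each particle that must move over its existing pipe to the source worker and forwards it to the destination, so no worker-to-worker channels are needed. The message protocol and all names here are illustrative, and threads stand in for worker processes to keep the sketch self-contained:

```python
import threading
import multiprocessing as mp

def master_relay(pipes, plan):
    """Intermediate scheme: the master keeps its pipe to each worker and
    relays particles itself, so workers never talk to each other.
    plan[dst] = src means worker dst ends up with worker src's particle."""
    fetched = {}
    # Phase 1: pull each particle that must move through the master.
    for dst, src in enumerate(plan):
        if dst != src and src not in fetched:
            pipes[src].send(("upload", None))
            _, fetched[src] = pipes[src].recv()
    # Phase 2: tell every worker its fate.
    for dst, src in enumerate(plan):
        if dst == src:
            pipes[dst].send(("keep", None))
        else:
            pipes[dst].send(("adopt", fetched[src]))

def worker(conn, particle):
    """Toy worker: serves upload requests, then keeps or adopts."""
    while True:
        msg, payload = conn.recv()
        if msg == "upload":
            conn.send(("particle", particle))
        elif msg == "adopt":
            particle = payload
            break
        else:  # "keep"
            break
    conn.send(("final", particle))

def demo(particles, plan):
    """Run one resampling round, threads standing in for workers."""
    pairs = [mp.Pipe() for _ in particles]
    threads = [threading.Thread(target=worker, args=(child, p))
               for (parent, child), p in zip(pairs, particles)]
    for t in threads:
        t.start()
    master_relay([parent for parent, _ in pairs], plan)
    finals = [parent.recv()[1] for parent, _ in pairs]
    for t in threads:
        t.join()
    return finals
```

Note that each moved particle still crosses the master's pipes twice (source to master, master to destination), but surviving particles that stay put are never transmitted at all — which is the bulk of the win over the status quo.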