One should only need to do this after starting engines:
```python
import pyhsmm.parallel  # creates the parallel client

# either this
for data in datas:
    model.add_data_parallel(data)

# or this, which could be greedy or smart
model.add_datas_parallel(datas)

for i in progprint_xrange(1000):
    model.resample_model_parallel()
```
Each sequence should be sent to only one engine, with assignments chosen by greedily balancing total sequence length across engines (assuming all engines run at about the same speed); each engine then resamples only its assigned data. If data added to the model with add_data already exists on some engine (checked via a hash of the data), don't send it again; if it doesn't exist on any engine, send it to exactly one.
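The assignment logic above can be sketched as follows. This is a minimal standalone illustration, not pyhsmm code: `assign_sequences`, `existing`, and the `repr`-based hash are all hypothetical stand-ins (real code would hash the underlying array bytes).

```python
import hashlib

def assign_sequences(datas, n_engines, existing=None):
    """Greedy length balancing: hand each sequence (longest first) to the
    engine with the least total length so far, assuming all engines run
    at about the same speed. `existing` maps data hashes to the engine
    that already holds that sequence, so known data is not re-sent."""
    existing = existing if existing is not None else {}
    loads = [0] * n_engines                # total assigned length per engine
    plan = [[] for _ in range(n_engines)]  # sequence indices per engine
    for idx, data in sorted(enumerate(datas), key=lambda p: -len(p[1])):
        h = hashlib.sha1(repr(data).encode()).hexdigest()
        if h in existing:
            e = existing[h]  # already on some engine: assign there, no send
        else:
            e = min(range(n_engines), key=loads.__getitem__)
            existing[h] = e  # remember where this sequence lives
        loads[e] += len(data)
        plan[e].append(idx)
    return plan, loads

# e.g. five sequences of lengths 5, 9, 2, 7, 3 over two engines
plan, loads = assign_sequences([[0]*5, [0]*9, [0]*2, [0]*7, [0]*3], 2)
```

Sorting longest-first before the greedy step is the usual way to keep the worst-case imbalance small.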
There should also be a dynamic load-balancing mode that does what the current implementation does: load all the data on every engine and dispatch resampling tasks dynamically. Maybe something like this:
```python
model.broadcast_data_parallel(data)

for i in progprint_xrange(1000):
    model.resample_model_parallel_lbv()
```
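The dynamic mode can be pictured with a toy simulation. Nothing here is pyhsmm API: `dispatch_dynamic`, `engine_speeds`, and the timing model are all made up for illustration; the point is only that when every engine holds all the data, whichever engine frees up first can pull the next task, so faster engines naturally do more work.

```python
def dispatch_dynamic(seq_lengths, engine_speeds):
    """Toy simulation of load-balanced dispatch: every engine already
    holds all the data (broadcast), and the next resampling task goes
    to whichever engine becomes idle first."""
    busy_until = [0.0] * len(engine_speeds)   # finish time per engine
    done_by = [[] for _ in engine_speeds]     # task ids per engine
    for t, length in enumerate(seq_lengths):
        e = min(range(len(engine_speeds)), key=busy_until.__getitem__)
        busy_until[e] += length / engine_speeds[e]  # task cost ~ seq length
        done_by[e].append(t)
    return done_by, busy_until

# six equal-length sequences on two engines, one twice as fast:
# the fast engine ends up handling four of the six tasks
done_by, _ = dispatch_dynamic([4] * 6, [1, 2])
```

This is the trade-off versus the static mode above: no balancing assumptions are needed, at the cost of broadcasting all the data to every engine.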