Closed Tianxu-Jia closed 1 year ago
Hello. Apologies for a late response.
Honestly, I am not familiar with joblib and I am not doing machine learning right now. But parallelism would be a nice addition here, so I'll fix this.
Can you provide the exact error message? If you want, can you also provide a code snippet/sample (with data removed of course) on how you use my library with joblib?
Thanks!
@Tianxu-Jia Good day!
Finally had time for this. I think that only fit has the problem, because only fit modifies model parameters. For now, we can do something like:
from joblib import Parallel, delayed
models = [M0, M1, M2]
data = [data0, data1, data2]
Parallel(n_jobs=-1, require='sharedmem')(delayed(i.fit)(j) for i, j in zip(models, data))
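Passing require='sharedmem' makes joblib use a thread-based backend, so the workers share memory with the main program and the changes that fit makes are visible on the original model objects. See here: https://joblib.readthedocs.io/en/latest/parallel.html#shared-memory-semantics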
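To see the shared-memory behaviour in isolation, here is a minimal sketch. ToyModel is a hypothetical stand-in, not a class from this library; it only assumes that fit updates the object's parameters in place.

```python
from joblib import Parallel, delayed


class ToyModel:
    """Hypothetical stand-in for a model whose fit() updates parameters in place."""

    def __init__(self, name):
        self.name = name
        self.mean = None  # the "parameter" that fit() sets

    def fit(self, data):
        self.mean = sum(data) / len(data)
        return self


models = [ToyModel("Model 0"), ToyModel("Model 1"), ToyModel("Model 2")]
data = [[1.0, 2.0, 3.0], [10.0, 20.0], [5.0]]

# require="sharedmem" selects a thread-based backend, so each fit() call
# mutates the very objects held in `models`.
Parallel(n_jobs=2, require="sharedmem")(
    delayed(m.fit)(d) for m, d in zip(models, data)
)

means = [m.mean for m in models]
print(means)
```

Because the workers are threads, how much speedup you get depends on how much of fit runs outside the Python GIL.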
Work is ongoing so that we can also do something like this (if one prefers not to use sharedmem for some reason):
from joblib import Parallel, delayed
models = [M0, M1, M2]
data = [data0, data1, data2]
[M0, M1, M2] = Parallel(n_jobs=-1)(delayed(i.fit)(j) for i, j in zip(models, data))
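The reassignment on the left-hand side matters in this variant. Without sharedmem, joblib's default backend runs workers in separate processes, and each worker fits a copy of the model. The sketch below does not launch processes; it simulates that copy with copy.deepcopy to show the effect. ToyModel and run_in_worker are hypothetical names, and it assumes fit returns the fitted model, as the snippet above implies.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for a model whose fit() returns the fitted model."""

    def __init__(self):
        self.mean = None

    def fit(self, data):
        self.mean = sum(data) / len(data)
        return self


def run_in_worker(model, data):
    # Simulate a worker process: it operates on a copy of the model,
    # not on the object held by the main program.
    worker_copy = copy.deepcopy(model)
    return worker_copy.fit(data)


original = ToyModel()
returned = run_in_worker(original, [1.0, 2.0, 3.0])

original_mean = original.mean  # the original was never touched
returned_mean = returned.mean  # only the returned copy is fitted

# Rebinding the name to the returned object is what keeps the fitted model.
original = returned
```

If the return value is discarded, the main program is left holding the unfitted model.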
And because of these changes, I'll also introduce model "names".
R1 = GaussianHSMM(n_states = 3, n_durations = 4, name = "Model 1") # new parameter "name"
Names are shown in printed messages. This is helpful when models are run in parallel.
FIT: reestimation complete for loop 3.
FIT (Model 1): converged at loop 3.
FIT: reestimation complete for loop 4.
These changes will be released in the next version 0.2.2.
Thanks!
Update: 0.2.2 is now released and includes the features I presented above.
I think I can now close this issue, but feel free to re-open it if you have questions.
I use your library with joblib, and it appears that it is not parallel-safe.