poypoyan / edhsmm

An(other) implementation of Explicit Duration Hidden Semi-Markov Models in Python 3
MIT License

no safe multi-process/threads #14

Closed Tianxu-Jia closed 1 year ago

Tianxu-Jia commented 1 year ago

I am using your library with joblib, and it does not appear to be safe for parallel use.

poypoyan commented 1 year ago

Hello. Apologies for the late response.

Honestly, I am not familiar with joblib and I am not doing machine learning right now. But parallelism would be a nice addition here, so I'll fix this.

Can you provide the exact error message? If possible, could you also share a code snippet (with the data removed, of course) showing how you use my library with joblib?

Thanks!

poypoyan commented 1 year ago

@Tianxu-Jia Good day!

Finally had time for this. I think that only fit has the problem, because only fit modifies model parameters. For now, we can do something like:

from joblib import Parallel, delayed

models = [M0, M1, M2]
data = [data0, data1, data2]

# Fit each model on its own dataset; fit updates the models in place.
Parallel(n_jobs=-1, require='sharedmem')(delayed(i.fit)(j) for i, j in zip(models, data))

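The require='sharedmem' argument makes joblib pick a backend whose workers share memory with the main program (in practice, threads), so fit can modify the model objects that live in the main program. See here: https://joblib.readthedocs.io/en/latest/parallel.html#shared-memory-semantics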
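To illustrate why shared memory matters here, below is a minimal sketch that uses only the standard library rather than joblib or edhsmm. ToyModel is a hypothetical stand-in for a model whose fit edits its own parameters, and ThreadPoolExecutor plays the role of a thread-based backend. Because threads work on the same objects as the main program, the edits made by fit are still there after the parallel run.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyModel:
    """Hypothetical stand-in for a model whose fit() edits its own parameters."""

    def __init__(self):
        self.mean = None  # "parameter" that fit() overwrites

    def fit(self, data):
        self.mean = sum(data) / len(data)  # in-place mutation of the model
        return self


models = [ToyModel(), ToyModel(), ToyModel()]
data = [[1.0, 2.0, 3.0], [10.0, 20.0], [5.0]]

# Threads operate on the very same objects as the main program,
# so each fit() call mutates the model held in the `models` list.
with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(lambda pair: pair[0].fit(pair[1]), zip(models, data)))

fitted_means = [m.mean for m in models]
print(fitted_means)
```

Each model is only touched by one worker, so no locking is needed in this sketch. Sharing a single model across several workers would be a different situation.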


Work is ongoing so that we can also do something like this (if one prefers not to use sharedmem for some reason):

from joblib import Parallel, delayed

models = [M0, M1, M2]
data = [data0, data1, data2]

[M0, M1, M2] = Parallel(n_jobs=-1)(delayed(i.fit)(j) for i, j in zip(models, data))
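The reassignment matters because, without shared memory, joblib's default workers are separate processes that receive copies of the models, so edits made by fit do not reach the originals. The sketch below only simulates that copy with copy.deepcopy instead of starting real worker processes. ToyModel and run_in_simulated_worker are hypothetical names, and the sketch assumes that fit returns the fitted model, as the snippet above implies.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for a model whose fit() edits its own parameters."""

    def __init__(self):
        self.mean = None

    def fit(self, data):
        self.mean = sum(data) / len(data)
        return self  # assumption: fit hands back the fitted model


def run_in_simulated_worker(model, data):
    # Stand-in for a worker process, which only ever sees a copy of the model.
    worker_copy = copy.deepcopy(model)
    return worker_copy.fit(data)


models = [ToyModel(), ToyModel()]
data = [[1.0, 3.0], [4.0]]

results = [run_in_simulated_worker(m, d) for m, d in zip(models, data)]

originals_after = [m.mean for m in models]  # originals were never modified
models = results                            # rebind, as in the reassignment above
rebound_after = [m.mean for m in models]
print(originals_after, rebound_after)
```

Without the rebinding step, the main program would keep holding the unfitted originals.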

And because of these changes, I'll also introduce model "names".

R1 = GaussianHSMM(n_states = 3, n_durations = 4, name = "Model 1")   # new parameter "name"

Names are shown in printed messages. This is helpful when models are run in parallel, because their messages can interleave.

FIT: reestimation complete for loop 3.
FIT (Model 1): converged at loop 3.
FIT: reestimation complete for loop 4.

These changes will be released in the next version 0.2.2.

Thanks!

poypoyan commented 1 year ago

Update: 0.2.2 is now released, and it includes the features I presented above.

I think I can now close this issue, but feel free to re-open it if you have questions.