koheiw / seededlda

LDA for semisupervised topic modeling
https://koheiw.github.io/seededlda/

Setting options(seededlda_threads = 1) gives reproducible results, otherwise not. #76

Closed: myliserta closed this issue 4 months ago

myliserta commented 4 months ago

The documentation needs to clarify how to set a seed for each sub-process when multi-threading.

Reference for this problem: https://stackoverflow.com/questions/78248050/set-seed-in-quantedas-lda-function

koheiw commented 4 months ago

I updated the man page recently. Is this enough?

https://github.com/koheiw/seededlda/blob/004af95da0ad325e74a373270fe9c74269191976/R/lda.R#L36-L40

myliserta commented 4 months ago

This should be enough, thank you. Alternatively, would it be possible to make set.seed() work with the parallel algorithm, or is that too dependent on the operating system?

koheiw commented 4 months ago

I don't think I can. If I fix the order of parallel processing, the performance gain will disappear.

myliserta commented 4 months ago

I perfectly understand this. Thank you for addressing my comment.
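
For anyone arriving from the Stack Overflow question, the workaround in the issue title can be sketched roughly as below. This is a minimal sketch, assuming quanteda and seededlda are installed. The corpus (`data_corpus_inaugural`), `k = 5`, `max_iter = 100`, and the seed value are illustrative choices that do not come from this thread.

```r
library(quanteda)
library(seededlda)

# Restrict seededlda to a single thread, as in the issue title.
options(seededlda_threads = 1)

# Illustrative data only: any dfm would do here.
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE)
dfmat <- dfm(toks)

# Fit the same model twice, resetting R's seed before each run.
set.seed(1234)
lda1 <- textmodel_lda(dfmat, k = 5, max_iter = 100)
set.seed(1234)
lda2 <- textmodel_lda(dfmat, k = 5, max_iter = 100)

# Compare the top terms of each topic across the two runs.
same <- identical(terms(lda1), terms(lda2))
```

With more than one thread the same comparison may fail, because the order in which the threads do their work is not fixed from run to run.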