piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

Readme incorrectly states that HDP implementation is parallel #2384

Closed by ckingdev 5 years ago

ckingdev commented 5 years ago

The readme currently (on master) states:

Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

However, the HDP model is single-core (and it looks like RP is as well, though I'm less familiar with that technique), so this statement is misleading.

piskvorky commented 5 years ago

Most of these algorithms rely on NumPy matrix operations, which in turn rely on BLAS, which is typically multi-threaded. So even if there's no explicit parallelization logic, you should see an improvement on multi-core machines. That's definitely the case with RP; I'm not sure about HDP, so you may be right.
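One way to check how much of a workload's speed comes from BLAS threading is to cap the BLAS thread count and time a matrix operation. The snippet below is a minimal sketch, not something from the Gensim codebase. It assumes NumPy is linked against a threaded BLAS, and which of the three environment variables is actually honoured depends on that backend.

```python
import os

# Assumption: the variable that matters depends on the BLAS backend NumPy was
# built against (OpenBLAS, MKL, or an OpenMP-based build), so the three common
# ones are all set. They only take effect if set before NumPy is first imported.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import time

import numpy as np

# Print the BLAS/LAPACK libraries this NumPy build was compiled against.
np.show_config()

rng = np.random.default_rng(0)
a = rng.random((500, 500))

start = time.perf_counter()
product = a @ a  # dense float64 matmul, typically dispatched to BLAS
elapsed = time.perf_counter() - start

print(f"500x500 matmul with BLAS capped at 1 thread: {elapsed:.4f}s")
```

Re-running the script without the environment variables and comparing the timings gives a rough idea of the implicit speedup on a given machine. It says nothing about code paths that do not go through BLAS.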

In the future, it's better to ask on the mailing list and discuss it there.

ckingdev commented 5 years ago

I'm well aware that BLAS is used here. You and I both know that's not how people are going to read "multicore HDP", especially since the package does have a parallel implementation of LDA.

piskvorky commented 5 years ago

@ckingdev are you a user of HDP? I'm actually considering dropping that model from Gensim altogether.

Some words on where/how HDP is useful to you, a workflow from an actual user, could help us make that decision. Cheers!