Open shiraeisenberg opened 7 months ago
Thank you so much for raising the issue @shiraeisenberg. I will have a look into this as soon as possible.
@shiraeisenberg Would you mind sharing the value of your denstream._n_samples_seen? In my experience, DenStream usually returns no clusters for roughly the first 100 observations. For that reason, we typically use the first 100-1000 samples as a burn-in, i.e. we let the model learn without requesting any predictions.
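The burn-in idea above can be sketched as a small helper. This is only a minimal sketch and not part of river: the name run_with_burn_in and the default burn_in=100 are hypothetical, and the helper assumes nothing beyond a model that exposes learn_one(x) and predict_one(x), as DenStream does.

```python
from typing import Any, Dict, Hashable, Iterable, List, Optional


def run_with_burn_in(
    model: Any,
    stream: Iterable[Dict[Hashable, float]],
    burn_in: int = 100,
) -> List[Optional[Any]]:
    """Feed every sample to the model; request predictions only after burn-in.

    Hypothetical helper: `model` is any object with `learn_one(x)` and
    `predict_one(x)`. Returns one entry per sample, `None` during burn-in.
    """
    predictions: List[Optional[Any]] = []
    for n_seen, x in enumerate(stream):
        # Every sample is used for learning, during and after burn-in.
        model.learn_one(x)
        if n_seen < burn_in:
            # Burn-in phase: no prediction is requested.
            predictions.append(None)
        else:
            # After burn-in: ask the model for a cluster assignment.
            predictions.append(model.predict_one(x))
    return predictions
```

With a real stream, the burn-in length would likely need tuning to the data; the 100-1000 range mentioned above is a rule of thumb rather than a guarantee.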
@hoanganhngo610 I see this too.
I took the example in the documentation, and re-ran that.
If we do not add something like denstream.predict_one({0: -1, 1: -2}), then n_clusters is always 0.
Dear @nipunagarwala. Thank you very much for your response.
If you look closely at the source code of DenStream and the example in the documentation, you will notice that learn_one only creates p-micro-clusters and o-micro-clusters. The final clusters are generated, and the number of clusters becomes known, only when predict_one is called for the first time.
As such, in the documentation example, if you retrieve the number of o_micro_clusters and p_micro_clusters with the following commands, the results will be 0 and 2 respectively:
>>> len(denstream.o_micro_clusters)
0
>>> len(denstream.p_micro_clusters)
2
We adopted this design because generating the cluster solution, i.e. the final cluster centers, is extremely computationally expensive, so it should happen only when predict_one is called; otherwise, only the o-micro-clusters and p-micro-clusters are updated.
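The design choice described above, cheap updates on every learn_one and an expensive clustering step deferred until a prediction is requested, can be illustrated with a toy class. This is a sketch of the pattern only and not river's implementation: LazyClusterer, its radius parameter, and the simple 1-D grouping rule are all hypothetical simplifications.

```python
from typing import List


class LazyClusterer:
    """Toy 1-D model: summaries are updated online, clusters are built on demand."""

    def __init__(self, radius: float = 1.0) -> None:
        self.radius = radius
        self.micro_clusters: List[float] = []  # cheap per-sample summaries
        self.centers: List[float] = []         # final clusters, empty until generated
        self._stale = True                     # True when centers need rebuilding

    @property
    def n_clusters(self) -> int:
        # Reports the last generated solution; 0 before any prediction.
        return len(self.centers)

    def learn_one(self, x: float) -> None:
        # Cheap step: record a summary and mark the final clusters as stale.
        self.micro_clusters.append(x)
        self._stale = True

    def _generate_clusters(self) -> None:
        # "Expensive" step: group sorted summaries whose gap to the previous
        # value is at most `radius`, then average each group into a center.
        groups: List[List[float]] = []
        for value in sorted(self.micro_clusters):
            if groups and value - groups[-1][-1] <= self.radius:
                groups[-1].append(value)
            else:
                groups.append([value])
        self.centers = [sum(g) / len(g) for g in groups]
        self._stale = False

    def predict_one(self, x: float) -> int:
        # Final clusters are (re)built only when a prediction is requested.
        if self._stale:
            self._generate_clusters()
        if not self.centers:
            raise ValueError("no samples have been learned yet")
        return min(range(len(self.centers)), key=lambda i: abs(self.centers[i] - x))


model = LazyClusterer(radius=1.0)
for x in [0.0, 0.5, 10.0, 10.2]:
    model.learn_one(x)
print(model.n_clusters)  # 0: no prediction has been requested yet
model.predict_one(0.1)
print(model.n_clusters)  # 2: clusters were generated by predict_one
```

The toy mirrors the behaviour reported in this thread: the cluster count stays at 0 after learning alone and becomes meaningful only after the first prediction call.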
Hope that this answer is clear and helpful to you.
Versions
river version: installed from github
Python version: 3.11
Operating system: Mac OS
Describe the bug
With OpenAI embeddings, DenStream returns 0 clusters regardless of the parameters set.
Steps/code to reproduce