rafguns / textual-coherence

Clustering coherence by Jensen-Shannon divergence

Stop recalculating doc_probs in probabilities #2

Open rafguns opened 3 years ago

rafguns commented 3 years ago

If I see it correctly, we could compute doc_probs only once, in jsd_samples, instead of recalculating it in probabilities.

rafguns commented 3 years ago

I don't see it correctly, unfortunately... Closing.

rafguns commented 3 years ago

Reinvestigate! Seems like this should be possible.

rafguns commented 3 years ago

Specifically, we should be able to do:

```python
doc_probs = all_docs / all_docs.sum(axis=1, keepdims=True)
```
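A small sketch of what this one-off normalization does, assuming all_docs is a 2-D NumPy array of term counts with one row per document (the toy matrix below is hypothetical). `keepdims=True` keeps the row sums as an `(n_docs, 1)` column, so the division broadcasts along each row. This also assumes every document has at least one term; an all-zero row would divide by zero.

```python
import numpy as np

# Hypothetical document-term count matrix: one row per document, one column per term.
all_docs = np.array([
    [2, 0, 1, 1],
    [0, 3, 1, 0],
    [1, 1, 1, 1],
])

# Row sums have shape (3, 1) thanks to keepdims=True, so dividing broadcasts
# across the columns of each row and turns every row into a distribution.
doc_probs = all_docs / all_docs.sum(axis=1, keepdims=True)

print(doc_probs)
print(doc_probs.sum(axis=1))  # each row now sums to 1
```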

The fiddly bit is then that we need to draw corresponding samples from both all_docs and doc_probs, i.e. the same documents (rows) from each array.
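One way to keep the two samples aligned is to draw row *indices* once and use them to index both arrays. The sketch below assumes a sample is a set of documents drawn without replacement; the random counts matrix and `sample_size` are hypothetical stand-ins, not values from the repository.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical counts matrix standing in for all_docs (low=1 avoids empty rows).
all_docs = rng.integers(1, 10, size=(100, 20))
# Normalise once, outside any sampling loop.
doc_probs = all_docs / all_docs.sum(axis=1, keepdims=True)

sample_size = 10  # hypothetical number of documents per sample

# Draw row indices rather than rows, so the same documents can be taken
# from both arrays.
idx = rng.choice(all_docs.shape[0], size=sample_size, replace=False)
sample_counts = all_docs[idx]
sample_probs = doc_probs[idx]
```

Because `sample_probs` is just a slice of the precomputed `doc_probs`, it matches what recomputing the normalization on `sample_counts` would give, without redoing the division per sample.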

rafguns commented 3 years ago

Another possible optimization: rather than calling rng.choice many times, we can use its size parameter to generate all samples in one go: https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html#numpy.random.Generator.choice
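A sketch of the batched version, with all sizes hypothetical. One caveat: passing a 2-D `size` together with `replace=False` forbids repeats across the *whole* output, not within each row. So a single `rng.choice` call only works directly if samples are drawn with replacement. If each sample must contain distinct documents, argsorting a block of random numbers per row is one workaround (a swapped-in technique, not something from the thread).

```python
import numpy as np

rng = np.random.default_rng(0)

n_docs = 100      # hypothetical number of documents
sample_size = 10  # hypothetical documents per sample
n_samples = 500   # hypothetical number of samples

# With replacement: one call with a 2-D size returns every sample at once,
# one sample of row indices per row.
idx_with = rng.choice(n_docs, size=(n_samples, sample_size))

# Without replacement within each sample: argsort independent random rows
# to get one random permutation per sample, then keep the first sample_size.
idx_without = rng.random((n_samples, n_docs)).argsort(axis=1)[:, :sample_size]

# Fancy indexing then pulls all sampled documents in one step.
all_docs = rng.integers(1, 10, size=(n_docs, 20))  # hypothetical counts
batch = all_docs[idx_without]  # shape (n_samples, sample_size, n_terms)
```

The same index array can be applied to a precomputed `doc_probs`, so both arrays stay aligned across all samples.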