statisticianinstilettos / recmetrics

A library of metrics for evaluating recommender systems
MIT License
565 stars 98 forks

personalization() has explosive memory requirements due to pairwise comparison #37

Open ahgraber opened 2 years ago

ahgraber commented 2 years ago

On my system (16 GB RAM), personalization() runs on a list of 10k recommendations, but a list of 50k crashes. I'd like to understand the personalization score across my entire hypothetical customer base of 250k+ users.

Is there a way to chunk the scipy.sparse.csr_matrix and iteratively calculate the cosine similarity to avoid holding the whole thing in memory?
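One way the chunking idea could look is sketched below. It assumes the score is 1 minus the mean pairwise cosine similarity between users' recommendation lists, with self-similarity excluded. The helpers build_rec_matrix and chunked_personalization and the chunk_size parameter are hypothetical names, not part of recmetrics. Only one chunk_size x n_users block of similarities exists at a time.

```python
import numpy as np
import scipy.sparse as sp


def build_rec_matrix(predicted):
    """Binary user x item CSR matrix from a list of recommendation lists."""
    item_index = {}
    rows, cols = [], []
    for user, recs in enumerate(predicted):
        for item in set(recs):  # set() avoids counting a repeated item twice
            rows.append(user)
            cols.append(item_index.setdefault(item, len(item_index)))
    data = np.ones(len(rows), dtype=np.float64)
    shape = (len(predicted), len(item_index))
    return sp.csr_matrix((data, (rows, cols)), shape=shape)


def chunked_personalization(predicted, chunk_size=1000):
    """Hypothetical helper: 1 - mean off-diagonal cosine similarity,
    accumulated one block of rows at a time."""
    X = build_rec_matrix(predicted)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two users")

    # L2-normalise each row so a plain dot product is a cosine similarity.
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0  # leave empty rows as all zeros
    X = (sp.diags(1.0 / norms) @ X).tocsr()
    Xt = X.T.tocsr()

    total = 0.0
    for start in range(0, n, chunk_size):
        # Similarities of this chunk of users against all users.
        block = X[start:start + chunk_size] @ Xt
        total += block.sum()

    # Remove self-similarities (1.0 for every non-empty row).
    diagonal = float(X.multiply(X).sum())
    mean_similarity = (total - diagonal) / (n * (n - 1))
    return 1.0 - mean_similarity
```

Each block is a sparse matrix, but it can become nearly dense when many users share items, so chunk_size would need tuning to the available RAM. Running time stays quadratic in the number of users; only peak memory is reduced.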

Alex-Bujorianu commented 1 year ago

I have the same issue. As a workaround, I randomly sampled a set of users from the population.
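A minimal sketch of that sampling workaround, using only the standard library. The helper name sample_users, the sample size k, and the seed parameter are assumptions for illustration, not part of recmetrics.

```python
import random


def sample_users(predicted, k, seed=0):
    """Hypothetical helper: return k recommendation lists drawn
    uniformly without replacement (all of them if k >= len(predicted))."""
    if k >= len(predicted):
        return list(predicted)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(predicted, k)


# Intended use (assumes recmetrics is installed and all_recs is the
# full list of per-user recommendation lists):
#   subset = sample_users(all_recs, k=10_000)
#   score = recmetrics.personalization(subset)
```

The result is an estimate for the sampled users only. Repeating it with a few different seeds gives a rough sense of how much the estimate varies.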

ibuda commented 1 year ago

@ahgraber @Alex-Bujorianu The problem with personalization(), besides its time complexity, is that it uses quadratic space. With 50k users, the only remaining issue I had was running time: I resolved the space problem by increasing swap from the default 2 GB to 40 GB (I am using Ubuntu). Hope that helps.
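To put rough numbers on the quadratic growth, here is a back-of-the-envelope estimate. It assumes the worst case of a fully dense n x n similarity matrix of 8-byte floats. A sparse result can be smaller or larger per stored entry, so treat this as an order-of-magnitude guide. The function name is hypothetical.

```python
def dense_similarity_gb(n_users, bytes_per_value=8):
    """Size in GB (10**9 bytes) of a dense n_users x n_users matrix."""
    return n_users ** 2 * bytes_per_value / 1e9


for n in (10_000, 50_000, 250_000):
    print(f"{n:>7} users: {dense_similarity_gb(n):7.1f} GB")
# Prints 0.8 GB, 20.0 GB and 500.0 GB respectively.
```

Under that assumption, 10k users fit in 16 GB of RAM, 50k users do not, and 250k users would be beyond what a realistic swap size can cover.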

ibuda commented 1 year ago

@gregwchase I wouldn't label this issue as a bug, as there is no way to bypass the space complexity here. I left a recommendation on how to increase swap space for those who use Linux machines.