ahgraber opened this issue 2 years ago (status: Open)
I have the same issue. As a workaround, I randomly sampled a set of users from the population.
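A minimal sketch of that sampling workaround, using only the standard library. The helper name `sample_users`, the `sample_size` parameter, and the fixed seed are my own illustrative choices, not part of the library; the sampled lists would then be passed to personalization() in place of the full population.

```python
import random


def sample_users(predicted, sample_size, seed=0):
    """Return a reproducible random subset of per-user recommendation lists.

    `predicted` is assumed to be a list with one recommendation list per user.
    """
    predicted = list(predicted)
    if sample_size >= len(predicted):
        return predicted
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    return rng.sample(predicted, sample_size)


# Toy population: 100 users with 3 recommendations each.
population = [[f"item{(u + k) % 7}" for k in range(3)] for u in range(100)]
subset = sample_users(population, sample_size=10)
print(len(subset))  # 10
```

The score computed on a sample is only an estimate of the population score, so it may be worth repeating with a few different seeds to see how much it varies.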
@ahgraber @Alex-Bujorianu The problem with personalization(), besides its time complexity, is that it uses quadratic space.
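To put the quadratic growth in numbers, here is a rough back-of-the-envelope estimate. It assumes the user-by-user similarity matrix is held as dense 8-byte floats; a sparse result would differ depending on how many user pairs share items. The function name is hypothetical.

```python
def dense_similarity_bytes(n_users, bytes_per_value=8):
    """Bytes needed for a dense n_users x n_users similarity matrix."""
    return n_users * n_users * bytes_per_value


for n in (10_000, 50_000, 250_000):
    print(f"{n:>7,} users: {dense_similarity_bytes(n) / 1e9:,.1f} GB")
# 10,000 users: 0.8 GB
# 50,000 users: 20.0 GB
# 250,000 users: 500.0 GB
```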
With 50k users, the only issue I still had was run time. I resolved the space problem by increasing the swap from the default 2 GB to 40 GB (I am using Ubuntu). Hope that helps.
@gregwchase I wouldn't label this issue as a bug, since the quadratic space complexity is inherent to computing the full pairwise similarity matrix. I left a recommendation on how to increase swap space for those who use Linux machines.
On my system (16 GB RAM), a list of 10k recommendations will run, but a list of 50k will crash. I'd like to understand the personalization score across my entire hypothetical customer base of 250k+ users.
Is there a way to chunk the scipy.sparse.csr_matrix and iteratively calculate the cosine similarity to avoid holding the whole thing in memory?
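One possible way to do this is sketched below. It assumes the personalization score is 1 minus the mean cosine similarity over all distinct pairs of users, and that every user has at least one recommendation (so each self-similarity is 1). The function names `build_rec_matrix` and `chunked_personalization` and the `chunk_size` parameter are hypothetical, not part of the library. The idea is to row-normalise the sparse user-item matrix once, then multiply one block of rows at a time against the whole matrix and keep only a running sum.

```python
import numpy as np
import scipy.sparse as sp


def build_rec_matrix(predicted):
    """Binary user x item matrix from one recommendation list per user."""
    items = {item: j for j, item in enumerate(
        sorted({i for recs in predicted for i in recs}))}
    rows, cols = [], []
    for user, recs in enumerate(predicted):
        for item in set(recs):
            rows.append(user)
            cols.append(items[item])
    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)),
                         shape=(len(predicted), len(items)))


def chunked_personalization(rec_matrix, chunk_size=1000):
    """1 - mean off-diagonal cosine similarity, computed block by block."""
    X = sp.csr_matrix(rec_matrix, dtype=np.float64)
    n_users = X.shape[0]
    # Row-normalise so the dot product of two rows is their cosine similarity.
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    X = (sp.diags(1.0 / norms) @ X).tocsr()
    Xt = X.T.tocsr()

    total = 0.0
    for start in range(0, n_users, chunk_size):
        # Only a (chunk_size x n_users) block exists at any one time.
        block = X[start:start + chunk_size] @ Xt
        total += block.sum()
    # Remove the n_users self-similarities on the diagonal (assumed to be 1).
    mean_similarity = (total - n_users) / (n_users * (n_users - 1))
    return 1.0 - mean_similarity


recs = [["A", "B", "C"], ["A", "B", "C"], ["X", "Y", "Z"]]
score = chunked_personalization(build_rec_matrix(recs), chunk_size=2)
print(round(score, 3))  # 0.667
```

Peak memory is governed by the largest single block, at most `chunk_size` × number of users entries, so `chunk_size` can be lowered until a block fits in RAM. Run time is still quadratic in the number of users.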