inspirehep / beard

Bibliographic Entity Automatic Recognition and Disambiguation

block clustering: memory usage grows over time #56

Closed glouppe closed 9 years ago

glouppe commented 9 years ago

Memory usage seems to grow more than it should over time. We need to understand why: the only thing that should grow is the number of base clusterers, but I don't see why that would take so much space.
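One way to check whether the accumulated objects alone explain the growth is to sample traced memory after each round of work. The sketch below is only an illustration, not code from beard: `measure_growth`, `kept`, and the 100,000-byte `bytearray` standing in for a stored base clusterer are all hypothetical. It uses `tracemalloc`, which is available in Python 3 only and only sees allocations made by the Python interpreter in the current process, so it says nothing about memory held by worker processes.

```python
import tracemalloc


def measure_growth(step, n_rounds):
    """Call step() n_rounds times; return traced bytes after each call."""
    tracemalloc.start()
    samples = []
    try:
        for _ in range(n_rounds):
            step()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


# Hypothetical stand-in: each round keeps one more object alive,
# the way one more base clusterer would be kept per block.
kept = []
samples = measure_growth(lambda: kept.append(bytearray(100_000)), n_rounds=5)

# Per-round growth in bytes; compare against the expected object size.
growth_per_round = [b - a for a, b in zip(samples, samples[1:])]
print(growth_per_round)
```

If the per-round growth is close to the size of what is deliberately kept, the stored objects account for it; growth well beyond that would point elsewhere.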

glouppe commented 9 years ago

Given how differently memory usage behaves for n_jobs=1, I suspect the issue is tied to how jobs are created and destroyed in joblib/multiprocessing. Memory is very stable with n_jobs=1.

glouppe commented 9 years ago

After some investigation, I would say this comes from the Python 2 implementation of multiprocessing. The author disambiguation works very nicely with Python 3 and n_jobs > 1, with no suspicious memory increase like the one seen in Python 2.