Description
"I'm using bounter to count the frequency of items in a large set. I was periodically pickling the bounter object. Doing this causes the memory to continually increase." (based on the thread at https://groups.google.com/forum/#!topic/gensim/LsReiXXOzKY)
Steps/Code/Corpus to Reproduce
import pickle as pkl
from bounter import bounter
import numpy as np
import psutil
import gc


def get_used_memory():
    """
    Return the current amount of used memory (system-wide), in GB
    """
    return '{:.3f}'.format(psutil.virtual_memory().used / 1024.0 / 1024.0 / 1024.0)


def log(msg):
    # Under Python 2 this prints a tuple, which is the format of the log below.
    print(msg, ', memory =', get_used_memory())


def main():
    # Baseline: pickle a ~1 GB numpy array six times.
    log('Starting with np array')
    a = np.random.randint(0, 512, (8, 33554432), dtype='int32')
    log('Initialized array')
    for i in range(6):
        with open('array.pkl', 'wb') as f:
            pkl.dump(a, f, protocol=pkl.HIGHEST_PROTOCOL)
        log('Finished saving the ' + str(i) + 'th copy of the array')
    del a
    gc.collect()
    log('deleted array and performed gc.collect() ')

    # Same procedure with a 1 GB bounter.
    counter = bounter(size_mb=1024, need_iteration=False, log_counting=1024)
    log('Initialized counter')
    for i in range(6):
        with open('counter.pkl', 'wb') as f:
            pkl.dump(counter, f, protocol=pkl.HIGHEST_PROTOCOL)
        log('Finished saving the ' + str(i) + 'th copy of the bounter')
    del counter
    gc.collect()
    log('deleted array and performed gc.collect() ')
    log('Finished')


if __name__ == '__main__':
    main()
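The script above reads system-wide used memory, which other processes can also move. As a cross-check, here is a minimal sketch that tracks only this process's peak resident set size using the standard-library resource module (Unix only). The helper names peak_rss_mb and peak_growth_per_dump are hypothetical and not part of the original report or of bounter. Because peak RSS never decreases, this can confirm growth from one dump to the next, but it cannot show whether memory is released afterwards.

```python
import gc
import pickle
import resource
import sys
import tempfile


def peak_rss_mb():
    """Peak resident set size of this process, in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == 'darwin' else 1024.0
    return peak / divisor


def peak_growth_per_dump(obj, repeats=6):
    """Pickle obj `repeats` times; return the peak RSS growth (MB) per dump.

    Hypothetical helper for illustration: a steady, non-zero growth on
    every dump would match the behaviour described in this report.
    """
    growth = []
    with tempfile.TemporaryFile() as f:
        for _ in range(repeats):
            before = peak_rss_mb()
            # Overwrite the same temporary file on every iteration.
            f.seek(0)
            f.truncate()
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
            gc.collect()
            growth.append(peak_rss_mb() - before)
    return growth
```

Calling peak_growth_per_dump(counter) with the bounter from the script, and comparing against peak_growth_per_dump(a) for the numpy array, should show whether the growth is specific to the bounter object. For an object that pickles without leaking, the later entries would be expected to stay close to zero.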
Expected Results
Memory usage should not increase significantly after each dump.
Actual Results
I get the following log output, along with two .pkl files of 1.1 GB each:
('Starting with np array', ', memory =', '3.539')
('Initialized array', ', memory =', '4.540')
('Finished saving the 0th copy of the array', ', memory =', '4.540')
('Finished saving the 1th copy of the array', ', memory =', '4.544')
('Finished saving the 2th copy of the array', ', memory =', '4.549')
('Finished saving the 3th copy of the array', ', memory =', '4.549')
('Finished saving the 4th copy of the array', ', memory =', '4.553')
('Finished saving the 5th copy of the array', ', memory =', '4.562')
('deleted array and performed gc.collect() ', ', memory =', '3.561')
('Initialized counter', ', memory =', '3.561')
('Finished saving the 0th copy of the bounter', ', memory =', '4.567')
('Finished saving the 1th copy of the bounter', ', memory =', '5.573')
('Finished saving the 2th copy of the bounter', ', memory =', '6.577')
('Finished saving the 3th copy of the bounter', ', memory =', '7.576')
('Finished saving the 4th copy of the bounter', ', memory =', '8.579')
('Finished saving the 5th copy of the bounter', ', memory =', '9.582')
('deleted array and performed gc.collect() ', ', memory =', '9.580')
('Finished', ', memory =', '9.580')
Two things look suspicious here. First, memory grows by roughly 1 GB with every dump of the bounter:
('Finished saving the 0th copy of the bounter', ', memory =', '4.567')
('Finished saving the 1th copy of the bounter', ', memory =', '5.573')
('Finished saving the 2th copy of the bounter', ', memory =', '6.577')
('Finished saving the 3th copy of the bounter', ', memory =', '7.576')
('Finished saving the 4th copy of the bounter', ', memory =', '8.579')
('Finished saving the 5th copy of the bounter', ', memory =', '9.582')
Second, the memory is not released after the counter is deleted and gc.collect() is called, which looks like a memory leak:
('Finished saving the 5th copy of the bounter', ', memory =', '9.582')
('deleted array and performed gc.collect() ', ', memory =', '9.580')
('Finished', ', memory =', '9.580')
Versions