nucypher / zerodb

*This project is no longer actively maintained. If you'd like to become the maintainer, please let us know.* ZeroDB is an end-to-end encrypted database. Data can be stored and queried on untrusted database servers without ever exposing the encryption key. Clients can execute remote queries against the encrypted data without downloading all of it or suffering an excessive performance hit.
GNU Affero General Public License v3.0

Memory leak which doesn't create python objects #38

Open michwill opened 8 years ago

michwill commented 8 years ago

I've noticed a strange memory leak when trying to index texts from Wikipedia. Even when the in-memory cache size is small, the Python client process which creates the records occupies more and more memory. Here is a minimal example which reproduces the problem and shows that everything looks fine to a memory profiler:

import zerodb
import random
import transaction

from zerodb.models import Model, Text
from pympler import tracker

# Random 7-letter "words" and 50-word "texts" to feed the text index
all_chars = list(map(chr, range(ord('a'), ord('z') + 1)))  # list() so random.choice also works on Python 3
get_word = lambda: "".join([random.choice(all_chars) for i in range(1, 8)])
words = [get_word() for i in range(10000)]
get_text = lambda: " ".join([random.choice(words) for i in range(50)])

memory_tracker = tracker.SummaryTracker()

class Doc(Model):
    text = Text()

if __name__ == "__main__":
    username = "root"
    passphrase = "very insecure passphrase - never use it"

    db = zerodb.DB(("localhost", 8001), username=username, password=passphrase)

    # Insert documents in batches of 1000; print what pympler sees between batches
    for i in range(10000):
        print(i)
        memory_tracker.print_diff()
        with transaction.manager:
            for j in range(1000):
                doc = Doc(text=get_text())
                db.add(doc)

    db.disconnect()
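
One way to make the discrepancy visible is to log the process resident set size next to the pympler diff. A minimal sketch, assuming Linux (it reads /proc/self/status; the helper rss_kb is not part of zerodb):

def rss_kb():
    """Return the current resident set size of this process, in kB."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

# Next to memory_tracker.print_diff() in the loop above:
#     print("RSS: %d kB" % rss_kb())
# If RSS keeps climbing while the pympler diff stays near zero, the growth is
# in memory that is not tracked as Python objects (e.g. C-level buffers).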
michwill commented 8 years ago

Actually, this could be specific to Python 2, because Python 2 itself can leak memory! Will test with Python 3.

michwill commented 8 years ago

Same problem in Python 3.

michwill commented 8 years ago

Possibly something like this. Solving the problem probably requires making memory dumps with gdb and exploring what kind of data constitutes the leftovers in memory.
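
A rough sketch of that exploration, assuming Linux: instead of attaching gdb, the same raw bytes can be copied out of /proc/<pid>/mem (this script is hypothetical, not part of zerodb, and needs ptrace permission on the target, e.g. running it as root):

import sys

def dump_writable_regions(pid, out_path):
    """Copy the private writable mappings of `pid` (heap, anonymous arenas)
    into one file so they can be inspected with strings/grep/hexdump."""
    with open("/proc/%d/maps" % pid) as maps, \
         open("/proc/%d/mem" % pid, "rb") as mem, \
         open(out_path, "wb") as out:
        for line in maps:
            fields = line.split()
            addr, perms = fields[0], fields[1]
            if "w" not in perms or not perms.endswith("p"):
                continue  # skip read-only and shared mappings
            start, end = (int(x, 16) for x in addr.split("-"))
            try:
                mem.seek(start)
                out.write(mem.read(end - start))
            except (OSError, IOError):
                continue  # some regions (e.g. [vvar]) cannot be read

if __name__ == "__main__":
    dump_writable_regions(int(sys.argv[1]), sys.argv[2])

Running strings or grep over the resulting dump (for example, searching for the generated words from the test script) should show whether the leftovers are document text, pickled state, or something else entirely.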