williballenthin / python-registry

Pure Python parser for Windows Registry hives.
Apache License 2.0
425 stars, 103 forks

Large hives can take up large amounts of memory #75

Open dhagrow opened 7 years ago

dhagrow commented 7 years ago

Some registry hives can be as large as 2 GB. That is maybe not a big issue for most people, but it is also not difficult to fix. I went with the following solution using mmap for myself. It substantially reduces the time to read from a large hive and uses almost no memory, since the hive is mapped into the process instead of being read in full.

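import mmap
from Registry import RegistryParse
from Registry.Registry import Registry as _Registry

class Registry(_Registry):
    def __init__(self, f):
        # Map the hive read-only instead of reading it all into memory.
        # ACCESS_READ belongs to the access= keyword; prot= is Unix-only
        # and expects PROT_* constants.
        self._buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._regf = RegistryParse.REGFBlock(self._buf, 0, False)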

Used like this:

with open(path, "rb") as f:
    r = Registry(f)
    # stuff

williballenthin commented 7 years ago

hey @dhagrow

this is a good solution. if i were to re-write this library, i'd probably have the registry constructor accept a buffer object, and leave it up to the caller to (1) read from the file, (2) create a mmap, (3) do something i haven't even thought of.
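
A rough sketch of what a buffer-accepting constructor could look like, for illustration only. `BufferRegistry`, `signature()` and `demo()` are hypothetical names and not part of python-registry; the only format detail assumed is that a hive starts with the 4-byte magic `regf`. The point is that the class never touches the file itself, so the caller decides between reading everything and mapping it.

```python
import mmap
import os
import struct
import tempfile


class BufferRegistry:
    """Hypothetical sketch (not python-registry API): takes a buffer, not a file."""

    def __init__(self, buf):
        # Anything supporting the buffer protocol works here:
        # bytes, bytearray, memoryview, or an mmap object.
        self._buf = buf

    def signature(self):
        # Read the 4-byte magic at offset 0 without copying the buffer.
        (magic,) = struct.unpack_from("<4s", self._buf, 0)
        return magic


def demo():
    # Use a tiny stand-in file so the sketch runs without a real hive.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "fake.hive")
        with open(path, "wb") as f:
            f.write(b"regf" + b"\x00" * 4092)

        results = []
        with open(path, "rb") as f:
            # (1) The caller reads the whole file into memory.
            results.append(BufferRegistry(f.read()).signature())
        with open(path, "rb") as f:
            # (2) The caller maps the file; pages are loaded on demand.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                results.append(BufferRegistry(m).signature())
        return results
```

With this shape, the mmap subclass from the first comment would not be needed: the caller would pass the mapped file straight to the constructor and keep it open for as long as the registry object is in use.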

there are a couple other warts that i'd want to address in a major version bump, so i'll include this in the list.

thanks for sharing this tip!

EccoTheFlintstone commented 5 years ago

hey, any news on this? It would be a great feature indeed