mjschultz / py-radix

Python radix tree implementation for IPv4 and IPv6 prefix matching.

question on memory usage #5

Closed hadiasghari closed 9 years ago

hadiasghari commented 10 years ago

Hi,

This is more of a question. I have a fairly large number of prefixes (half a million) to load. The C implementation of the library uses about 340MB on an x64 machine to load them; the Python implementation uses 1.8GB for the same data on the same machine!

Is this simply a limitation and inefficiency of Python's memory handling compared with C, or does it perhaps point to a bug? I can upload the prefix list if that helps.

Thanks!

mjschultz commented 10 years ago

I suspect it has more to do with Python's handling of memory than with a bug, but I'm not positive, and I plan to investigate down the line.

That said, there are some Python tricks for reducing memory usage, and the pure Python implementation was (and is) a fairly naïve, direct port of the C version with none of those tricks applied. My first goal was compatibility, which I think is mostly achieved (except for some potential bugs, as you've seen); that is the 0.6.X series. My next goal is to make the pure Python version more Pythonic (this will be the 0.7.X series when I get to it). After that I plan to focus on optimizing both memory and performance, with the goal of being as good as the C version; that will lead to a 1.0 release.
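One common Python memory trick of the kind mentioned above is declaring `__slots__` on a node class, so that instances do not each carry their own `__dict__`. The sketch below is only an illustration: `PlainNode`, `SlottedNode`, and `allocated_bytes` are hypothetical names, not py-radix's actual classes, and the field list is an assumption about what a radix tree node might hold.

```python
import tracemalloc


class PlainNode:
    """Hypothetical tree node; every instance carries its own __dict__."""

    def __init__(self, prefix, bitlen):
        self.prefix = prefix
        self.bitlen = bitlen
        self.left = None
        self.right = None
        self.parent = None
        self.data = None


class SlottedNode:
    """Same fields, stored in fixed slots instead of a per-instance __dict__."""

    __slots__ = ("prefix", "bitlen", "left", "right", "parent", "data")

    def __init__(self, prefix, bitlen):
        self.prefix = prefix
        self.bitlen = bitlen
        self.left = None
        self.right = None
        self.parent = None
        self.data = None


def allocated_bytes(node_cls, count=20_000):
    """Return the bytes tracemalloc attributes to `count` live nodes."""
    tracemalloc.start()
    nodes = [node_cls(i, 32) for i in range(count)]  # keep the nodes alive
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del nodes
    return current


plain_bytes = allocated_bytes(PlainNode)
slotted_bytes = allocated_bytes(SlottedNode)
print(f"plain: {plain_bytes} bytes, slotted: {slotted_bytes} bytes")
```

The exact saving depends on the interpreter version and on how many attributes each node has, so this should be read as a way to measure the effect, not as a prediction of how much the 1.8GB figure would shrink.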

So I'll leave this issue open until that point. I can probably synthesize a large number of prefixes to reproduce the divergence in memory usage, but if I need a real set I'll follow up with you.
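A minimal sketch of how such a synthetic prefix set might be generated, using only the standard library. The function name `synth_prefixes` and its parameters (the seed and the prefix length range) are assumptions made for illustration; a random set will not have the same distribution as a real routing table, so memory figures obtained with it are only indicative.

```python
import ipaddress
import random


def synth_prefixes(count, seed=0, min_len=8, max_len=24):
    """Return `count` unique, pseudo-random IPv4 prefixes as CIDR strings."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    seen = set()
    while len(seen) < count:
        length = rng.randint(min_len, max_len)
        address = rng.getrandbits(32)
        # strict=False masks off the host bits instead of raising ValueError.
        network = ipaddress.ip_network((address, length), strict=False)
        seen.add(str(network))
    return sorted(seen)


prefixes = synth_prefixes(1000, seed=1)
print(prefixes[:3])
```

Each string is in the `address/length` form, so the list could be loaded into both the C and the pure Python implementation and the memory use of the two processes compared.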