Open Krakoer opened 3 weeks ago
Hi,
Thanks for reporting this issue; this is indeed a bit of a problem. I suspect that the conversion from Rust to Python adds some overhead, but not this much. I'll try to take a look at this next week.
I've created a large file using this script:
with open("large_file.bin", "wb") as f:
    for _ in range(1024 * 1024):
        f.write(b"X" * 20)
        f.write(b"\xff\xff\xff\xff")
This creates a file of around 20MB that contains a lot of strings. I've reproduced the problem using this script:
import time
import rust_strings
from memory_profiler import profile


@profile()
def main():
    time.sleep(1)
    x = rust_strings.strings("large_file.bin")
    time.sleep(1)


if __name__ == "__main__":
    main()
The huge memory consumption reproduces:
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7     22.1 MiB     22.1 MiB           1   @profile()
     8                                         def main():
     9     22.1 MiB      0.0 MiB           1       time.sleep(1)
    10    206.8 MiB    184.7 MiB           1       x = rust_strings.strings("large_file.bin")
    11    206.8 MiB      0.0 MiB           1       time.sleep(1)
I've tried to debug it, but I don't think there is a problem. The list contains millions of items, which consume more memory than one big string.
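To make the per-item overhead concrete, here is a rough back-of-the-envelope sketch. It assumes that each result is a `(str, int)` tuple and that each string is 20 ASCII characters, as in the test file; the helper name `estimate_list_overhead` is hypothetical and not part of rust-strings. It only counts Python object headers via `sys.getsizeof`, so treat the numbers as an estimate, not a measurement.

```python
import struct
import sys


def estimate_list_overhead(n_items, item_len=20):
    """Estimate bytes used by a list of n (str, int) tuples vs. one big str.

    Assumption: every item is a tuple holding an ASCII string of
    `item_len` characters and an integer offset.
    """
    sample = ("X" * item_len, 123456)
    pointer_size = struct.calcsize("P")  # size of one list slot
    per_item = (
        sys.getsizeof(sample)        # the tuple object itself
        + sys.getsizeof(sample[0])   # the str object (header + characters)
        + sys.getsizeof(sample[1])   # the int object
        + pointer_size               # the list's reference to the tuple
    )
    as_list = n_items * per_item
    as_one_string = sys.getsizeof("X" * (n_items * item_len))
    return as_list, as_one_string


as_list, as_one_string = estimate_list_overhead(1024 * 1024)
print(f"list of tuples: ~{as_list / 2**20:.0f} MiB")
print(f"one big string: ~{as_one_string / 2**20:.0f} MiB")
```

On a 64-bit CPython the list estimate should come out several times larger than the single string, which is consistent with the order of magnitude of the increment in the profile above.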
Indeed, the issue doesn't show up when providing a file path to strings, but it does when using the bytes input option:
import time
import rust_strings
from memory_profiler import profile


@profile()
def main():
    with open("large_file.bin", 'rb') as f:
        data = f.read()
    time.sleep(1)
    x = rust_strings.strings(bytes=data)
    time.sleep(1)


if __name__ == "__main__":
    main()
This gives the following profile (black is my modified code):
Hi,
While using the lib, I witnessed huge memory usage (a peak of ~230 MB to extract strings from a 22 MB sample) from the Python lib but not from the binary. I suspect there is a lot of overhead while allocating strings, but the memory usage drops when the strings are returned from the lib.
To monitor the memory usage, I used memory-profiler and a Python script that loads the data in memory, waits for a second, extracts the strings using rust-strings, waits for a second and exits.
Do you have any idea what could cause such memory usage? I'll continue to investigate on my side.
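As a complement to memory-profiler, the process-wide peak can also be read from the operating system, which includes allocations made by native code. The sketch below uses only the standard library and is Unix-only; `peak_rss_mib`, `measure_peak` and `workload` are hypothetical names, and `workload` is a stand-in for the actual string extraction call.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 2**20 if sys.platform == "darwin" else 2**10
    return peak / divisor


def measure_peak(func, *args, **kwargs):
    """Call func and return (result, peak before, peak after) in MiB."""
    before = peak_rss_mib()
    result = func(*args, **kwargs)
    after = peak_rss_mib()
    return result, before, after


def workload():
    # Stand-in for the real call; builds many small distinct strings.
    return [str(i) * 4 for i in range(200_000)]


result, before, after = measure_peak(workload)
print(f"peak RSS before: {before:.1f} MiB, after: {after:.1f} MiB")
```

Because the peak only ever grows, this shows the high-water mark reached during the call even if memory is released before the function returns.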