rlarranaga / bafprp

Automatically exported from code.google.com/p/bafprp

extreme memory usage (core dump on OpenBSD) #9

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. build from Trunk (tried OpenBSD and FreeBSD)
2. process a BAF file (I used a 16MB file with about 160,000 records in it)

On my FreeBSD machine this works, but bafprp uses 1295MB of memory before it starts to output data.

I noticed this originally because on my OpenBSD box I only had about 800MB of memory free when I started bafprp. bafprp dumps core prior to any output.

I ran a ktrace and ended up with a lot of "mmap -1 errno 12 Cannot allocate memory" (I can provide a full kdump on request).
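
(For reference, errno 12 is ENOMEM, so the trace simply shows mmap failing once available memory is exhausted. A minimal sketch of the failing call pattern, not taken from bafprp itself:)

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>

int main() {
    // Request a large anonymous mapping, as the allocator does internally
    // when the heap grows; this is the call that fails in the ktrace output.
    const size_t len = 1UL << 30;  // 1GB
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED) {
        // errno 12 == ENOMEM: "Cannot allocate memory"
        std::printf("mmap -1 errno %d %s\n", errno, std::strerror(errno));
        return 1;
    }
    munmap(p, len);
    return 0;
}
```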

I would probably classify this as a usability bug, but I doubt it should be using this much memory, so perhaps it is a defect(?)

Original issue reported on code.google.com by th...@bendtel.net on 4 Nov 2009 at 11:18

GoogleCodeExporter commented 9 years ago
No, it should definitely not use that much memory. Once, a long time ago, bafprp could blow up a 6MB file to 400MB before I changed the design a bit. These days I usually parse files with about 10k records and end up around 40MB or so.

Do you have access to another OS to try running your 16MB file through bafprp, or just OpenBSD? If you don't, I will have to merge the last few months of BAF files and see if I can't get bafprp to blow up. Windows or Debian Linux would be best, since those are the two I actually ran tests on.

Original comment by charless...@gmail.com on 5 Nov 2009 at 1:32

GoogleCodeExporter commented 9 years ago
I should have explained better, but I tested this on both OpenBSD and FreeBSD. The OpenBSD machine dumps core after about 800MB are eaten (the machine didn't have any more!), but the FreeBSD machine had more free memory and peaked at 1300MB.

I probably could find a Linux machine to test on. I'm curious whether the compiler options should be changed for BSD; I'm not sure what considerations were made for the current settings.

Original comment by th...@bendtel.net on 5 Nov 2009 at 1:51

GoogleCodeExporter commented 9 years ago
I tested on an Ubuntu 9.05 box. If I am reading it right, it uses about 1200MB there. I'm not sure I fully understand the Ubuntu version of top, but I started with about 1500MB free, and it shrank to about 300MB free while running.

Original comment by th...@bendtel.net on 5 Nov 2009 at 6:47

GoogleCodeExporter commented 9 years ago
Ok, I'll put together an extra-large BAF file and run some tests. Even though a new object is created for each field and record in a file, anything more than 10x the original file size is ridiculous.
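
(To illustrate that point with hypothetical types, not bafprp's actual classes: when every field becomes its own heap object, the per-object overhead dwarfs the raw bytes.)

```cpp
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Hypothetical shapes, only to show where the overhead comes from;
// these are not bafprp's real classes.
struct Field {
    virtual ~Field() = default;  // vtable pointer: ~8 bytes per object
    std::string raw;             // ~32 bytes of bookkeeping plus a heap copy
    std::string converted;       // a second copy once the value is decoded
};

struct Record {
    std::vector<std::unique_ptr<Field>> fields;  // one allocation per field
};

int main() {
    // A ~100-byte BAF record with ~20 fields parsed this way can cost 1KB+
    // once object headers, two strings per field, and allocator padding are
    // counted -- and if every record stays resident until output, the whole
    // file is multiplied by that factor.
    std::printf("sizeof(Field) = %zu bytes before any heap data\n",
                sizeof(Field));
}
```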

I know a nice and easy way to cut memory usage to nothing, but it would require a bit of a redesign. It might be a couple of weeks, but it will be fixed.

Original comment by charless...@gmail.com on 5 Nov 2009 at 7:20

GoogleCodeExporter commented 9 years ago
I committed some changes that should hopefully reduce memory usage to next to nothing. I have not had a chance to run any BAF files yet, as I do not have any currently, so if you want to recompile and test it out, be my guest.

I am going to start working on the record redesign while I await my own BAF files.

Original comment by charless...@gmail.com on 15 Nov 2009 at 10:22

GoogleCodeExporter commented 9 years ago
Ran with a 500MB file; memory usage was exactly 502MB the entire time :)

Original comment by charless...@gmail.com on 17 Nov 2009 at 4:37

GoogleCodeExporter commented 9 years ago
I downloaded and compiled the latest code as of today.

Processing can now complete very nicely, and memory usage seems very reasonable. Also, output begins immediately (as expected), so I am pretty sure I know what changes were made ;)
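
(That behavior would be consistent with streaming each record out as soon as it is parsed instead of buffering the whole file first. A self-contained sketch of the pattern with made-up types; the actual bafprp code differs:)

```cpp
#include <fstream>
#include <iostream>
#include <memory>
#include <string>

// Made-up stand-ins for the parser types, just to show the streaming shape.
struct Record { std::string text; };

std::unique_ptr<Record> parseNext(std::istream& in) {
    std::string line;
    if (!std::getline(in, line)) return nullptr;  // end of file
    return std::make_unique<Record>(Record{line});
}

int main() {
    std::ifstream file("example.baf");  // hypothetical input path
    // Emit each record as soon as it is parsed; nothing accumulates,
    // so memory stays flat and output starts immediately.
    while (auto rec = parseNext(file)) {
        std::cout << rec->text << '\n';
    }  // rec is destroyed at the end of each iteration
}
```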

Hopefully this does not affect the detection of duplicates(?)

Thanks!

Original comment by th...@bendtel.net on 19 Nov 2009 at 7:11

GoogleCodeExporter commented 9 years ago
It will still detect duplicates: instead of storing the entire record in memory until output time, it only stores the CRC and then deletes the record. With this design bafprp will not be able to give very much detail on exactly which record a duplicate collided with, but the memory usage is much more reasonable.
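
(A minimal sketch of that approach; zlib's crc32() is used here as a stand-in, and whether bafprp uses zlib or its own CRC routine is an assumption:)

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <zlib.h>  // crc32(); assuming zlib here -- bafprp may roll its own

// Keep only a 4-byte checksum per record instead of the record itself.
static std::set<uint32_t> seen;

// Returns true if an identical record was already processed. The record's
// bytes can be freed as soon as this returns, which keeps memory flat.
bool isDuplicate(const std::string& recordBytes) {
    uLong crc = crc32(0L, Z_NULL, 0);
    crc = crc32(crc, reinterpret_cast<const Bytef*>(recordBytes.data()),
                static_cast<uInt>(recordBytes.size()));
    // A hit means "duplicate seen before", but we can no longer report
    // which earlier record it collided with -- the trade-off noted above.
    return !seen.insert(static_cast<uint32_t>(crc)).second;
}
```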

Original comment by charless...@gmail.com on 19 Nov 2009 at 7:44