No, it should definitely not use that much memory. A long time ago bafprp could blow up a 6 MB file to 400 MB before I changed the design a bit. These days I usually parse files with about 10k records and see around 40 MB.
Do you have access to another OS to try running your 16 MB file through bafprp, or just OpenBSD? If you don't, I will have to merge the last few months of BAF files and see if I can't get bafprp to blow up. Windows or Debian Linux would be best, since those are the two I actually ran tests on.
Original comment by charless...@gmail.com on 5 Nov 2009 at 1:32
I should have explained better, but I tested this on both OpenBSD and FreeBSD. The OpenBSD machine dumps core after about 800 MB are eaten (the machine didn't have any more!), but the FreeBSD machine had more free memory and peaked at 1300 MB.
I probably could find a Linux machine to test on. I'm curious whether the compiler options should be changed for BSD; I'm not sure what considerations were made for the current settings.
Original comment by th...@bendtel.net on 5 Nov 2009 at 1:51
I tested on an Ubuntu 9.05 box. If I am reading it right, it uses about 1200 MB there. I'm not sure I fully understand the Ubuntu version of top, but I started with about 1500 MB free and it shrank to about 300 MB free while running.
Original comment by th...@bendtel.net on 5 Nov 2009 at 6:47
OK, I'll put together an extra-large BAF file and run some tests. Even though a new object is created for each field and record in a file, anything more than 10x the original file size is ridiculous.
I know a nice and easy way to cut memory usage to nothing, but it would require a bit of a redesign. It might take a couple of weeks, but it will be fixed.
Original comment by charless...@gmail.com on 5 Nov 2009 at 7:20
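For illustration, here is a minimal sketch of that kind of streaming redesign; the Record type and parseNextRecord function are hypothetical stand-ins, not bafprp's actual classes. Each record is written out as soon as it is parsed and then freed, so memory stays near the size of a single record rather than the whole file.

    #include <fstream>
    #include <iostream>
    #include <memory>
    #include <sstream>
    #include <string>
    #include <vector>

    // Hypothetical record type standing in for bafprp's real field/record classes.
    struct Record {
        std::vector<std::string> fields;
    };

    // Stand-in parser: treats each input line as one record and splits it into
    // fields. The real BAF decoding is binary and far more involved.
    std::unique_ptr<Record> parseNextRecord(std::istream& in) {
        std::string line;
        if (!std::getline(in, line)) return nullptr;
        auto rec = std::make_unique<Record>();
        std::istringstream tokens(line);
        std::string field;
        while (tokens >> field) rec->fields.push_back(field);
        return rec;
    }

    int main(int argc, char** argv) {
        if (argc < 2) return 1;
        std::ifstream in(argv[1]);

        // Output each record as soon as it is parsed, then drop it, so memory
        // stays near the size of one record instead of growing with the file.
        while (auto rec = parseNextRecord(in)) {
            for (const auto& f : rec->fields) std::cout << f << '\t';
            std::cout << '\n';
        }  // rec is destroyed at the end of each iteration; nothing accumulates.
        return 0;
    }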
I committed some changes that should hopefully reduce memory usage to next to nothing. I have not had a chance to run any BAF files yet, as I do not have any currently, so if you want to recompile and test it out, be my guest.
I am going to start working on the record redesign while I await my own BAF files.
Original comment by charless...@gmail.com on 15 Nov 2009 at 10:22
Ran with a 500 MB file; memory usage was exactly 502 MB the entire time :)
Original comment by charless...@gmail.com on 17 Nov 2009 at 4:37
I downloaded and compiled the latest code as of today.
Processing now completes very nicely and memory usage seems very reasonable. Also, output begins immediately (as expected), so I am pretty sure I know what changes were made ;)
Hopefully this does not affect the detection of duplicates(?)
Thanks!
Original comment by th...@bendtel.net on 19 Nov 2009 at 7:11
It will still detect duplicates; instead of storing the entire record in memory until output time, it only stores the CRC and then deletes the record. With this design bafprp will not be able to give very much data on exactly which record a duplicate collided with, but the memory usage is much more reasonable.
Original comment by charless...@gmail.com on 19 Nov 2009 at 7:44
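A minimal sketch of that CRC-only approach, with hypothetical names rather than bafprp's actual code: only a 32-bit CRC per record is kept in a set, so duplicates can still be flagged even though the earlier record itself has already been discarded.

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320), kept short for
    // illustration; a table-driven version would normally be used.
    uint32_t crc32(const std::string& data) {
        uint32_t crc = 0xFFFFFFFFu;
        for (unsigned char byte : data) {
            crc ^= byte;
            for (int i = 0; i < 8; ++i)
                crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
        return ~crc;
    }

    int main() {
        // Stand-in "records"; in bafprp these would be the decoded BAF records.
        std::vector<std::string> records = {"rec-A", "rec-B", "rec-A"};

        std::unordered_set<uint32_t> seen;        // only 4 bytes kept per record
        for (const auto& rec : records) {
            if (!seen.insert(crc32(rec)).second)  // insert fails: CRC already seen
                std::cout << "duplicate record detected\n";
            // rec itself is not kept, so we cannot report which earlier record
            // it collided with; that is the trade-off described above.
        }
        return 0;
    }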
Original issue reported on code.google.com by th...@bendtel.net on 4 Nov 2009 at 11:18