atorrez opened this issue 11 years ago
So here are some preliminary findings:
On smog:
aprun -n 16 ./fs_test.smog.x -target plfs://users/dbonnie/plfs_n1/out.%s -io mpi -time 5 -barriers aopen -shift -type 2 -sync -size 1 -nodb -strided 1
A single node writing 1-byte blocks will cause this with only a 5-second write time when using a glib backend (because glib is MUCH faster than posix).
On rrz:
mpirun -n 16 ./fs_test.localhost.x -strided 1 -target plfs:/users/dbonnie/plfs_n1/out.%s -io mpi -time 30 -barriers aopen -shift -type 2 -sync -size 1 -nodb
Causes a failure in 30 seconds (because PanFS even with glib is much slower than smog's Lustre).
I have the sneaking feeling this is an index bug or something because these runs create millions and millions of index entries. I have confirmed that this also occurs with compress_contiguous set to false in the plfsrc.
My plfsrc files contain very similar entries on the two systems:
rrz:
smog:
So my first thought is that this may be present in older releases as well, but it's hard to test. I think Alfred or I will try to confirm that soon with a longer test suite run against 2.3.
Also, the errors change between runs, which makes me think memory corruption in some form or another, although bad_alloc seems to be quite common.
I've narrowed this down a bit more -- I'm almost positive the problem lies in the indexes when they get to a certain size.
So far the following configurations cause the error:
128 procs (16 nodes), 70,000 1-byte writes per proc. Each index is 3.3 MB; total memory usage for indexes should be about 3.4 GB per node (each node has 16 GB)...so this probably shouldn't fail (see the arithmetic sketch after this list).
16 procs (2 nodes), 400,000 1-byte writes per proc. Each index is 28 MB; total memory usage for indexes should be about 3.5 GB per node (each node has 16 GB)...so this probably shouldn't fail either.
1 proc, 25,000,000 1-byte writes. Total index is 1.2 GB.
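To spell out the arithmetic above (this just encodes my assumption that on read every proc ends up holding the full aggregated index at roughly its on-disk size; the hard-coded numbers are from the first case in the list, and nothing here comes from PLFS itself):

// Rough estimate of per-node index memory for the N-1 read case, assuming
// every proc holds the full aggregated (global) index at ~on-disk size.
#include <cstdio>

int main() {
    const double index_per_proc_mb = 3.3;  // on-disk dropping.index size per writer
    const int    total_procs       = 128;  // writers (and readers)
    const int    procs_per_node    = 8;    // 128 procs on 16 nodes

    double global_index_mb = index_per_proc_mb * total_procs;           // ~422 MB
    double per_node_gb     = global_index_mb * procs_per_node / 1000.0; // decimal GB

    std::printf("global index   : ~%.0f MB\n", global_index_mb);
    std::printf("per-node usage : ~%.1f GB (each node has 16 GB)\n", per_node_gb);
    return 0;
}

That reproduces the ~3.4 GB figure, which is why none of these cases look like they should be anywhere near exhausting a 16 GB node.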
So clearly something fishy is going on here. I'm going to test at larger scales and with an older version of PLFS to see if I can narrow this down a bit better.
Updates:
512 procs (64 nodes), 15,000 1-byte writes each. Each index is 704 KB; total usage per node is ~2.75 GB.
All of the above are from RRZ. Smog is a bit different but fails at similar counts per node (though memory usage for indexes at 1088 procs on Smog results in ~12 GB...way more than what fails on RRZ).
The strided access is what's causing every write to be a separate index entry, but it shouldn't be crashing where it is.
Single node on RRZ: 8 procs, 1,000,000 objects each; roughly the same aggregate memory usage at the point of the crash.
Confirmed that the exact same workload fails on PLFS 2.3...about 46 MB of index per proc, so loading it on a single node uses around 2.9 GB of RAM. There's 16 GB on a node on RRZ.
I'm just curious: how did our regression fail to find this in 2.3 but find it in 2.4?
On Mon, Jun 3, 2013 at 5:24 PM, David Bonnie notifications@github.com wrote:
Confirmed that the exact same workload fails on PLFS 2.3.
2.4 is faster than 2.3. The test was right on the edge of failing with 2.3 in terms of object counts after 300 seconds of writing.
That's pretty awesome actually.
Why faster?
On Jun 3, 2013, at 5:30 PM, David Bonnie notifications@github.com wrote:
2.4 is faster than 2.3. The test was right on the edge of failing with 2.3 in terms of object counts after 300 seconds of writing.
Various little things all adding up I think.
I might have jumped the gun a bit on 2.3 failing - I had linked things weirdly for my custom MPI and I think I hooked into that accidentally for my 2.3 test. I'll update when I figure out what I managed to muck up. :)
Okay - fixed my install and I've gotten 2.3 to fail with OpenMPI 1.6.4 on RRZ in the exact same manner as 2.4rc2.
Watching a run on a single node I was able to see what's happening.
- The write phase runs and completes normally with low memory usage.
- The call to MPI_File_open causes a single proc to load the index. Memory usage exceeds 5 GB for this proc.
- The proc that loaded the index then (I assume) distributes this to the other procs, which causes the available memory to drop to 0.
- Hammer...err...Crash time.
Here's what top shows as the memory is rising on one proc:
11459 dbonnie 20 0 134m 13m 4636 R 100.2 0.1 2:45.79 fs_test.localho
11460 dbonnie 20 0 135m 13m 4668 R 100.2 0.1 2:45.92 fs_test.localho
11463 dbonnie 20 0 134m 13m 4632 R 99.9 0.1 2:45.71 fs_test.localho
11457 dbonnie 20 0 134m 13m 4620 R 99.6 0.1 2:45.31 fs_test.localho
11461 dbonnie 20 0 134m 13m 4632 R 99.6 0.1 2:46.01 fs_test.localho
11462 dbonnie 20 0 134m 13m 4616 R 99.6 0.1 2:45.25 fs_test.localho
11456 dbonnie 20 0 2372m 2.2g 5636 R 98.9 14.0 2:45.71 fs_test.localho
11458 dbonnie 20 0 134m 13m 4644 R 98.9 0.1 2:46.02 fs_test.localho
That proc will rise to just over 5.6 GB, then the following happens:
11525 dbonnie 20 0 1581m 1.4g 4860 R 100.6 9.1 3:47.22 fs_test.localho
11523 dbonnie 20 0 4656m 4.4g 5780 R 99.6 28.1 3:46.44 fs_test.localho
11524 dbonnie 20 0 1629m 1.5g 4844 R 99.6 9.3 3:47.21 fs_test.localho
11526 dbonnie 20 0 1531m 1.4g 4824 R 99.6 8.7 3:46.80 fs_test.localho
11527 dbonnie 20 0 1658m 1.5g 4872 R 99.6 9.5 3:46.64 fs_test.localho
11528 dbonnie 20 0 1575m 1.4g 4828 R 99.6 9.0 3:44.96 fs_test.localho
11529 dbonnie 20 0 1612m 1.5g 4844 R 99.6 9.2 3:47.24 fs_test.localho
11530 dbonnie 20 0 1664m 1.5g 4796 R 99.6 9.6 3:47.19 fs_test.localho
The proc that was using 5+ GB of memory starts to drop in memory usage slightly but every other proc's memory usage skyrockets to the point of a crash.
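My mental model of the open path, for anyone following along (this is only a sketch of the pattern I think is happening, not the actual PLFS/ad_plfs code; the two helper functions are hypothetical stand-ins):

// Sketch of the "one rank builds the index, every rank ends up with a full
// copy" pattern that would explain the top output above. NOT PLFS code.
#include <mpi.h>
#include <vector>

static std::vector<char> load_and_merge_droppings() {   // hypothetical: rank 0 reads the hostdir
    return std::vector<char>(422u << 20);                // pretend ~422 MB of serialized index
}
static void deserialize_global_index(const std::vector<char>&) {
    // hypothetical: builds the in-memory maps, where further expansion happens
}

static void open_for_read(MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);

    std::vector<char> blob;
    unsigned long len = 0;
    if (rank == 0) {
        blob = load_and_merge_droppings();   // memory spikes on one proc (the 5+ GB one in top)
        len = blob.size();
    }

    // Every rank then allocates a full-size buffer and receives the whole thing
    // (the >2 GB int-count limit of MPI_Bcast is glossed over in this sketch)...
    MPI_Bcast(&len, 1, MPI_UNSIGNED_LONG, 0, comm);
    blob.resize(len);
    MPI_Bcast(blob.data(), (int)len, MPI_BYTE, 0, comm);

    // ...and builds its own in-memory copy, so a node with 8 ranks now holds
    // 8 copies of the global index plus the serialized buffers.
    deserialize_global_index(blob);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    open_for_read(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}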
How did this pass regression in 2.3, then? It no longer sounds like a race caused by increased performance.
On Jun 4, 2013, at 2:40 PM, David Bonnie notifications@github.com wrote:
Watching a run on a single node I was able to see what's happening.
The write phase runs and completes normally with low memory usage. The call to MPI_File_open causes a single proc to load the index. Memory usage exceeds 5 GB for this proc. The proc that loaded the index then (I assume) distributes this to the other procs which causes the available memory to drop to 0. Hammer...err...Crash time. Here's what top shows as the memory is rising on one proc:
11459 dbonnie 20 0 134m 13m 4636 R 100.2 0.1 2:45.79 fs_test.localho
11460 dbonnie 20 0 135m 13m 4668 R 100.2 0.1 2:45.92 fs_test.localho
11463 dbonnie 20 0 134m 13m 4632 R 99.9 0.1 2:45.71 fs_test.localho
11457 dbonnie 20 0 134m 13m 4620 R 99.6 0.1 2:45.31 fs_test.localho
11461 dbonnie 20 0 134m 13m 4632 R 99.6 0.1 2:46.01 fs_test.localho
11462 dbonnie 20 0 134m 13m 4616 R 99.6 0.1 2:45.25 fs_test.localho
11456 dbonnie 20 0 2372m 2.2g 5636 R 98.9 14.0 2:45.71 fs_test.localho
11458 dbonnie 20 0 134m 13m 4644 R 98.9 0.1 2:46.02 fs_test.localho
That proc will rise to just over 5.6 GB, then the following happens:
11525 dbonnie 20 0 1581m 1.4g 4860 R 100.6 9.1 3:47.22 fs_test.localho
11523 dbonnie 20 0 4656m 4.4g 5780 R 99.6 28.1 3:46.44 fs_test.localho
11524 dbonnie 20 0 1629m 1.5g 4844 R 99.6 9.3 3:47.21 fs_test.localho
11526 dbonnie 20 0 1531m 1.4g 4824 R 99.6 8.7 3:46.80 fs_test.localho
11527 dbonnie 20 0 1658m 1.5g 4872 R 99.6 9.5 3:46.64 fs_test.localho
11528 dbonnie 20 0 1575m 1.4g 4828 R 99.6 9.0 3:44.96 fs_test.localho
11529 dbonnie 20 0 1612m 1.5g 4844 R 99.6 9.2 3:47.24 fs_test.localho
11530 dbonnie 20 0 1664m 1.5g 4796 R 99.6 9.6 3:47.19 fs_test.localho
The proc that was using 5+ GB of memory starts to drop in memory usage slightly but every other proc's memory usage skyrockets to the point of a crash.
The workload I established isn't time-based, so it hits this regardless of speed.
How big are the dropping files on the backend after the write phase completes? Just curious...
chuck
On Tue, Jun 04, 2013 at 11:40:26AM -0700, David Bonnie wrote:
Watching a run on a single node I was able to see what's happening.
- The write phase runs and completes normally with low memory usage.
- The call to MPI_File_open causes a single proc to load the index. Memory usage exceeds 5 GB for this proc.
- The proc that loaded the index then (I assume) distributes this to the other procs which causes the available memory to drop to 0.
- Hammer...err...Crash time.
Here's the contents of the hostdir:
-rw-r--r-- 1 dbonnie dbonnie 0 Jun 4 12:32 dropping.data.1370370765.643511.rrz004.localdomain.0
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.685680.rrz004.localdomain.0
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697148.rrz004.localdomain.7
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697149.rrz004.localdomain.1
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697153.rrz004.localdomain.3
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697154.rrz004.localdomain.4
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697155.rrz004.localdomain.5
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697180.rrz004.localdomain.2
-rw-r--r-- 1 dbonnie dbonnie 977K Jun 4 12:35 dropping.data.1370370765.697778.rrz004.localdomain.6
-rw-r--r-- 1 dbonnie dbonnie 0 Jun 4 12:32 dropping.index.1370370765.643511.rrz004.localdomain.0
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.685680.rrz004.localdomain.0
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697148.rrz004.localdomain.7
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697149.rrz004.localdomain.1
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697153.rrz004.localdomain.3
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697154.rrz004.localdomain.4
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697155.rrz004.localdomain.5
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697180.rrz004.localdomain.2
-rw-r--r-- 1 dbonnie dbonnie 46M Jun 4 12:35 dropping.index.1370370765.697778.rrz004.localdomain.6
I wouldn't imagine this is enough data to cause an issue unless the index entries expand drastically in-memory.
On Tue, Jun 04, 2013 at 12:29:28PM -0700, David Bonnie wrote:
Here's the contents of the hostdir: I wouldn't imagine this is enough data to cause an issue unless the index entries expand drastically in-memory.
Yeah, I don't see how you get to 5 GB from that. And if you complete the write phase, then it closes the file, so everything should be written out to the backend before it starts the read phase, right? (You could run fs_test with "-op write" and then do a second "-op read" run to ensure it is all down to the backend...)
chuck
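For reference, the split write/read run chuck suggests would look something like this (flags copied from the earlier rrz command; whether -time carries over sensibly to a separate -op read run is an fs_test detail I haven't verified, so treat this as a sketch):

mpirun -n 8 ./fs_test.localhost.x -strided 1 -target plfs:/users/dbonnie/plfs_n1/out.%s -io mpi -time 30 -barriers aopen -shift -type 2 -sync -size 1 -nodb -op write
mpirun -n 8 ./fs_test.localhost.x -strided 1 -target plfs:/users/dbonnie/plfs_n1/out.%s -io mpi -time 30 -barriers aopen -shift -type 2 -sync -size 1 -nodb -op read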
It's definitely all hitting the backend before doing the open call (I did a write-only test and the results match up with the above backend listing).
On Tue, Jun 04, 2013 at 12:43:50PM -0700, David Bonnie wrote:
It's definitely all hitting the backend before doing the open call (I did a write-only test and the results match up with the above backend listing).
what happens if you generate a global index after a write (e.g. with the "plfs_flatten_index" tool)?
chuck
Flattening the index results in a 367 MB global index. Trying to read it now.
Looks like the same behavior on read. Is there a way to use the global index when doing an MPI read or is that handled automatically?
On Tue, Jun 04, 2013 at 02:25:41PM -0700, David Bonnie wrote:
Looks like the same behavior on read. Is there a way to use the global index when doing an MPI read or is that handled automatically?
It should look for the global index first and use it if it is found.
You could try writing a simple plfs API program that does a plfs_open for RDONLY and look at the memory usage before and after the open. That might help isolate the problem (e.g., is it in the MPI code, or the main lib?).
e.g. something like
#include <stdio.h>    /* printf */
#include <stdlib.h>   /* system */
#include <fcntl.h>    /* O_RDONLY */
#include <unistd.h>   /* getpid */
#include "plfs.h"

int main() {
    Plfs_fd *fd;
    fd = NULL;
    int ret;
    printf("before open\n");
    system("ps axuwwww");   /* or maybe use getpid() to just look at this proc */
    ret = plfs_open(&fd, "/big/file", O_RDONLY, getpid(), 0666, NULL);
    printf("after open, ret was %d\n", ret);
    system("ps axuwwww");
    return 0;
}
chuck
So trying to use the "plfs_flatten_close" MPI-IO hint results in an instant crash on close in fs_test due to a seg fault.
The backtrace:
fs_test.localhost.x:557 terminated with signal 11 at PC=2ba3477729fa SP=7fffe032cd58. Backtrace:
/lib64/libc.so.6(memcpy+0x1da)[0x2ba3477729fa]
/usr/lib64/libstdc++.so.6(_ZNSt15basic_streambufIcSt11char_traitsIcEE6xsputnEPKcl+0x7b)[0x2ba3459abafb]
/usr/lib64/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x1b5)[0x2ba3459a2935]
/users/dbonnie/plfs/lib/libplfs.so.2.4(_ZN5Index16global_to_streamEPPvPm+0x233)[0x2ba34568c293]
/users/dbonnie/plfs/lib/libplfs.so.2.4(container_index_stream+0x73)[0x2ba34567a893]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(flatten_then_close+0x56)[0x2ba345d3a366]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(ADIOI_PLFS_Close+0x174)[0x2ba345d3a994]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(ADIO_Close+0x3e)[0x2ba345d29f4e]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(mca_io_romio_dist_MPI_File_close+0xef)[0x2ba345d4b6cf]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(+0x8a38e)[0x2ba345ca038e]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(ompi_file_close+0x59)[0x2ba345ca02a9]
/users/dbonnie/source/plfs-regression/inst/mpi/lib/libmpi.so.1(PMPI_File_close+0x78)[0x2ba345ccdf18]
./fs_test.localhost.x(close_file+0x5c4)[0x407296]
./fs_test.localhost.x(read_write_file+0x619)[0x40879f]
./fs_test.localhost.x(main+0x1ae)[0x403f67]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x2ba347707cdd]
./fs_test.localhost.x[0x403c69]
Mostly just putting that here for future reference. Still digging.
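For future reference, the mangled PLFS frame in that trace demangles (e.g. with c++filt) as shown below, so the segfault is happening while Index::global_to_stream is pushing the serialized global index through a C++ stream (the xsputn/__ostream_insert frames above it):

$ echo _ZN5Index16global_to_streamEPPvPm | c++filt
Index::global_to_stream(void**, unsigned long*)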
The "flatten" flag for fs_test is only valid if you use I/O type of PLFS.
I'm not sure about any hint to pass. I don't see anything in the code or template files that mentions using such a hint.
So, you could try to change the io type to plfs and then turn the flatten flag on and see if that works.
Brett
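A hedged guess at what that invocation would look like, reusing the flags from the runs above with the io type switched to plfs and the flatten flag turned on (I haven't verified this exact combination):

mpirun -n 16 ./fs_test.localhost.x -strided 1 -target plfs:/users/dbonnie/plfs_n1/out.%s -io plfs -time 30 -barriers aopen -shift -type 2 -sync -size 1 -nodb -flatten 1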
[ignore this]Actually, using fs_test "--flatten 1" results in a crash too, even with a fairly small workload.[/ignore this]
EDIT: I found the flatten flag by looking in ad_plfs -- so it should work for MPI-IO, no?
Using io of "plfs"? I didn't look at the code closely, but if you use io that is not "plfs" you should just get an exit from the application without any I/O happening and a note that you can use flatten with an io that is not "plfs".
Brett
Using either plfs or mpi the -flatten 1 flag crashes with a seg fault.
Using the mpi-io hint "plfs_flatten_close" clearly starts the flattening process but then crashes with a seg fault as well.
I'm assuming you've checked that workload=n-1 in the plfsrc. Flatten doesn't make sense in the other modes; it shouldn't crash, though.
John
On Wed, Jun 5, 2013 at 11:27 AM, David Bonnie notifications@github.com wrote:
Using either plfs or mpi the -flatten 1 flag crashes with a seg fault.
Using the mpi-io hint "plfs_flatten_close" clearly starts the flattening process but then crashes with a seg fault as well.
On Jun 5, 2013, at 9:20 AM, Brett Kettering wrote:
Using io of "plfs"? I didn't look at the code closely, but if you use io that is not "plfs" you should just get an exit from the application without any I/O happening and a note that you can use flatten with an io that is not "plfs".
Brett
From: David Bonnie <notifications@github.com>
Reply-To: plfs/plfs-core <reply@reply.github.com>
Date: Wed, 5 Jun 2013 08:16:50 -0700
To: plfs/plfs-core <plfs-core@noreply.github.com>
Cc: Brett Kettering <brettk@lanl.gov>
Subject: Re: [plfs-core] Getting bad_alloc Error on N-1 Strided using Certain Block Sizes (#251)
Actually, using fs_test "--flatten 1" results in a crash too, even with a fairly small workload.
OK. So, it's a bug for fs_test to let you use the --flatten flag with io of "mpi".
Do you see this line printed?
"INFO RANK[%d]: Flattening plfs index
where rank is zero (0)?
Only rank zero is allowed to call plfs_flatten_index.
Look at the flatten_file function in fs_test.c. Put some debug statements in there to make sure the target path is correct to determine if fs_test is messing up or plfs_flatten_index is messing up.
Thanks, Brett
Yeah, it prints that for rank 0 only (though with the MPI-IO hint I believe it's trying to flatten on all ranks). This is to an N-1 mount with a single backend; nothing crazy in terms of parameters.
I'll keep digging.
If the hint is trying to flatten on all ranks, then that must be in the ad_* code because fs_test's flatten_file will only call plfs_flatten_index if it is rank 0 or if it's N-N.
Brett
A note on memory usage (running plfs_map) for reading an index from two procs writing 12,000,000 objects of 1-byte each, strided. Each index dropping is ~550 MB.
Someone correct me if I'm wrong, but shouldn't I expect to see approximately 550 MB * [2 nodes] in terms of memory usage when reading the index with plfs_map? Instead, I see nearly 10 GB.
Reading back the index with "--uniform 0" to only read a single rank also results in a huge amount of memory usage in relation to the size of the index (4.6 GB).
Sounds like a gigantic memory leak.
John
On Jun 5, 2013, at 3:57 PM, David Bonnie notifications@github.com wrote:
A note on memory usage (running plfs_map) for reading an index from two procs writing 12,000,000 objects of 1-byte each, strided. Each index dropping is ~550 MB.
Someone correct me if I'm wrong, but shouldn't I expect to see approximately 550 MB * [2 nodes] in terms of memory usage when reading the index with plfs_map? Instead, I see nearly 10 GB.
Reading back the index with "--uniform 0" to only read a single rank also results in a huge amount of memory usage in relation to the size of the index (4.6 GB).
Yeah. If it happens in plfs_map that's cool, because it rules out stuff and is easy to run and debug...
chuck
On Wed, Jun 05, 2013 at 01:07:17PM -0700, John Bent wrote:
Sounds like a gigantic memory leak. John
On Jun 5, 2013, at 3:57 PM, David Bonnie notifications@github.com wrote:
A note on memory usage (running plfs_map) for reading an index from two procs writing 12,000,000 objects of 1-byte each, strided. Each index dropping is ~550 MB.
Someone correct me if I'm wrong, but shouldn't I expect to see approximately 550 MB * [2 nodes] in terms of memory usage when reading the index with plfs_map? Instead, I see nearly 10 GB.
Reading back the index with "--uniform 0" to only read a single rank also results in a huge amount of memory usage in relation to the size of the index (4.6 GB).
I'm tracking down where the memory is being allocated. I don't think it's a leak (as in allocated but not freed); it's more about how the memory is handled during global index creation.
Just an update on this - to reduce memory usage there really just needs to be a rewrite of the global index creation (I don't believe there is really a bug here, just higher memory usage than I initially expected). Perhaps a shared-memory approach on a per-node basis?
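One possible shape of that per-node idea, just to make it concrete (a sketch built on MPI-3 shared-memory windows, not anything that exists in PLFS; the index is reduced to a flat byte blob here):

// Keep ONE copy of the (serialized) global index per node by placing it in an
// MPI-3 shared-memory window instead of letting every rank build its own copy.
#include <mpi.h>
#include <cstring>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // Group the ranks that share a node so the window is node-local.
    MPI_Comm nodecomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &nodecomm);
    int noderank;
    MPI_Comm_rank(nodecomm, &noderank);

    // Only the node leader allocates space; everyone else asks for 0 bytes.
    MPI_Aint bytes = (noderank == 0) ? (1 << 20) : 0;   // placeholder index size
    char *base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(bytes, 1, MPI_INFO_NULL, nodecomm, &base, &win);

    // Non-leader ranks get a pointer into the leader's allocation.
    MPI_Aint size;
    int disp_unit;
    MPI_Win_shared_query(win, 0, &size, &disp_unit, &base);

    if (noderank == 0) {
        std::memset(base, 0, size);   // leader would merge the index droppings into 'base' here
    }
    MPI_Barrier(nodecomm);
    // All node-local ranks can now consult the single shared copy through 'base'.

    MPI_Win_free(&win);
    MPI_Comm_free(&nodecomm);
    MPI_Finalize();
    return 0;
}

Lookups would still need a structure on top of the blob, but the bulk of the index would exist once per node instead of once per rank.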
Have you been able to quantify the memory allocation? Where is it all going?
Earlier you said "Flattening the index results in a 367 MB global index", so how many GB of RAM was a 367 MB index taking? Was it like 5 GB?
chuck
On Mon, Jun 10, 2013 at 10:18:34AM -0700, David Bonnie wrote:
Just an update on this - to reduce memory usage there really just needs to be a rewrite of the global index creation (I don't believe there is really a bug here, just higher memory usage than I initially expected). Perhaps a shared-memory approach on a per-node basis?
The in-memory map accounts for most of the memory pressure (over 90% of the in-use memory is from the global index data structures). The on-disk size roughly triples in memory per process, and that, coupled with the various buffers and mmap'd files, causes memory exhaustion in the cases I highlighted above.
A lot of the memory usage when using plfs_map initially was caused by a massive string construction for the output. I removed that and memory usage fell dramatically (though was still higher than I initially expected) to the quantities above.
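For context on that output-string problem (a generic illustration, not the actual plfs_map code): building one big string for millions of entries holds the entire report in RAM on top of the index itself, whereas streaming each entry out keeps the footprint flat.

// Generic illustration only; none of this is plfs_map code.
#include <cstdio>
#include <sstream>
#include <string>

// Accumulating: the whole report for n entries lives in memory before printing.
std::string describe_all(size_t n_entries) {
    std::ostringstream oss;
    for (size_t i = 0; i < n_entries; i++)
        oss << "entry " << i << " offset " << i << " length 1\n";
    return oss.str();   // for tens of millions of entries this alone is GBs
}

// Streaming: each line is written and forgotten, so memory stays flat.
void describe_streaming(size_t n_entries, std::FILE *out) {
    for (size_t i = 0; i < n_entries; i++)
        std::fprintf(out, "entry %zu offset %zu length 1\n", i, i);
}

int main() {
    describe_streaming(1000, stdout);   // small count, just to exercise the streaming path
    return 0;
}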
What is the ratio of in-memory size to on-disk size?
John
On Jun 10, 2013, at 12:32 PM, David Bonnie notifications@github.com wrote:
The in-memory map results in most of the memory pressure (over 90% of the in-use memory is from the global index data structures). The on-disk size is roughly tripled in-memory per-process so that coupled with the various buffers and mmap'd files causes memory exhaustion with the cases I highlighted above.
A lot of the memory usage when using plfs_map initially was caused by a massive string construction for the output. I removed that and memory usage fell dramatically (though was still higher than I initially expected) to the quantities above.
In my limited tests it's 2-3 times larger in-memory than on-disk.
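That 2-3x ratio is about what you'd expect if every compact on-disk record becomes a node in a map-like structure in memory. A back-of-the-envelope sketch (the ~49-byte on-disk figure is derived from the droppings above, 3.3 MB / 70,000 writes; the in-memory entry layout and the per-node overhead constant are my guesses, not the real PLFS types):

// Why an index entry plausibly gets 2-3x bigger in memory than on disk.
#include <cstdio>
#include <sys/types.h>

struct InMemoryEntry {        // hypothetical analogue of a container index entry
    off_t  logical_offset;
    off_t  physical_offset;
    size_t length;
    double begin_timestamp;
    double end_timestamp;
    pid_t  writer_id;
};

int main() {
    const double on_disk_bytes = 3.3 * 1024 * 1024 / 70000.0;   // ~49 bytes/entry from the droppings
    // A std::map node on a typical 64-bit libstdc++ adds roughly 32-40 bytes of
    // tree bookkeeping (parent/left/right pointers, color, padding) per entry.
    const size_t node_overhead = 40;   // rough, platform-dependent
    const size_t in_memory = sizeof(off_t) /* key */ + sizeof(InMemoryEntry) + node_overhead;

    std::printf("on-disk  : ~%.0f bytes/entry\n", on_disk_bytes);
    std::printf("in-memory: ~%zu bytes/entry -> ~%.1fx expansion\n",
                in_memory, in_memory / on_disk_bytes);
    return 0;
}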
In running a sweep of N-1 strided performance using fs_test and plfs version 2.4rc2 on rrz, I got bad_alloc errors when opening for read. This happened using a 32k block size. I got some errors and some successes using other block sizes.
Here is the command line I used:
mpirun -n 32 /users/atorrez/Testing-ext/fs_test/fs_test/fs_test.rrz.x -strided 1 -target plfs:/var/tmp/plfs.atorrez1/atorrez/out.%s -io mpi -time 300 -barriers aopen -shift -type 2 -sync -size 32768
Dave Bonnie is seeing the same errors on smog and rrz using smaller block sizes and shorter -time parameters.