nkaul / pdsh

Automatically exported from code.google.com/p/pdsh
GNU General Public License v2.0
0 stars 0 forks source link

Allow more levels of directory recursion in rpdcp server #48

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I was looking into getting pdsh built for ppc64.  It failed test
t0006-pdcp.sh #19 "rpdcp -r works" due to getting a SIGSEGV in the
rpdcp command.  The test has about 7 levels of directory recursion
(tree/dir/a/b/c/d/e).  On ppc64, the SEGV came after about the 5th
level (tree/dir/a/b/c).  Eventually, I found that stack was being
exhausted.

Here are some numbers:

  $ getconf -a | grep PAGE_SIZE
  x86_64 : PAGE_SIZE 4096
  ppc64  : PAGE_SIZE 65536

  $ grep ' DSH_THREAD_STACKSIZE ' src/pdsh/dsh.c
  #define DSH_THREAD_STACKSIZE    128*1024

So the ppc64 system is using 64K pages with a thread stack size of
128K.  There is stack guard page, which being one page is 64K on
ppc64, so the usable stack is 64K.  In src/pdsh/pcp_server.c in
_sink(), there is a "char buf[BUFSIZ]" definition where BUFSIZ in
8192.  Therefore, the stack grows by at least 8K per recursion in
_sink().  64K / 8K = 8, which is reasonably close to the 5 levels that
are actually achieved, especially given that there are over 400 bytes
worth of other stack variables.

I did some tests upping the stack size and moving buf to the heap.
The latter was a much bigger win in terms of the number of levels deep
that could be recursed into before stack exhaustion.

  buf on stack:
    ppc64 128K stack size = 5 levels
    ppc64 256K stack size = 20 levels

    x86_64 128K stack size = 14 levels

  buf on heap:
    ppc64 128K stack size = 83 levels

    x86_64 128K stack size = 227 levels

I attached a patch that moves buf to the heap plus cleans up a few
other items in that area of the code.

Original issue reported on code.google.com by pywat...@gmail.com on 15 Feb 2012 at 4:40

Attachments:

GoogleCodeExporter commented 8 years ago
Nicely done.
I'll apply this for the next pdcp release.

Should we separately add a test for pdcp to ensure a minimum level of recursion
depth?

Maybe undoing the recursion would be a good idea?

Original comment by mark.gro...@gmail.com on 15 Feb 2012 at 5:06

GoogleCodeExporter commented 8 years ago
Thanks.

I did some more tests.  pdcp (pushing to multiple nodes) goes out to at least 
1000 levels.  rpdcp (pulling from multiple nodes) goes out to 83 levels on 
ppc64 with my patch.  Somehow the two are quite different.  Adding tests for 
either or both that explicitly test recursion depth to some reasonable value 
makes sense to me.

Undoing the recursion would be nice for at least a few reasons.  One, it would 
be a good excuse to cleanup the _sink() code.  Two, it would handle a much 
deeper directory tree.  Three, if it does fail, a better error message of "out 
of memory" could be given rather than SEGV.  However, I would put it as a low 
priority enhancement, since as far as I know, no one noticed the old limit of 
14 levels on x86_64, so the new, much larger limits are even less likely to be 
an issue in practice.

Original comment by pywat...@gmail.com on 15 Feb 2012 at 5:51

GoogleCodeExporter commented 8 years ago
Your patch was applied in commit 5a8411b83ff

Original comment by mark.gro...@gmail.com on 16 Feb 2012 at 12:02