weberypf / memcached

Automatically exported from code.google.com/p/memcached

Values with certain keys are not stored on solaris sparc with libevent 1.4.3 #121

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I am experiencing problems using memcached on Solaris (sparc) similar
to those described here:

http://groups.google.com/group/memcached/browse_thread/thread/77b3e69c3b33fd18

It seems to depend on the key used (perhaps the length of the key?).
For instance, the key 'test' works, but the key 'testy' does not.

I have these problems when using memcached linked with libevent
1.4.3-stable.

If I link with a libevent from the 1.2 series, I do not experience
this problem, but I am not sure of the provenance of the libevent that I
linked with successfully, so more testing is probably needed.

I am currently working to pin this problem down further. I
am submitting it as a memcached bug, though it is just as
likely to be a libevent bug.

Please let me know if I can send more information.

{{{
$ uname -a
SunOS kandinsky 5.10 Generic_141414-10 sun4u sparc SUNW,Sun-Fire-V490 Solaris

$ gcc -v
Reading specs from /usr/local/lib/gcc/sparc-sun-solaris2.10/3.4.6/specs
Configured with: ../configure --with-as=/usr/ccs/bin/as
--with-ld=/usr/ccs/bin/ld --enable-shared --enable-languages=c,c++,f77
Thread model: posix
gcc version 3.4.6

[ unpack & cd to libevent-1.4.13-stable source dir ]

$ ./configure --prefix=$HOME/tmp/memcached_test_latest
[ ... ]

$ make install
[ ... ]

[ change to memcached-1.4.3 source directory ]

[ fix problem with pthread on solaris ]

$ sed s/-pthread/-pthreads/g < configure > configure.new && \
    mv configure.new configure && chmod a+x ./configure

$ ./configure --prefix=$HOME/tmp/memcached_test_latest/ \
    --with-libevent=$HOME/tmp/memcached_test_latest/
[ ... ]

$ make install
[ ... ]

$ cd $HOME/tmp/memcached_test_latest/

$ LD_LIBRARY_PATH=$HOME/tmp/memcached_test_latest/lib bin/memcached -vv -p 11213
slab class   1: chunk size        80 perslab   13107
slab class   2: chunk size       104 perslab   10082
slab class   3: chunk size       136 perslab    7710
slab class   4: chunk size       176 perslab    5957
slab class   5: chunk size       224 perslab    4681
slab class   6: chunk size       280 perslab    3744
slab class   7: chunk size       352 perslab    2978
slab class   8: chunk size       440 perslab    2383
slab class   9: chunk size       552 perslab    1899
slab class  10: chunk size       696 perslab    1506
slab class  11: chunk size       872 perslab    1202
slab class  12: chunk size      1096 perslab     956
slab class  13: chunk size      1376 perslab     762
slab class  14: chunk size      1720 perslab     609
slab class  15: chunk size      2152 perslab     487
slab class  16: chunk size      2696 perslab     388
slab class  17: chunk size      3376 perslab     310
slab class  18: chunk size      4224 perslab     248
slab class  19: chunk size      5280 perslab     198
slab class  20: chunk size      6600 perslab     158
slab class  21: chunk size      8256 perslab     127
slab class  22: chunk size     10320 perslab     101
slab class  23: chunk size     12904 perslab      81
slab class  24: chunk size     16136 perslab      64
slab class  25: chunk size     20176 perslab      51
slab class  26: chunk size     25224 perslab      41
slab class  27: chunk size     31536 perslab      33
slab class  28: chunk size     39424 perslab      26
slab class  29: chunk size     49280 perslab      21
slab class  30: chunk size     61600 perslab      17
slab class  31: chunk size     77000 perslab      13
slab class  32: chunk size     96256 perslab      10
slab class  33: chunk size    120320 perslab       8
slab class  34: chunk size    150400 perslab       6
slab class  35: chunk size    188000 perslab       5
slab class  36: chunk size    235000 perslab       4
slab class  37: chunk size    293752 perslab       3
slab class  38: chunk size    367192 perslab       2
slab class  39: chunk size    458992 perslab       2
slab class  40: chunk size    573744 perslab       1
slab class  41: chunk size    717184 perslab       1
slab class  42: chunk size   1048576 perslab       1
<24 server listening (auto-negotiate)
<27 server listening (auto-negotiate)
<28 send buffer was 57344, now 2097152
<28 server listening (udp)
<28 server listening (udp)
<29 send buffer was 57344, now 2097152
<28 server listening (udp)
<28 server listening (udp)
<29 server listening (udp)
<29 server listening (udp)
<29 server listening (udp)
<29 server listening (udp)
<30 new auto-negotiating client connection
30: Client using the ascii protocol
<30 set test 0 60 2
>30 STORED
<30 get test
>30 sending key test
>30 END
<30 set testy 0 60 2
>30 STORED
<30 get testy
>30 END

[ from another terminal ]
$ telnet localhost 11213
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
set test 0 60 2
12
STORED
get test
VALUE test 0 2
12
END
set testy 0 60 2
12
STORED
get testy
END
}}}
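
The telnet session above can also be replayed non-interactively. A minimal sketch, assuming a POSIX shell; the helper name `mc_set_get_request` is hypothetical, and the flags (0) and expiry (60) simply mirror the session:

```shell
# Hypothetical helper: emit an ASCII-protocol set/get/quit sequence for one key.
# Arguments: key, value. Flags 0 and expiry 60 mirror the telnet session.
mc_set_get_request() {
    key=$1
    value=$2
    # Each protocol line ends in CRLF; the set line carries the value length.
    printf 'set %s 0 60 %d\r\n%s\r\nget %s\r\nquit\r\n' \
        "$key" "${#value}" "$value" "$key"
}
```

Piping the output into a client, for example `mc_set_get_request testy 12 | nc localhost 11213` (assuming `nc` is installed), should print a `VALUE` line for the key on a healthy build. On the affected build the reply to `get testy` is just `END`.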

Original issue reported on code.google.com by ehetzner@gmail.com on 25 Jan 2010 at 11:47

GoogleCodeExporter commented 8 years ago
It turns out this is a problem with the endianness check in the
configure.ac script. The trouble is as follows:

If the user specifies a custom location for libevent (using
--with-libevent=), that location is not added to
LD_LIBRARY_PATH for the configure process. This means that
test programs compiled by the configure script and linked against
libevent cannot find the shared library and fail to run.
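
Until configure handles this itself, one possible workaround is to export the library directory before running it. This is a sketch with a hypothetical helper name; the path in the usage comment is the install prefix from the report:

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH without leaving
# a stray colon when the variable was previously empty or unset.
prepend_ld_library_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Example use before configuring (paths taken from the report above):
#   prepend_ld_library_path "$HOME/tmp/memcached_test_latest/lib"
#   ./configure --with-libevent="$HOME/tmp/memcached_test_latest/"
```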

In the case of the endianness check (AC_C_ENDIAN), a program is
compiled which exits with status 0 on big-endian machines
and 1 on little-endian ones. However, AC_C_ENDIAN only checks whether
the exit status was 0 (success) or non-0 (failure). A program
that fails to run because a shared library cannot be found
exits with a status that is neither 0 nor 1. This error
is swallowed, and the machine is incorrectly assumed to be
little-endian.
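
The failure mode can be imitated in a few lines of shell. This is only an illustration of the two-way logic described above, not the code autoconf generates; `two_way_verdict` is a made-up name:

```shell
# Mimics a two-way check: exit status 0 means "big", and every other status,
# including "the probe could not even be started", is read as "little".
two_way_verdict() {
    if "$@" 2>/dev/null; then
        echo big
    else
        echo little
    fi
}
```

With this logic, `two_way_verdict /nonexistent/conftest` prints `little` even though no probe ever ran, which is the same confusion the configure check falls into.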

This bad define (ENDIAN_LITTLE on a big-endian machine) leads to
the problem described above, where some keys are stored and others
are not.

Note that this problem only shows up on big-endian
machines (e.g. SPARC): if AC_C_ENDIAN fails on a
little-endian machine (e.g. Intel), the failure is hidden because
the fallback result, little-endian, happens to be correct.
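
For comparison, the byte order of the host can be checked without compiling or linking anything. A rough sketch, relying on `od` decoding a 16-bit integer in host byte order and assuming an `od` that accepts the POSIX `-A` and `-t` options; `host_endianness` is a hypothetical name:

```shell
# Hypothetical helper: report host byte order without a compiler.
# od reads the bytes 0x01 0x00 as one unsigned 16-bit integer in host order:
# 1 on a little-endian host, 256 on a big-endian host.
host_endianness() {
    case "$(printf '\001\000' | od -An -tu2 | tr -d '[:space:]')" in
        1)   echo little ;;
        256) echo big ;;
        *)   echo unknown
             return 1 ;;
    esac
}
```

If this disagrees with what configure reports, the configure result is the one to distrust.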

Note also that I suggested above that the problem was due to a
changed libevent. This was in fact incorrect. The reason I
was getting correct compiles with libevent 1.2 but not with other
versions was that libevent 1.2 was installed in
/usr/local/lib. So if I used --with-libevent=... to compile with
libevent 1.2a, the binary generated by the AC_C_ENDIAN check was
linked with libevent-1.2a.so, which was found in /usr/local/lib,
giving a correct result and ENDIAN_BIG being defined.
If memcached was compiled with a libevent version other than
1.2a, the shared library was not located when the AC_C_ENDIAN
check was run, leading to ENDIAN_LITTLE being defined and a
bad memcached binary.

Attached is a patch which does two things:

1) sets LD_LIBRARY_PATH to include the dir passed in as --with-libevent;

2) errors if the endian check binary exits with an exit status
other than 0 or 1.

I believe that it would be worth checking all instances of
AC_RUN_IFELSE to ensure that exit statuses > 1 are not assumed to
be equivalent to exit status 1.

After applying this patch, running autoconf, and rebuilding, the
problem should be fixed.

Original comment by ehetzner@gmail.com on 29 Apr 2010 at 1:16

GoogleCodeExporter commented 8 years ago

Original comment by ehetzner@gmail.com on 29 Apr 2010 at 1:17

Attachments:

GoogleCodeExporter commented 8 years ago
See also issue 74 http://code.google.com/p/memcached/issues/detail?id=74

Original comment by ehetzner@gmail.com on 29 Apr 2010 at 1:38

GoogleCodeExporter commented 8 years ago
Trond, can you verify and fix or close the Solaris issues?

Original comment by dsalli...@gmail.com on 13 Jul 2011 at 1:57

GoogleCodeExporter commented 8 years ago
I tried to reproduce this on 1.4.6-rc1 without success (I tried both 32- and
64-bit binaries).

I am using a Sun V210 running a fresh install of Solaris 10 with Solstudio
12.2 and libevent 2.0.12-stable.

Please reopen the bug if you're able to reproduce it with 1.4.6-rc1 (or newer).

Original comment by trond.no...@gmail.com on 13 Jul 2011 at 12:31

GoogleCodeExporter commented 8 years ago
I just verified on a V490. This is still an issue. To reproduce it, you need to
pass --with-libevent=/path and have no other copy of libevent in the standard
path. In other words, if you omit --with-libevent=..., your ./configure
should fail:

  checking for libevent directory... configure: error: libevent is required.  You
  can get it from http://www.monkey.org/~provos/libevent/

    If it's already installed, specify its path using --with-libevent=/dir/

When you then pass --with-libevent=..., ./configure will misidentify the 
architecture as little-endian:

  checking for endianness... little

Do you need a new diff for configure.ac?
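
One way to see which result configure actually recorded is to look for the macro in the generated header. A sketch, assuming the header is `config.h` and that the macros are the `ENDIAN_BIG` / `ENDIAN_LITTLE` names mentioned earlier in this thread; the function name is hypothetical:

```shell
# Hypothetical helper: report which endianness macro a generated header defines.
# Assumes config.h by default and the macro names ENDIAN_BIG / ENDIAN_LITTLE.
configured_endianness() {
    header=${1:-config.h}
    if grep -Eq '^#define[[:space:]]+ENDIAN_BIG([[:space:]]|$)' "$header"; then
        echo big
    elif grep -Eq '^#define[[:space:]]+ENDIAN_LITTLE([[:space:]]|$)' "$header"; then
        echo little
    else
        # Neither macro is defined (commented-out #undef lines do not count).
        echo unknown
        return 1
    fi
}
```

On a SPARC box this should print `big` after a correct configure run.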

Original comment by ehetzner@gmail.com on 13 Jul 2011 at 6:02

GoogleCodeExporter commented 8 years ago
Added missing runtime path. Fixed in
https://github.com/memcached/memcached/commit/2f0a742e78b4ae50703bde72f5dff3952ffc13fb

Original comment by trond.no...@gmail.com on 13 Jul 2011 at 9:57

GoogleCodeExporter commented 8 years ago
Thanks for addressing this issue. Unfortunately the patch doesn't seem to work
for me because of an error from ld, which again triggers the problem of the
machine being identified as little-endian.

The essential issue is that any error compiling or running the endianness test 
will result in the configure script believing that the machine is little-endian.

I have committed a change to the autoconf check so that it returns 96 as the
exit status when the endianness is little. This distinguishes a little-endian
machine from an error.

https://github.com/egh/memcached/commit/52fd0c7ca17e46c36ea07cfb3c692619a653f499
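
The idea behind using a distinctive status can be sketched in shell. This is not the code from the commit, just an illustration of the convention it describes (0 for big-endian, 96 for little-endian, anything else an error); `run_endian_probe` is a hypothetical name:

```shell
# Hypothetical wrapper: run a probe command and interpret its exit status
# under the convention 0 = big-endian, 96 = little-endian, other = error.
run_endian_probe() {
    status=0
    "$@" || status=$?
    case $status in
        0)  echo big ;;
        96) echo little ;;
        *)  echo "endianness probe failed with status $status" >&2
            return 1 ;;
    esac
}
```

A link failure or a missing shared library now ends up in the error branch instead of being read as little-endian, unless the failure itself happens to exit with 96.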

Original comment by ehetzner@gmail.com on 14 Jul 2011 at 5:32

GoogleCodeExporter commented 8 years ago
What errors are you seeing from the linker? If this fails, it will also fail at
runtime, and make test will fail.

Original comment by trond.no...@gmail.com on 14 Jul 2011 at 5:41

GoogleCodeExporter commented 8 years ago
This could be related to our bizarre setup, but here you are:

  configure:5681: checking for endianness
  configure:5712: gcc -std=gnu99 -o conftest -g -O2 -pthreads -I/home/egh/local//include  -L/home/egh/local//lib  -Wl,-rpath=/home/egh/local//lib conftest.c  -levent >&5
  conftest.c: In function `main':
  conftest.c:32: warning: implicit declaration of function `exit'
  ld: fatal: option -dn and -P are incompatible
  ld: fatal: Flags processing errors
  collect2: ld returned 1 exit status
  configure:5712: $? = 1
  configure: program exited with status 1

Original comment by ehetzner@gmail.com on 14 Jul 2011 at 5:51

GoogleCodeExporter commented 8 years ago
FYI, I have to change -pthread to -pthreads in the configure script to make it
work on Solaris.

Original comment by ehetzner@gmail.com on 14 Jul 2011 at 5:51

GoogleCodeExporter commented 8 years ago
Which version of gcc is this?

Original comment by trond.no...@gmail.com on 14 Jul 2011 at 6:00

GoogleCodeExporter commented 8 years ago
Sorry, this was with an ancient version of gcc. Here we have a later version of 
gcc:

  bash-3.00$ gcc -v
  Using built-in specs.
  Target: sparc-sun-solaris2.8
  Configured with: ../gcc-4.3.3/configure --prefix=/opt/csw/gcc4 --exec-prefix=/opt/csw/gcc4 --with-gnu-as --with-as=/opt/csw/bin/gas --without-gnu-ld --with-ld=/usr/ccs/bin/ld --enable-nls --with-included-gettext --with-libiconv-prefix=/opt/csw --with-x --with-mpfr=/opt/csw --with-gmp=/opt/csw --enable-java-awt=xlib --enable-libada --enable-libssp --enable-objc-gc --enable-threads=posix --enable-stage1-languages=c --enable-languages=ada,c,c++,fortran,java,objc
  Thread model: posix
  gcc version 4.3.3 (GCC) 

And an error:

  configure:5681: checking for endianness
  configure:5712: gcc -std=gnu99 -o conftest -g -O2 -pthreads -I/home/egh/local//include  -L/home/egh/local//lib  -Wl,-rpath=/home/egh/local//lib conftest.c  -levent >&5
  conftest.c: In function 'main':
  conftest.c:32: warning: implicit declaration of function 'exit'
  conftest.c:32: warning: incompatible implicit declaration of built-in function 'exit'
  conftest.c:34: warning: incompatible implicit declaration of built-in function 'exit'
  ld: fatal: option -dn and -P are incompatible
  ld: fatal: Flags processing errors
  configure:5712: $? = 1
  configure: program exited with status 1

Original comment by ehetzner@gmail.com on 14 Jul 2011 at 6:10

GoogleCodeExporter commented 8 years ago
Hmm, that gcc is from January 2009. I'm not sure what the cause is (I don't
have a recent gcc on my Solaris box, so I don't know whether this is due to the
options it passes to the linker on Solaris systems or a problem with that
compiler). I did verify that it worked with the options
-Wl,-rpath=/tmp/libevent on my Debian box with gcc 4.4.5 (October 2010).

I don't have a more recent version of gcc available to test on my SPARC box
(by the way, is there a reason for not using Solaris Studio 12.2? It's free,
and I would expect it to generate better code for SPARC systems).

Could you try a more recent version of gcc? (Building it on my old V210 is
going to take forever ;))

Original comment by trond.no...@gmail.com on 14 Jul 2011 at 6:48

GoogleCodeExporter commented 8 years ago
I can try, but I'll need to build a new gcc.

My main concern is ensuring that configure fails if the endianness test fails,
rather than identifying the architecture as little-endian, because that can
lead to a successful compile but a binary with the wrong hash algorithm,
causing cache misses. This is what the commit I linked to on GitHub should
ensure (unless the error causes an exit status of 97).

Re: Solaris Studio, I do not have much control over what software is installed
on these machines. But I will have a look at Solaris Studio anyhow, thanks for
the pointer.

Thank you for looking at this!

Original comment by ehetzner@gmail.com on 14 Jul 2011 at 7:57

GoogleCodeExporter commented 8 years ago
I built gcc 4.6.1 on my box, and it reports the same problem you're seeing. I'm
sending these options to the linker, and the Solaris linker wants -R to set
the runtime path, not -rpath as the GNU linker uses. I pushed a fix for this
and verified it with gcc 4.6.1 on Solaris 10 SPARC, Debian Linux, and Solaris
x86 (Intel).
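
A configure-time choice between the two spellings might look roughly like this. It is not the fix that was pushed, only a sketch: it asks gcc which `ld` it would run (`-print-prog-name=ld`) and treats a linker whose `--version` output mentions GNU as GNU ld; `rpath_flag` is a hypothetical name:

```shell
# Hypothetical helper: pick a runtime-path flag for the linker the compiler uses.
# Argument: the library directory to record in the binary.
rpath_flag() {
    dir=$1
    # Ask the compiler driver which ld it will invoke; fall back to plain "ld".
    ld_prog=$(${CC:-gcc} -print-prog-name=ld 2>/dev/null) || ld_prog=ld
    [ -n "$ld_prog" ] || ld_prog=ld
    if "$ld_prog" --version 2>/dev/null | grep -q GNU; then
        echo "-Wl,-rpath,$dir"   # GNU ld spelling
    else
        echo "-Wl,-R,$dir"       # spelling the Solaris linker expects
    fi
}
```

As the exchange above shows, gcc on Solaris may be configured to use `/usr/ccs/bin/ld` even when a GNU `ld` is on the PATH, which is why the sketch asks the compiler rather than the shell.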

Please verify that it also works for you

Cheers

Original comment by trond.no...@gmail.com on 15 Jul 2011 at 5:33

GoogleCodeExporter commented 8 years ago
Thanks, that fixes it for me, too. I thought I was using GNU ld, but maybe
not.

Original comment by ehetzner@gmail.com on 15 Jul 2011 at 5:30