It turns out this is a problem with the endianness check of the
configure.ac script. The trouble is as follows:
If the user specifies a custom location for libevent (using
--with-libevent=), this custom location is not added to the
LD_LIBRARY_PATH for the configure process. This means that
programs compiled by the configure script will not run correctly.
In the case of the endianness check (AC_C_ENDIAN), a program is
compiled which will have exit status 0 for big-endian machines
and 1 for little-endian. However, AC_C_ENDIAN only checks whether
the exit status was 0 (success) or non-0 (failure). A program
which fails to run because a shared library cannot be found
exits with a status that is neither 0 nor 1. However, this error
is swallowed and the machine is incorrectly assumed to be
little-endian.
This bad define (ENDIAN_LITTLE on a big-endian machine) leads to
the problem described above, of some keys being stored and others
not.
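The failure mode described above can be sketched in shell. This only illustrates the 0 versus non-0 reading of the exit status and is not the actual AC_C_ENDIAN macro; the function name naive_endian and the sample statuses are assumptions made for the example (127 stands in for a test binary that could not be started).

```shell
#!/bin/sh
# Hypothetical helper, not part of autoconf: mimics a check that only
# distinguishes exit status 0 from non-0.
naive_endian() {
    if [ "$1" -eq 0 ]; then
        echo big
    else
        # A failed link or a missing shared library also lands here.
        echo little
    fi
}

# 0 and 1 are the statuses the test program is meant to return; 127 is an
# assumed stand-in for "the binary could not be started".
for status in 0 1 127; do
    echo "status $status -> $(naive_endian "$status")"
done
```

On a big-endian machine the third case is the damaging one: the check never produced an answer, yet the result is recorded as little-endian.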
Note that this problem will only be revealed on big-endian
machines (e.g. Sparc) because if AC_C_ENDIAN fails on
little-endian machines (e.g. Intel) the problem will be hidden by
the fact that the endianness of the machine is set correctly.
Note also that I suggested above that the problem was due to a
changed libevent. This was in fact incorrect. The reason that I
was getting correct compiles with libevent 1.2 but not other
versions was because libevent 1.2 was installed in
/usr/local/lib. So if I used --with-libevent=... to compile with
libevent 1.2a, the binary generated by the AC_C_ENDIAN check was
linked with libevent-1.2a.so, which was found in /usr/local/lib,
leading to a correct result, leading to ENDIAN_BIG being defined.
If memcached was compiled with a libevent version other than
1.2a, the shared library was not located when the AC_C_ENDIAN
check was run, leading to ENDIAN_LITTLE being defined, and a
bad memcached binary.
Attached is a patch which does two things:
1) sets LD_LIBRARY_PATH to include the dir passed in as --with-libevent;
2) errors if the endian check binary exits with an exit status
other than 0 or 1.
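A rough sketch of the two changes, written as plain shell rather than as the attached configure.ac patch (which is not reproduced here). The helper names and the /opt/libevent path are hypothetical, and the lib subdirectory is an assumption about how libevent is laid out under the --with-libevent directory.

```shell
#!/bin/sh
# Hypothetical helper: make the libevent library directory visible to
# programs that configure compiles and then runs.
prepend_ld_library_path() {
    libdir=$1
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        LD_LIBRARY_PATH="$libdir:$LD_LIBRARY_PATH"
    else
        LD_LIBRARY_PATH="$libdir"
    fi
    export LD_LIBRARY_PATH
}

# Hypothetical helper: accept only the two statuses the endianness test
# is meant to return and treat everything else as an error.
strict_endian() {
    case "$1" in
        0) echo big ;;
        1) echo little ;;
        *) echo error; return 1 ;;
    esac
}

# Demo in a subshell so the caller's environment is left untouched.
(
    prepend_ld_library_path /opt/libevent/lib   # assumed install prefix + /lib
    echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
)
strict_endian 127 || echo "endianness check failed; configure should stop"
```

The first helper addresses the cause (the test binary cannot find its shared library), the second addresses the symptom (a failed run being read as an answer).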
I believe that it would be worth checking all instances of
AC_RUN_IFELSE to ensure that exit statuses > 1 are not assumed to
be equivalent to exit status 1.
After applying this patch, running autoconf, and rebuilding, the
problem should be fixed.
Original comment by ehetzner@gmail.com
on 29 Apr 2010 at 1:16
Original comment by ehetzner@gmail.com
on 29 Apr 2010 at 1:17
Attachments:
See also issue 74 http://code.google.com/p/memcached/issues/detail?id=74
Original comment by ehetzner@gmail.com
on 29 Apr 2010 at 1:38
Trond, can you verify and fix or close the Solaris issues?
Original comment by dsalli...@gmail.com
on 13 Jul 2011 at 1:57
I tried to reproduce this on 1.4.6-rc1 without success (I tried both 32- and
64-bit binaries).
I am using a Sun V210 running a fresh install of Solaris 10 with Solstudio
12.2 and libevent 2.0.12-stable.
Please reopen the bug if you're able to reproduce it with 1.4.6-rc1 (or newer).
Original comment by trond.no...@gmail.com
on 13 Jul 2011 at 12:31
I just verified on a V490. This is still an issue. You need to use
--with-libevent=/path and otherwise not have a copy of libevent in the standard
path. In other words, if you fail to pass --with-libevent=..., your ./configure
should fail:
checking for libevent directory... configure: error: libevent is required. You
can get it from http://www.monkey.org/~provos/libevent/
If it's already installed, specify its path using --with-libevent=/dir/
When you then pass --with-libevent=..., ./configure will misidentify the
architecture as little-endian:
checking for endianness... little
Do you need a new diff for configure.ac?
Original comment by ehetzner@gmail.com
on 13 Jul 2011 at 6:02
Added missing runtime path. Fixed in
https://github.com/memcached/memcached/commit/2f0a742e78b4ae50703bde72f5dff3952ffc13fb
Original comment by trond.no...@gmail.com
on 13 Jul 2011 at 9:57
Thanks for addressing this issue. Unfortunately the patch doesn't seem to work
for me due to an error with ld; and this again triggers the problem of the
machine being identified as little-endian.
The essential issue is that any error compiling or running the endianness test
will result in the configure script believing that the machine is little-endian.
I have committed a change to autoconf to return 96 as the exit status when the
endianness is little. This will distinguish between a little-endian machine and
an error.
https://github.com/egh/memcached/commit/52fd0c7ca17e46c36ea07cfb3c692619a653f499
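The idea of a dedicated exit status can be sketched in shell. This is not the code from the commit above: it swaps the compiled C test program for an od-based probe, and the names endian_probe and classify_status are made up for the example. Only the use of 96 for little-endian comes from the description above; 0 for big-endian is carried over from the earlier description of the test, and the probe's own error status of 2 is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical probe: read the bytes 0x01 0x00 as one 16-bit unsigned
# integer in host byte order. Big-endian hosts see 256, little-endian 1.
endian_probe() {
    word=$(printf '\001\000' | od -An -tu2 | tr -d ' \n')
    case "$word" in
        256) return 0 ;;    # big-endian
        1)   return 96 ;;   # little-endian, reported with the sentinel status
        *)   return 2 ;;    # unexpected od output
    esac
}

# Only 0 and 96 count as answers; a failed compile, link, or run
# (status 1, 127, ...) can no longer be mistaken for "little".
classify_status() {
    case "$1" in
        0)  echo big ;;
        96) echo little ;;
        *)  echo error ;;
    esac
}

status=0
endian_probe || status=$?
echo "probe status $status -> $(classify_status "$status")"
```

Because common failure statuses such as 1 no longer overlap with a valid answer, a broken test run is reported as an error instead of silently selecting little-endian.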
Original comment by ehetzner@gmail.com
on 14 Jul 2011 at 5:32
What errors are you seeing from the linker? If linking fails here, the binary
will also fail at runtime, and make test will fail.
Original comment by trond.no...@gmail.com
on 14 Jul 2011 at 5:41
This could be related to our bizarre setup, but here you are:
configure:5681: checking for endianness
configure:5712: gcc -std=gnu99 -o conftest -g -O2 -pthreads -I/home/egh/local//include -L/home/egh/local//lib -Wl,-rpath=/home/egh/local//lib conftest.c -levent >&5
conftest.c: In function `main':
conftest.c:32: warning: implicit declaration of function `exit'
ld: fatal: option -dn and -P are incompatible
ld: fatal: Flags processing errors
collect2: ld returned 1 exit status
configure:5712: $? = 1
configure: program exited with status 1
Original comment by ehetzner@gmail.com
on 14 Jul 2011 at 5:51
FYI, I have to change -pthread to -pthreads in the configure script to make it
work on Solaris.
Original comment by ehetzner@gmail.com
on 14 Jul 2011 at 5:51
Which version of gcc is this?
Original comment by trond.no...@gmail.com
on 14 Jul 2011 at 6:00
Sorry, this was with an ancient version of gcc. Here we have a later version of
gcc:
bash-3.00$ gcc -v
Using built-in specs.
Target: sparc-sun-solaris2.8
Configured with: ../gcc-4.3.3/configure --prefix=/opt/csw/gcc4 --exec-prefix=/opt/csw/gcc4 --with-gnu-as --with-as=/opt/csw/bin/gas --without-gnu-ld --with-ld=/usr/ccs/bin/ld --enable-nls --with-included-gettext --with-libiconv-prefix=/opt/csw --with-x --with-mpfr=/opt/csw --with-gmp=/opt/csw --enable-java-awt=xlib --enable-libada --enable-libssp --enable-objc-gc --enable-threads=posix --enable-stage1-languages=c --enable-languages=ada,c,c++,fortran,java,objc
Thread model: posix
gcc version 4.3.3 (GCC)
And an error:
configure:5681: checking for endianness
configure:5712: gcc -std=gnu99 -o conftest -g -O2 -pthreads -I/home/egh/local//include -L/home/egh/local//lib -Wl,-rpath=/home/egh/local//lib conftest.c -levent >&5
conftest.c: In function 'main':
conftest.c:32: warning: implicit declaration of function 'exit'
conftest.c:32: warning: incompatible implicit declaration of built-in function 'exit'
conftest.c:34: warning: incompatible implicit declaration of built-in function 'exit'
ld: fatal: option -dn and -P are incompatible
ld: fatal: Flags processing errors
configure:5712: $? = 1
configure: program exited with status 1
Original comment by ehetzner@gmail.com
on 14 Jul 2011 at 6:10
Hmm.. that gcc is from January 2009... I'm not sure what it is (I don't have a
recent gcc on my Solaris box, so I don't know if this is due to options it passes
to the linker on Solaris systems or a problem with that compiler). I did verify
that it worked with the options -Wl,-rpath=/tmp/libevent on my Debian box with
gcc 4.4.5 (October 2010).
I don't have a more recent version of gcc available to test on my SPARC box
(btw. is there a reason for not using Solaris Studio 12.2? It's free and I
would suspect it to generate better code for SPARC systems...)
Could you try a more recent version of gcc? (Building it on my old V210 is
going to take forever ;))
Original comment by trond.no...@gmail.com
on 14 Jul 2011 at 6:48
I can try, I'll need to build a new gcc.
My main concern is ensuring that configure fails if the endianness test fails,
rather than identifying the architecture as little-endian, because this can lead
to a successful compile but a binary with the wrong hash algorithm, causing cache
misses. This is what the commit I linked to on github should ensure (unless the
error causes an exit status of 97).
Re. Solaris Studio, I do not have much control over what software is installed
on these machines. But I will have a look at Solaris Studio anyhow, thanks for
the pointer.
Thank you for looking at this!
Original comment by ehetzner@gmail.com
on 14 Jul 2011 at 7:57
I built gcc 4.6.1 on my box, and it reports the same problem you're seeing. I'm
sending these options to the linker, and the Solaris linker wants -R to set
the runtime path and not -rpath as the GNU linker uses. I pushed a fix for this
and verified it with gcc 4.6.1 on Solaris 10 SPARC, Debian Linux, and Solaris
x86 Intel.
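The difference between the two linkers can be sketched as a small shell helper. This is not the fix that was pushed; the function name, the linker labels, and the exact -Wl spellings are assumptions for illustration (the -rpath= form is taken from the log above, the -R form from the description of the Solaris linker).

```shell
#!/bin/sh
# Hypothetical helper: build a compiler-driver flag that embeds a runtime
# library search path, depending on which linker sits behind the driver.
#   $1: linker flavour, "gnu" or "solaris" (labels chosen for this sketch)
#   $2: directory to embed
runtime_path_flag() {
    case "$1" in
        gnu)     echo "-Wl,-rpath=$2" ;;
        solaris) echo "-Wl,-R$2" ;;
        *)       echo "unknown linker flavour: $1" >&2; return 1 ;;
    esac
}

runtime_path_flag gnu /home/egh/local/lib
runtime_path_flag solaris /home/egh/local/lib
```

Note that the compiler does not determine the linker: a gcc configured with --without-gnu-ld, as in the gcc -v output above, still hands these options to the Solaris ld.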
Please verify that it also works for you
Cheers
Original comment by trond.no...@gmail.com
on 15 Jul 2011 at 5:33
Thanks, that fixes it for me, too. I thought I was using the gnu ld, but maybe
not.
Original comment by ehetzner@gmail.com
on 15 Jul 2011 at 5:30
Original issue reported on code.google.com by
ehetzner@gmail.com
on 25 Jan 2010 at 11:47