There are also some different tests failing on Solaris 10 amd64:
slab class 42: chunk size 1048576 perslab 1
<24 server listening (binary)
<27 server listening (binary)
<28 send buffer was 57344, now 2097152
<32 send buffer was 57344, now 2097152
<31 server listening (udp)
<28 server listening (udp)
<30 server listening (udp)
<29 server listening (udp)
<35 server listening (udp)
<32 server listening (udp)
<34 server listening (udp)
<33 server listening (udp)
<36 new binary client connection.
<36 connection closed.
Invalid value for binding protocol: http
-- should be one of auto, binary, or ascii
t/00-startup.t ....... 17/18 Number of threads must be greater than 0
t/00-startup.t ....... ok
t/64bit.t ............ ok
t/binary-get.t ....... ok
t/binary-sasl.t ...... This server is not built with SASL support.
t/binary-sasl.t ...... ok
t/binary.t ........... 1371/3639
# Failed test at t/binary.t line 422.
# got: '1024'
# expected: '0'
t/binary.t ........... 3621/3639 # Looks like you failed 1 test of 3639.
t/binary.t ........... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3639 subtests
t/bogus-commands.t ... ok
t/cas.t .............. ok
t/daemonize.t ........ ok
t/dash-M.t ........... ok
t/evictions.t ........ ok
t/expirations.t ...... ok
t/flags.t ............ ok
t/flush-all.t ........ ok
t/getset.t ........... ok
t/incrdecr.t ......... ok
t/issue_104.t ........ ok
t/issue_108.t ........ ok
t/issue_14.t ......... ok
t/issue_140.t ........ skipped: Fix for Issue 140 was only an illusion
t/issue_152.t ........ ok
t/issue_163.t ........ ok
t/issue_183.t ........ ok
t/issue_192.t ........ ok
t/issue_22.t ......... ok
t/issue_260.t ........ skipped: Only possible to test #260 under artificial
conditions
t/issue_29.t ......... ok
t/issue_3.t .......... ok
t/issue_41.t ......... ok
t/issue_42.t ......... ok
t/issue_50.t ......... ok
t/issue_61.t ......... ok
t/issue_67.t ......... ok
t/issue_68.t ......... ok
t/issue_70.t ......... ok
t/item_size_max.t .... 1/7 Item max size cannot be less than 1024 bytes.
t/item_size_max.t .... 2/7 Cannot set item size limit higher than 128 mb.
t/item_size_max.t .... 3/7 WARNING: Setting item max size above 1MB is not
recommended!
Raising this limit increases the minimum memory requirements
and will decrease your memory efficiency.
t/item_size_max.t .... ok
t/line-lengths.t ..... ok
t/lru-crawler.t ...... ok
t/lru-maintainer.t ... 11/224
# Failed test 'moved some items to cold'
# at t/lru-maintainer.t line 33.
# got: '0'
# expected: anything else
# Looks like you failed 1 test of 224.
t/lru-maintainer.t ... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/224 subtests
t/lru.t .............. ok
t/maxconns.t ......... ok
t/multiversioning.t .. ok
t/noreply.t .......... ok
t/refhang.t .......... ok
t/slabs_reassign.t ... ok
t/stats-conns.t ...... ok
t/stats-detail.t ..... ok
t/stats.t ............ 1/97
# Failed test at t/stats.t line 178.
# got: '1024'
# expected: '0'
# Looks like you failed 1 test of 97.
t/stats.t ............ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/97 subtests
t/touch.t ............ ok
t/udp.t .............. ok
t/unixsocket.t ....... ok
t/whitespace.t ....... fatal: Not a git repository (or any parent up to mount
point /home/dam)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
t/whitespace.t ....... skipped: Skipping tests probably because you don't have
git.
Test Summary Report
-------------------
t/binary.t (Wstat: 256 Tests: 3639 Failed: 1)
Failed test: 1625
Non-zero exit status: 1
t/lru-maintainer.t (Wstat: 256 Tests: 224 Failed: 1)
Failed test: 43
Non-zero exit status: 1
t/stats.t (Wstat: 256 Tests: 97 Failed: 1)
Failed test: 74
Non-zero exit status: 1
Files=51, Tests=7496, 192 wallclock secs ( 3.70 usr 0.61 sys + 14.37 cusr
6.69 csys = 25.37 CPU)
Result: FAIL
Original comment by honkma...@googlemail.com
on 20 Apr 2015 at 1:26
I'm seeing this on Arch Linux as well.
This is similar to an earlier issue I have seen, in that I think it might have
to do with hardware lock elision:
https://groups.google.com/d/msg/memcached/Tw6t_W-a6Xc/lXgz8LQ_vS0J
It doesn't lock up on my much older x86_64 desktop machine, but it does lock up
on our build box. gdb output of the hung process during the binary_prependq
test is below. Notice Thread 3 especially.
$ sudo gdb -p 10999
GNU gdb (GDB) 7.9
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 10999
Reading symbols from
/home/dan/svn-packages/memcached/trunk/src/memcached-1.4.23/memcached-debug...do
ne.
Reading symbols from /usr/lib/libevent-2.0.so.5...(no debugging symbols
found)...done.
Reading symbols from /usr/lib/libpthread.so.0...(no debugging symbols
found)...done.
[New LWP 11004]
[New LWP 11003]
[New LWP 11002]
[New LWP 11001]
[New LWP 11000]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Reading symbols from /usr/lib/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
0x00007fad9dec6883 in epoll_wait () from /usr/lib/libc.so.6
(gdb) thread apply all backtrace
Thread 6 (Thread 0x7fad9dddd700 (LWP 11000)):
#0 0x00007fad9dec6883 in epoll_wait () from /usr/lib/libc.so.6
#1 0x00007fad9e3c34d8 in ?? () from /usr/lib/libevent-2.0.so.5
#2 0x00007fad9e3ae61a in event_base_loop () from /usr/lib/libevent-2.0.so.5
#3 0x000000000041add4 in worker_libevent (arg=0x1f99480) at thread.c:379
#4 0x00007fad9e188374 in start_thread () from /usr/lib/libpthread.so.0
#5 0x00007fad9dec627d in clone () from /usr/lib/libc.so.6
Thread 5 (Thread 0x7fad9d5dc700 (LWP 11001)):
#0 0x00007fad9dec6883 in epoll_wait () from /usr/lib/libc.so.6
#1 0x00007fad9e3c34d8 in ?? () from /usr/lib/libevent-2.0.so.5
#2 0x00007fad9e3ae61a in event_base_loop () from /usr/lib/libevent-2.0.so.5
#3 0x000000000041add4 in worker_libevent (arg=0x1f9a5c8) at thread.c:379
#4 0x00007fad9e188374 in start_thread () from /usr/lib/libpthread.so.0
#5 0x00007fad9dec627d in clone () from /usr/lib/libc.so.6
Thread 4 (Thread 0x7fad9cddb700 (LWP 11002)):
#0 0x00007fad9dec6883 in epoll_wait () from /usr/lib/libc.so.6
#1 0x00007fad9e3c34d8 in ?? () from /usr/lib/libevent-2.0.so.5
#2 0x00007fad9e3ae61a in event_base_loop () from /usr/lib/libevent-2.0.so.5
#3 0x000000000041add4 in worker_libevent (arg=0x1f9b710) at thread.c:379
#4 0x00007fad9e188374 in start_thread () from /usr/lib/libpthread.so.0
#5 0x00007fad9dec627d in clone () from /usr/lib/libc.so.6
Thread 3 (Thread 0x7fad9c5da700 (LWP 11003)):
#0 0x00007fad9e19064c in __lll_lock_wait () from /usr/lib/libpthread.so.0
#1 0x00007fad9e193090 in __lll_lock_elision () from /usr/lib/libpthread.so.0
#2 0x00000000004183c9 in item_stats_totals (add_stats=add_stats@entry=0x4067d0
<append_stats>, c=c@entry=0x7fad940008c0) at items.c:506
#3 0x0000000000414c57 in get_stats (stat_type=stat_type@entry=0x0,
nkey=nkey@entry=0, add_stats=add_stats@entry=0x4067d0 <append_stats>,
c=c@entry=0x7fad940008c0) at slabs.c:309
#4 0x000000000040cee0 in process_bin_stat (c=0x7fad940008c0) at
memcached.c:1514
#5 complete_nread_binary (c=0x7fad940008c0) at memcached.c:2247
#6 complete_nread (c=c@entry=0x7fad940008c0) at memcached.c:2293
#7 0x00000000004101d0 in drive_machine (c=0x7fad940008c0) at memcached.c:4179
#8 event_handler (fd=<optimized out>, which=<optimized out>,
arg=0x7fad940008c0) at memcached.c:4386
#9 0x00007fad9e3aeca6 in event_base_loop () from /usr/lib/libevent-2.0.so.5
#10 0x000000000041add4 in worker_libevent (arg=0x1f9c858) at thread.c:379
#11 0x00007fad9e188374 in start_thread () from /usr/lib/libpthread.so.0
#12 0x00007fad9dec627d in clone () from /usr/lib/libc.so.6
Thread 2 (Thread 0x7fad9bdd9700 (LWP 11004)):
#0 0x00007fad9e18d9af in pthread_cond_wait@@GLIBC_2.3.2 () from
/usr/lib/libpthread.so.0
#1 0x000000000041a510 in assoc_maintenance_thread (arg=<optimized out>) at
assoc.c:261
#2 0x00007fad9e188374 in start_thread () from /usr/lib/libpthread.so.0
#3 0x00007fad9dec627d in clone () from /usr/lib/libc.so.6
Thread 1 (Thread 0x7fad9e7f2700 (LWP 10999)):
#0 0x00007fad9dec6883 in epoll_wait () from /usr/lib/libc.so.6
#1 0x00007fad9e3c34d8 in ?? () from /usr/lib/libevent-2.0.so.5
#2 0x00007fad9e3ae61a in event_base_loop () from /usr/lib/libevent-2.0.so.5
#3 0x0000000000404eea in main (argc=0, argv=0x7fff82d49fd5) at memcached.c:5724
ok 47 - binary_prependq
Timeout.. killing the process
testapp: testapp.c:725: safe_recv: Assertion `nr != 0' failed.
Makefile:1482: recipe for target 'test' failed
make: *** [test] Aborted
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping : 3
microcode : 0x12
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
aes xsave avx f16c rdrand lahf_lm abm ida arat pln pts dtherm tpr_shadow vnmi
flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid
rtm xsaveopt
Original comment by dpmc...@gmail.com
on 23 Apr 2015 at 1:54
Can you folks please try https://github.com/memcached/memcached (master branch)
and ping this issue if it repairs your problem completely? All tests should
run.
Original comment by dorma...@rydia.net
on 24 Apr 2015 at 7:05
I ran the same build/test/package steps that triggered the above, pulling the
one patch (369845f086) from git and applying it, and we seem to be in business.
All tests pass; no hangups or timeouts anymore.
Original comment by dpmc...@gmail.com
on 24 Apr 2015 at 2:41
I also just patched the latest commit f086 into the tarball.
The test suite now passes cleanly on sparcv9; the only remaining issue on
amd64 is
t/lru-maintainer.t ... 11/224
# Failed test 'moved some items to cold'
# at t/lru-maintainer.t line 33.
# got: '0'
# expected: anything else
t/lru-maintainer.t ... 195/224 # Looks like you failed 1 test of 224.
t/lru-maintainer.t ... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/224 subtests
Original comment by honkma...@googlemail.com
on 24 Apr 2015 at 3:55
BTW, what happened to the buildfarm?
https://code.google.com/p/memcached/wiki/BuildFarm
The waterfall display is stuck for me.
As I already run a buildbot instance on our farm I can gladly provide some
slaves on Solaris:
http://buildfarm.opencsw.org/buildbot/waterfall
Original comment by honkma...@googlemail.com
on 24 Apr 2015 at 3:57
The buildfarm's been dead a long while... it requires quite a lot of time to
maintain.
honkman42: So it's failing a test under Solaris amd64? Can you attach the
output of "prove -v t/lru-maintainer.t", please? Does it fail every time or
just sometimes?
Original comment by dorma...@rydia.net
on 24 Apr 2015 at 6:06
Another push to master with a workaround in the test. Solaris doesn't seem to
handle the background juggler's sleeps in the same way. I'll look into it more,
but I need to cut another release since the off-by-one is pretty serious.
Original comment by dorma...@rydia.net
on 25 Apr 2015 at 8:03
1.4.24 is out. Please try it if you haven't already.
Original comment by dorma...@rydia.net
on 25 Apr 2015 at 9:02
Yes, 1.4.24 looks good: all tests pass on Solaris 10 Sparc + x86, in both
32- and 64-bit mode.
I also added memcached 'master' to my Solaris buildbot, with some more options
enabled:
https://buildfarm.opencsw.org/buildbot/waterfall?category=memcached
Now t/whitespace.t is failing sporadically as you can see from the above logs.
Should I open a separate bug report for that?
Also, I noticed that some tests failed when I ran the 32- and 64-bit tests in
parallel; I guess I need to serialize them per host?
If considered useful, I could also enable notifications on build failure via
mail or IRC or such.
Thanks! -- Dago
Original comment by honkma...@googlemail.com
on 27 Apr 2015 at 8:39
Please open other issues for those things. You can have it e-mail me, I guess?
I don't want it to spam a public place just yet.
Original comment by dorma...@rydia.net
on 27 Apr 2015 at 8:54
Sure, I added your email to the list of notifiers. Just let me know if you need
the settings adjusted.
Original comment by honkma...@googlemail.com
on 27 Apr 2015 at 8:59
Original issue reported on code.google.com by
honkma...@googlemail.com
on 20 Apr 2015 at 12:53