How much RAM do you have? Would it happen to be 3 GB?
Original comment by scott.lo...@gmail.com
on 12 Aug 2008 at 4:19
I have 2GB RAM.
Original comment by christia...@googlemail.com
on 12 Aug 2008 at 7:46
I presume that this might be caused by a race condition, because the problem
does not occur when debugging output is enabled and the whole thing is slower.
Original comment by christia...@googlemail.com
on 12 Aug 2008 at 7:49
That was my first reaction, too. But then I noticed that your stack pointer
was just about at the 3 GB boundary. Thought I'd follow up and see if you were
overflowing when multithreaded, but staying sane when single-threaded.
Original comment by scott.lo...@gmail.com
on 12 Aug 2008 at 2:19
Okay. I used gdb to try to illuminate the whole thing.
The segfault is caused by a call to g_slice_alloc () in libglib.
2008-08-12 19:37:29 INFO: test_io: write 0002001d started (zero block)
2008-08-12 19:37:29 INFO: test_io: write 0002001e started (zero block)
2008-08-12 19:37:29 INFO: test_io: write 0002001f started (zero block)
2008-08-12 19:37:29 INFO: test_io: write 00020020 started (zero block)
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xabcffb90 (LWP 12992)]
0xb7f35ff3 in g_slice_alloc () from /usr/lib/libglib-2.0.so.0
(gdb) info stack
#0 0xb7f35ff3 in g_slice_alloc () from /usr/lib/libglib-2.0.so.0
#1 0xb7f0c65d in ?? () from /usr/lib/libglib-2.0.so.0
#2 0x0804a635 in block_cache_write_block (s3b=0x8057950, block_num=229750,
src=0xac87c048, md5=0x0) at block_cache.c:844
#3 0x0804c515 in fuse_op_write (path=0x8092178 "/file", buf=0xac87c048 "",
size=4096, offset=941056000, fi=0xabcff25c) at fuse_ops.c:419
#4 0xb7fa592e in fuse_fs_write () from /lib/libfuse.so.2
#5 0xb7faa1f9 in ?? () from /lib/libfuse.so.2
#6 0xb7fadd05 in ?? () from /lib/libfuse.so.2
#7 0xb7faed10 in ?? () from /lib/libfuse.so.2
#8 0xb7fb0536 in fuse_session_process () from /lib/libfuse.so.2
#9 0xb7fac8e5 in ?? () from /lib/libfuse.so.2
#10 0xb7d504fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#11 0xb7cd2e5e in clone () from /lib/tls/i686/cmov/libc.so.6
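The offset and block_num values in the trace above are consistent with each other if the block size is 4096 bytes. That block size is an inference from size=4096 in the fuse_op_write frame, not a value stated in this thread, and the helper name below is hypothetical. A small sketch of the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a byte offset to a block number.
 * Hypothetical helper for illustration; not taken from s3backer.
 */
uint32_t offset_to_block(uint64_t offset, uint32_t block_size)
{
    assert(block_size != 0);
    return (uint32_t)(offset / block_size);
}
```

With a 4096-byte block, offset 941056000 maps to block 229750, which matches frames #2 and #3, so the crash happens about 940 MB into the file.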
Original comment by christia...@googlemail.com
on 12 Aug 2008 at 5:43
Okay. The block cache seems to cause the problem.
To be more specific: the hashing of the block numbers.
block_cache.c: line 844
In function block_cache_hash_put() which is called by block_cache_write_block():
The function calls a glib function:
g_hash_table_replace(priv->hashtable, key, entry);
This call causes the segfault.
Any ideas why?
Original comment by christia...@googlemail.com
on 12 Aug 2008 at 6:06
Here's one possibility: the process is running out of memory when it attempts
to add a new hash table entry. However, there's no way for s3backer to know
this has happened because g_hash_table_replace() returns void. Are you running
with a huge block cache that could be exhausting memory?
In any case, I need to replace the hash table implementation with one that
properly reports all errors.
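To illustrate what "properly reports all errors" could look like, here is a minimal sketch in C of a chained hash table keyed by block number whose insert function returns an errno-style code, so the caller sees ENOMEM instead of failing silently. All names (blk_hash, blk_hash_put, and so on) are hypothetical; this is not s3backer's actual code, only one possible shape for such an interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types and names; a sketch, not s3backer's implementation. */
struct blk_entry {
    uint32_t key;               /* block number */
    void *value;                /* caller-owned payload */
    struct blk_entry *next;     /* next entry in the same bucket */
};

struct blk_hash {
    size_t nbuckets;
    size_t count;
    struct blk_entry **buckets;
};

/* Create a table. Returns 0 on success, EINVAL or ENOMEM on failure. */
int blk_hash_create(struct blk_hash **tablep, size_t nbuckets)
{
    struct blk_hash *t;

    assert(tablep != NULL);
    if (nbuckets == 0)
        return EINVAL;
    if ((t = malloc(sizeof(*t))) == NULL)
        return ENOMEM;
    if ((t->buckets = calloc(nbuckets, sizeof(*t->buckets))) == NULL) {
        free(t);
        return ENOMEM;
    }
    t->nbuckets = nbuckets;
    t->count = 0;
    *tablep = t;
    return 0;
}

/* Insert or replace. Returns 0 on success, ENOMEM if allocation fails. */
int blk_hash_put(struct blk_hash *t, uint32_t key, void *value)
{
    struct blk_entry *e;
    size_t i;

    assert(t != NULL);
    i = key % t->nbuckets;
    for (e = t->buckets[i]; e != NULL; e = e->next) {
        if (e->key == key) {
            e->value = value;   /* replace existing entry in place */
            return 0;
        }
    }
    if ((e = malloc(sizeof(*e))) == NULL)
        return ENOMEM;          /* reported to the caller, table unchanged */
    e->key = key;
    e->value = value;
    e->next = t->buckets[i];
    t->buckets[i] = e;
    t->count++;
    return 0;
}

/* Look up a key. Returns the stored value, or NULL if absent. */
void *blk_hash_get(const struct blk_hash *t, uint32_t key)
{
    const struct blk_entry *e;

    assert(t != NULL);
    for (e = t->buckets[key % t->nbuckets]; e != NULL; e = e->next) {
        if (e->key == key)
            return e->value;
    }
    return NULL;
}

/* Free the table and its entries (stored values are not freed). */
void blk_hash_destroy(struct blk_hash *t)
{
    struct blk_entry *e, *next;
    size_t i;

    if (t == NULL)
        return;
    for (i = 0; i < t->nbuckets; i++) {
        for (e = t->buckets[i]; e != NULL; e = next) {
            next = e->next;
            free(e);
        }
    }
    free(t->buckets);
    free(t);
}
```

A caller would check the return value of blk_hash_put() and propagate ENOMEM up to the write path, which is not possible with an insert function that returns void.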
Original comment by archie.c...@gmail.com
on 12 Aug 2008 at 6:11
I have enabled the assertions (NDEBUG=0) and now I get:
2008-08-12 20:11:45 INFO: test_io: write 0001817d started (zero block)
2008-08-12 20:11:45 INFO: test_io: write 0001817e started (zero block)
2008-08-12 20:11:45 INFO: test_io: write 0001817f started (zero block)
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xabcfeb90 (LWP 13663)]
0xb7e7386a in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
(gdb) info stack
#0 0xb7e7386a in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#1 0x0804a540 in block_cache_write_block (s3b=0x80574a0, block_num=229716,
src=0xac6bc048, md5=0x0) at block_cache.c:830
#2 0x0804c515 in fuse_op_write (path=0x80923d8 "/file", buf=0xac6bc048 "",
size=4096, offset=940916736, fi=0xabcfe25c) at fuse_ops.c:419
#3 0xb7f0d92e in fuse_fs_write () from /lib/libfuse.so.2
#4 0xb7f121f9 in ?? () from /lib/libfuse.so.2
#5 0xb7f15d05 in ?? () from /lib/libfuse.so.2
#6 0xb7f16d10 in ?? () from /lib/libfuse.so.2
#7 0xb7f18536 in fuse_session_process () from /lib/libfuse.so.2
#8 0xb7f148e5 in ?? () from /lib/libfuse.so.2
#9 0xb7cb84fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#10 0xb7c3ae5e in clone () from /lib/tls/i686/cmov/libc.so.6
Original comment by christia...@googlemail.com
on 12 Aug 2008 at 6:12
I have set NDEBUG=1 again to disable the assertions, since no assertion failed.
So it seems that either the pointer priv->hashtable or the pointer key points
to an invalid location.
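A side note on the NDEBUG settings mentioned in this thread, offered as a hedged sketch: with the standard C <assert.h>, assert() is compiled out whenever NDEBUG is defined at all, whatever its value. How s3backer's build interprets NDEBUG=0 is not shown in this thread, so the following only illustrates the behaviour of the standard macro. The function name is hypothetical.

```c
/* Defining NDEBUG to 0 still counts as "defined" for <assert.h>. */
#define NDEBUG 0
#include <assert.h>

/*
 * Returns 1 if assert() evaluated its argument, 0 if it was compiled out.
 * Hypothetical helper for illustration only.
 */
int assert_evaluates_argument(void)
{
    int hit = 0;

    assert((hit = 1) == 1);     /* expands to ((void)0) while NDEBUG is defined */
    return hit;
}

/* Re-including <assert.h> without NDEBUG turns assert() back on below. */
#undef NDEBUG
#include <assert.h>
```

Under these assumptions the function returns 0, meaning the assertion was not evaluated even though NDEBUG was set to 0.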
Original comment by christia...@googlemail.com
on 12 Aug 2008 at 6:19
GLib version is: 2.16.3-1 (Ubuntu)
What version are you using?
Original comment by christia...@googlemail.com
on 12 Aug 2008 at 6:29
I am unable to reproduce this problem on SUSE 10.0 32 bit. However, that may
be just bad luck, especially if this is some sort of race condition.
In any case, here are some relevant versions:
s3backer-1.1.1
glib-1.2.10-595
fuse-2.7.0-5.1
kernel-default-2.6.13-15.18
Original comment by archie.c...@gmail.com
on 13 Aug 2008 at 4:15
Please try again with r217, which uses a new custom hash table implementation
instead of glib's.
Original comment by archie.c...@gmail.com
on 13 Aug 2008 at 10:22
I have checked out r218, which is working without any problems so far.
Thanks for implementing the custom hash table.
Original comment by christia...@googlemail.com
on 14 Aug 2008 at 8:34
Marking bug as fixed. Please reopen if the problem recurs.
Original comment by archie.c...@gmail.com
on 15 Aug 2008 at 4:18
Original comment by archie.c...@gmail.com
on 23 Oct 2008 at 4:42
Original issue reported on code.google.com by
christia...@googlemail.com
on 10 Aug 2008 at 9:15