remis-thoughts / native-hdfs-fuse

C HDFS FUSE implementation, no libhdfs
Apache License 2.0

EINTR handling and HDFS 2.3 support #2

Open tailhook opened 10 years ago

tailhook commented 10 years ago

I'm probably going to submit two bugs in one here, but I'm not sure, so let me know if I need to split it.

When I write to a file in HDFS, the write hangs until the HDFS timeout is reached. Presumably it hangs with the following stack (but see below):

#0  0x00007f9f0c5cd923 in recvfrom () from /nix/store/ywxpkmy9kagcsvbjjhi46pr4xwpd6sfm-glibc-2.19/lib/libpthread.so.0
#1  0x00000000004158bd in hadoop_rpc_receive_proto ()
#2  0x0000000000415a26 in hadoop_rpc_call_datanode ()
#3  0x00000000004121f3 in hadoop_fuse_write_block ()
#4  0x00000000004127df in hadoop_fuse_do_write ()
#5  0x0000000000414313 in hadoop_fuse_write ()
#6  0x00007f9f0c9fa81d in fuse_fs_write_buf () from /nix/store/zx1ihwpqmjyr4s0a8ig8x6p4xkxjkk7i-fuse-2.9.3/lib/libfuse.so.2
#7  0x00007f9f0c9fa998 in fuse_lib_write_buf () from /nix/store/zx1ihwpqmjyr4s0a8ig8x6p4xkxjkk7i-fuse-2.9.3/lib/libfuse.so.2
#8  0x00007f9f0ca02f58 in fuse_ll_process_buf () from /nix/store/zx1ihwpqmjyr4s0a8ig8x6p4xkxjkk7i-fuse-2.9.3/lib/libfuse.so.2
#9  0x00007f9f0c9ffb49 in fuse_do_work () from /nix/store/zx1ihwpqmjyr4s0a8ig8x6p4xkxjkk7i-fuse-2.9.3/lib/libfuse.so.2
#10 0x00007f9f0c5c5f8a in start_thread () from /nix/store/ywxpkmy9kagcsvbjjhi46pr4xwpd6sfm-glibc-2.19/lib/libpthread.so.0
#11 0x00007f9f0c2fd47d in clone () from /nix/store/ywxpkmy9kagcsvbjjhi46pr4xwpd6sfm-glibc-2.19/lib/libc.so.6

But if I attach a debugger or strace, the write succeeds (attaching a debugger usually makes the process receive EINTR from its blocked system calls).

It seems that native-hdfs-fuse doesn't handle EINTR in any way, so it's propagated just like any other error. But by then the data is already in the HDFS daemon, so it's just saved as is.
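
For reference, the conventional way to handle EINTR around a blocking receive is a retry loop. Below is a minimal sketch, not code from native-hdfs-fuse: the names recv_retry and recv_exact are hypothetical, and I'm assuming the receive path reads a known number of bytes from a stream socket. This would not fix the hang itself (the process would still be waiting for a reply that never arrives); it only shows how an interrupted call can be told apart from a real error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: call recv() again whenever it is interrupted
 * by a signal (errno == EINTR) before any data was transferred. */
static ssize_t recv_retry(int fd, void *buf, size_t len, int flags)
{
    ssize_t n;
    do {
        n = recv(fd, buf, len, flags);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Hypothetical helper: read exactly len bytes unless the peer closes
 * the connection or a real (non-EINTR) error occurs.
 * Returns the number of bytes read, or -1 on error with errno set. */
static ssize_t recv_exact(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv_retry(fd, (char *)buf + got, len - got, 0);
        if (n < 0)
            return -1;          /* real error, EINTR already handled */
        if (n == 0)
            break;              /* peer closed the connection */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

With something like this, an EINTR caused by attaching strace or gdb would just restart the read instead of surfacing as a failed RPC.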

Unfortunately I haven't found any documentation for the HDFS protocol, so I don't know how to debug it further. I'm using HDFS from the Cloudera Ubuntu repo, version 2.3.0+cdh5.1.2+816-1.cdh5.1.2.p0.3~precise-cdh5.1.2

tailhook commented 10 years ago

Just reproduced the same error on Hadoop 2.5