syndicate-storage / syndicate

Internet-scale software-defined storage system
Apache License 2.0

UG crashed after the latest git pull. #21

Closed wathsalav closed 11 years ago

wathsalav commented 11 years ago

The UG crashes when I try to read from a file in the mounted volume. The complete error log is given below. Before the git pull it worked fine.

[build/out/UG/fs/consistency.cpp:0356] fs_entry_revalidate_path: check '/abc'
[build/out/UG/fs/consistency.cpp:0059] fs_entry_is_read_stale: 1090 millis old, max is 360000
[build/out/UG/fs/consistency.cpp:0366] fs_entry_revalidate_path: fresh; no need to synchronize '/abc'
DATA MS revalidate 0.000165
syndicatefs_getattr rc = 0
syndicatefs_open( /abc, 0x7ff746778c80 (flags = 100000) )
[build/out/UG/fs/consistency.cpp:0356] fs_entry_revalidate_path: check '/abc'
[build/out/UG/fs/consistency.cpp:0059] fs_entry_is_read_stale: 1090 millis old, max is 360000
[build/out/UG/fs/consistency.cpp:0366] fs_entry_revalidate_path: fresh; no need to synchronize '/abc'
DATA MS revalidate 0.000106
[build/out/UG/fs/network.cpp:0043] fs_entry_download_cached: md_download_file(http://localhost:9000/SYNDICATE-DATA//abc.1/manifest.1371825250.0) HTTP status 0
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
Aborted (core dumped)

jcnelson commented 11 years ago

Looks like it couldn't read the manifest. I'll add better error checking for that tonight (that can happen whenever you try to pack or unpack a corrupt or incomplete protobuf).
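For context, the `what()` text in the log above, `basic_string::_S_construct null not valid`, is the message libstdc++ gives when a `std::string` is built from a null `char*`. A minimal sketch of the kind of check that avoids it, assuming a hypothetical `download_result` struct and `download_to_string` helper (neither is taken from the Syndicate source), might look like this:

```cpp
#include <cstddef>
#include <string>

// Hypothetical result of a manifest download; not a type from Syndicate.
struct download_result {
    long http_status;   // 0 means no HTTP response was received at all
    const char* buf;    // may be null if the transfer failed
    std::size_t len;    // number of valid bytes in buf
};

// Copy the payload into `out` only if the download actually produced one.
// Returns 0 on success, -1 if there is nothing safe to build a string from.
int download_to_string(const download_result& res, std::string& out) {
    if (res.http_status != 200 || res.buf == nullptr) {
        return -1;   // never hand a null pointer to std::string
    }
    out.assign(res.buf, res.len);
    return 0;
}
```

With a check like this, the "HTTP status 0" case in the log would come back to the caller as an error code instead of reaching the string constructor.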

jcnelson commented 11 years ago

Should be fixed in 2219f4. Still need to do correct exception handling, but at least these exceptions should stop.

wathsalav commented 11 years ago

With the latest git pull, both the UG and the AG crash. The UG fails with the same error, and the AG crashes with a segfault inside libsyndicate. Here is the Valgrind output.

[build/out/libsyndicate/libsyndicate.cpp:2833] md_HTTP_connection_handler: GET /SYNDICATE-DATA/abc.1/manifest.1371825250.0, query=(null), requester=localhost:9000, user=NONE
==7517== Thread 2:
==7517== Invalid read of size 8
==7517== at 0x4EBB2A9: gateway_HTTP_connect(md_HTTP_connectiondata) (libgateway.cpp:285)
==7517== by 0x4EA81EB: md_HTTP_connectionhandler(void, MHDConnection, char const, char const, char const, char const, unsigned long_, void) (libsyndicate.cpp:2837)
==7517== by 0x650C9E8: ??? (in /usr/lib/x86_64-linux-gnu/libmicrohttpd.so.10.15.0)
==7517== by 0x650D047: MHD_connection_handle_idle (in /usr/lib/x86_64-linux-gnu/libmicrohttpd.so.10.15.0)
==7517== by 0x6510CAC: ??? (in /usr/lib/x86_64-linux-gnu/libmicrohttpd.so.10.15.0)
==7517== by 0x6510EEC: ??? (in /usr/lib/x86_64-linux-gnu/libmicrohttpd.so.10.15.0)
==7517== by 0x5CFDE99: start_thread (pthread_create.c:308)
==7517== by 0x54D7CCC: clone (clone.S:112)
==7517== Address 0x8 is not stack'd, malloc'd or (recently) free'd
==7517==
==7517==
==7517== Process terminating with default action of signal 11 (SIGSEGV)
==7517== Access not within mapped region at address 0x8
==7517== at 0x4EBB2A9: gateway_HTTP_connect(md_HTTP_connectiondata) (libgateway.cpp:285)
==7517== by 0x4EA81EB: md_HTTP_connectionhandler(void, MHDConnection, char const, char const, char const, char const, unsigned long_, void) (libsyndicate.cpp:2837)
==7517== by 0x650C9E8: ??? (in /usr/lib/x86_64-linux-gnu/libmicrohttpd.so.10.15.0)
==7517== by 0x650D047: MHD_connection_handle_idle (in /usr/lib/x86_64-linux-gnu/libmicrohttpd.so.10.15.0)
==7517== by 0x6510CAC: ??? (in /usr/lib/x86_64-linux-gnu/libmicrohttpd.so.10.15.0)
==7517== by 0x6510EEC: ??? (in /usr/lib/x86_64-linux-gnu/libmicrohttpd.so.10.15.0)
==7517== by 0x5CFDE99: start_thread (pthread_create.c:308)
==7517== by 0x54D7CCC: clone (clone.S:112)
==7517== If you believe this happened as a result of a stack
==7517== overflow in your program's main thread (unlikely but
==7517== possible), you can try to increase the size of the
==7517== main thread stack using the --main-stacksize= flag.
==7517== The main thread stack size used in this run was 8388608.
==7517==
==7517== HEAP SUMMARY:
==7517== in use at exit: 334,003 bytes in 4,914 blocks
==7517== total heap usage: 8,714 allocs, 3,800 frees, 1,485,194 bytes allocated
==7517==
==7517== LEAK SUMMARY:
==7517== definitely lost: 524 bytes in 43 blocks
==7517== indirectly lost: 260 bytes in 11 blocks
==7517== possibly lost: 86,983 bytes in 367 blocks
==7517== still reachable: 246,236 bytes in 4,493 blocks
==7517== suppressed: 0 bytes in 0 blocks
==7517== Rerun with --leak-check=full to see details of leaked memory
==7517==
==7517== For counts of detected and suppressed errors, rerun with: -v
==7517== Use --track-origins=yes to see where uninitialised values come from
==7517== ERROR SUMMARY: 100 errors from 7 contexts (suppressed: 2 from 2)
Killed

johnwhelchel commented 11 years ago

What command are you running that causes it?

wathsalav commented 11 years ago

In this case, cat on a file named abc on a mounted volume. This is not AG-SQL but AG-disk, which worked fine earlier. I think this has something to do with revision 2219f4.

jcnelson commented 11 years ago

The UG crashes because it tries to read a manifest from the AG, but the AG closes the connection. Right now, the UG does not handle cases where it receives invalid/corrupt messages.

I just committed the fix to the AG. Can you give it a try?

I'll work on the UG next.

-Jude

jcnelson commented 11 years ago

The UG should now catch protobuf parsing exceptions.
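As a rough illustration of that pattern (not the actual change in the UG), the sketch below wraps a parser so that an exception from bad input becomes an error code. `manifest`, `parse_manifest`, and `load_manifest` are hypothetical stand-ins, and the choice of `-EIO` as the error value is an assumption:

```cpp
#include <cerrno>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a manifest message; not Syndicate's type.
struct manifest {
    std::string path;
};

// Hypothetical parser that throws on incomplete input, standing in for
// whatever code unpacks the serialized manifest.
void parse_manifest(const std::string& bytes, manifest& out) {
    if (bytes.empty()) {
        throw std::runtime_error("empty or truncated manifest");
    }
    out.path = bytes;
}

// Wrap the parser so no exception escapes to the caller; a failure is
// reported as a negative errno-style code instead of terminating.
int load_manifest(const std::string& bytes, manifest& out) {
    try {
        parse_manifest(bytes, out);
        return 0;
    } catch (const std::exception&) {
        return -EIO;
    }
}
```

The point of the wrapper is that a closed connection or corrupt message turns into a failed read for the caller rather than an abort of the whole process.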

wathsalav commented 11 years ago

Thanks. Now it works fine!