It's worth mentioning that the same issue happened on the previous 2.1.1 release; I upgraded in the hope of this being fixed in 2.1.2.
Well, it looks like this screwed up the table: now any query on the table crashes the server with the same error.
I would like to mention that this same error happened to me a few days ago. I was running rethinkdb 2.1.1 (CLANG 4.2 (clang-425.0.28)) on OS X. However, this only happened when I ran rethinkdb dump, and the server would crash with this error:
error: Error in src/rdb_protocol/lazy_json.cc at line 22:
error: Guarantee failed: [res == archive_result_t::SUCCESS] Deserialization of rdb value failed with error archive_result_t::RANGE_ERROR.
error: Backtrace:
error: Sun Aug 23 21:36:35 2015
1: 0 rethinkdb 0x0000000105d1dd2b _Z11deserializeIL17cluster_version_t6EE16archive_result_tP13read_stream_tP17repli_timestamp_t + 539 at 0x105d1dd2b ()
2: 0 rethinkdb 0x000000010588192c _ZN26callable_action_instance_tINSt3__16__bindIRFvNS0_8functionIFvvEEEEJRS4_EEEED0Ev + 3724 at 0x10588192c ()
3: 0 rethinkdb 0x0000000105882167 _ZN26callable_action_instance_tINSt3__16__bindIRFvNS0_8functionIFvvEEEEJRS4_EEEED0Ev + 5831 at 0x105882167 ()
4: 0 rethinkdb 0x000000010588154a _ZN26callable_action_instance_tINSt3__16__bindIRFvNS0_8functionIFvvEEEEJRS4_EEEED0Ev + 2730 at 0x10588154a ()
5: 0 rethinkdb 0x0000000105ae67ec _Z11deserializeIL17cluster_version_t6EE16archive_result_tP13read_stream_tP13name_string_t + 8204 at 0x105ae67ec ()
6: 0 rethinkdb 0x0000000105c17107 _ZNSt3__16vectorINS0_I15lon_lat_point_tNS_9allocatorIS1_EEEENS2_IS4_EEE21__push_back_slow_pathIS4_EEvOT_ + 1047 at 0x105c17107 ()
7: 0 rethinkdb 0x0000000105c17267 _ZNSt3__16vectorINS0_I15lon_lat_point_tNS_9allocatorIS1_EEEENS2_IS4_EEE21__push_back_slow_pathIS4_EEvOT_ + 1399 at 0x105c17267 ()
8: 0 rethinkdb 0x0000000105b4325b _ZN2ql11deserializeIL17cluster_version_t6EEE16archive_result_tP13read_stream_tPNS_11batchspec_tE + 16523 at 0x105b4325b ()
9: 0 rethinkdb 0x0000000105891e84 _ZN30concurrent_traversal_adapter_t16handle_pair_coroEP18scoped_key_value_tP15semaphore_acq_t27fifo_enforcer_write_token_tN14auto_drainer_t6lock_tE + 116 at 0x105891e84 ()
10: 0 rethinkdb 0x0000000105891faa _ZN26callable_action_instance_tINSt3__16__bindIM30concurrent_traversal_adapter_tFvP18scoped_key_value_tP15semaphore_acq_t27fifo_enforcer_write_token_tN14auto_drainer_t6lock_tEEJPS2_S4_S6_RS7_S9_EEEE10run_actionEv + 106 at 0x105891faa ()
11: 0 rethinkdb 0x0000000105877c68 _ZN22threaded_context_ref_tC2Ev + 1304 at 0x105877c68 ()
error: Exiting.
Trace/BPT trap: 5
The crash happened when the dump reached 14%, and the same thing happened each of the 3 times I tried again.
I decided to manually increase the --cache-size to 4 GB and close any memory-intensive applications, which seemed to fix the problem: rethinkdb dump executed successfully. Interestingly, when I then removed the --cache-size flag and tried dumping again, the error didn't occur, and it hasn't occurred since.
This is very concerning for obvious reasons, and we would like to find out how this happened as quickly as possible.
@soplakanets It would be extremely helpful if we could get a copy of your data files. We are happy to sign an NDA if there's any sensitive data in there.
I recommend taking the following steps:
1. Make a copy of the data files for the affected table. r.db(...).table(...).config() shows you the table id in the id field of the result (see the sketch after this list). Copy the metadata file and the file that has the id of the affected table as its filename.
2. Drop the table to recover (you were running table.delete() when this happened, so I assume this is fine?).
3. I'll send you a separate email with information on how to best upload the data files to us.
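Finding the id and locating the matching file could look roughly like this (a minimal sketch only, assuming the rethinkdbdash driver that comes up later in this thread, a server on localhost, and placeholder db/table names):

// Hypothetical illustration; 'mydb' and 'mytable' are placeholders.
var r = require('rethinkdbdash')();

r.db('mydb').table('mytable').config()('id').run().then(function (tableId) {
  // Per the steps above, the on-disk data file for the table uses this id as its filename.
  console.log('copy the "metadata" file and the data file named', tableId);
});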
@1lann Thanks for adding your report. Were you storing geospatial data in one of your tables?
It would be extremely helpful if we could get a copy of your data files. We are happy to sign an NDA if there's any sensitive data in there.
No need for an NDA; this is auto-generated data which doesn't make sense without context. The problem is that I had to drop the table in order to get it working again. I will try to run the data generation script in the same order to reproduce it again.
I recommend taking the following steps
As I said above, doing dropTable() helped recover from the issue. Thanks for your help.
@danielmewes No, I did not store any geospatial data whatsoever. I had a total of 3 tables containing ~5k, ~100k, and ~2M documents respectively. The documents had very basic fields with numbers and strings. In the table with 5k documents, one of the fields contained an array with 10 elements. On the same 5k table, I used a compound secondary index consisting of 2 fields, plus a few other basic secondary indexes.
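For reference, a compound secondary index of the kind described above can be created like this (a sketch with invented table and field names, written as a Data Explorer / rethinkdbdash-style query; this is not the reporter's actual schema):

// 'docs', 'fieldA' and 'fieldB' are placeholder names.
r.table('docs').indexCreate('fieldA_fieldB', [r.row('fieldA'), r.row('fieldB')])

// Compound indexes are queried with an array as the key:
r.table('docs').getAll(['a-value', 'b-value'], {index: 'fieldA_fieldB'})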
@danielmewes Unfortunately, I have the same issue again. I will send you the data you requested via email.
Update: sent the data via hightail.
Thank you for the data @soplakanets! We'll investigate this asap.
Quick update: We've analyzed the data file and have derived a unit test that can reproduce a leaf node corruption. We're still investigating the details of the problem.
This turned out to be a really old bug in the btree code that could cause memory and/or data corruption in certain (usually rare) corner cases. It's quite embarrassing that we've had this serious bug around for such a long time. At the same time, I'm glad that we found it, thanks to your help @soplakanets.
It's possible that some fixes to our leaf node timestamp handling in RethinkDB 2.1 have made the problem surface more easily.
We will push out a bug fix release as quickly as possible.
A fix is implemented in branch daniel_4769, and is now in code review 3210 by @Tryneus .
Fixed in v2.1.x by commits 65572886e307231466813d80572872237b09a2db and 2f8aef3b0f7213364e59a5042b7761060d604548.
Fixed in next by 5596db1 and c968442.
The fix will ship with RethinkDB 2.1.3.
Also cherry-picked the first fix into v2.0.x with 7bd1db8f6cb9e6b60d357654d17ba506ba57dc80, in case we release another 2.0.x point release at some point.
The likely fix for this shipped with RethinkDB 2.1.3. We strongly recommend that everyone upgrade as quickly as possible to avoid potential data corruption problems.
@soplakanets It's always a bit difficult with these hard-to-reproduce bugs to know for sure whether a given patch actually fixed the issue. I recommend upgrading to 2.1.3, but to still keep an eye on the issue to make sure it doesn't appear again (your workload seems to be particularly prone to exposing this bug).
@dalanmiller Thanks. Will upgrade asap.
Hi guys,
First I would like to thank you for the amazing work you do on RethinkDB! I love it!
Unfortunately I'm running RethinkDB 2.1.3 and I'm hitting a similar bug.
I have a cluster of 2 servers, but the table I have a problem with is only on one server. When I try to delete all the data (~2M documents) in this table, the server crashes. There is no other access to this table during the crash.
r.db('test').table('maxMindBlocksIPv4').delete()
2015-09-13T15:28:19.198159460 197.522587s error: Error in src/arch/runtime/thread_pool.cc at line 359:
2015-09-13T15:28:19.264949298 197.589377s error: Segmentation fault from reading the address 0x7f018563ff76.
2015-09-13T15:28:19.302892953 197.627320s error: Backtrace:
2015-09-13T15:28:20.187061917 198.511491s error: Sun Sep 13 15:28:19 2015\n\n1: backtrace_t::backtrace_t() at ??:?\n2: format_backtrace(bool) at ??:?\n3: report_fatal_error(char const*, int, char const*, ...) at ??:?\n4: linux_thread_pool_t::fatal_signal_handler(int, siginfo_t*, void*) at ??:?\n5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f02eea29340] at 0x7f02eea29340 (/lib/x86_64-linux-gnu/libpthread.so.0)\n6: /lib/x86_64-linux-gnu/libc.so.6(+0x1524d4) [0x7f02ee7a54d4] at 0x7f02ee7a54d4 (/lib/x86_64-linux-gnu/libc.so.6)\n7: leaf::garbage_collect(value_sizer_t*, leaf_node_t*, int, int*, boost::optional<int>) at ??:?\n8: leaf::move_elements(value_sizer_t*, leaf_node_t*, int, int, int, leaf_node_t*, int, int, std::vector<void const*, std::allocator<void const*> >*) at ??:?\n9: leaf::level(value_sizer_t*, int, leaf_node_t*, leaf_node_t*, btree_key_t*, std::vector<void const*, std::allocator<void const*> >*) at ??:?\n10: check_and_handle_underfull(value_sizer_t*, buf_lock_t*, buf_lock_t*, superblock_t*, btree_key_t const*, value_deleter_t const*) at ??:?\n11: apply_keyvalue_change(value_sizer_t*, keyvalue_location_t*, btree_key_t const*, repli_timestamp_t, value_deleter_t const*, key_modification_callback_t*, delete_mode_t) at ??:?\n12: kv_location_delete(keyvalue_location_t*, store_key_t const&, repli_timestamp_t, deletion_context_t const*, delete_mode_t, rdb_modification_info_t*) at ??:?\n13: rdb_replace_and_return_superblock(btree_loc_info_t const&, btree_point_replacer_t const*, deletion_context_t const*, promise_t<superblock_t*>*, rdb_modification_info_t*, profile::trace_t*) at ??:?\n14: do_a_replace_from_batched_replace(auto_drainer_t::lock_t, fifo_enforcer_sink_t*, fifo_enforcer_write_token_t const&, btree_loc_info_t const&, one_replace_t, ql::configured_limits_t const&, promise_t<superblock_t*>*, rdb_modification_report_cb_t*, bool, ql::datum_t*, profile::trace_t*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*) at ??:?\n15: std::_Function_handler<void (), std::_Bind<void (*(auto_drainer_t::lock_t, fifo_enforcer_sink_t*, fifo_enforcer_write_token_t, btree_loc_info_t, one_replace_t, ql::configured_limits_t, promise_t<superblock_t*>*, rdb_modification_report_cb_t*, bool, ql::datum_t*, profile::trace_t*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*))(auto_drainer_t::lock_t, fifo_enforcer_sink_t*, fifo_enforcer_write_token_t const&, btree_loc_info_t const&, one_replace_t, ql::configured_limits_t const&, promise_t<superblock_t*>*, rdb_modification_report_cb_t*, bool, ql::datum_t*, profile::trace_t*, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*)> >::_M_invoke(std::_Any_data const&) at ??:?\n16: coro_pool_t<std::function<void ()> >::worker_run(std::function<void ()>, auto_drainer_t::lock_t) at ??:?\n17: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (coro_pool_t<std::function<void ()> >::*)(std::function<void ()>, auto_drainer_t::lock_t)> (coro_pool_t<std::function<void ()> >*, std::function<void ()>, auto_drainer_t::lock_t)> >::run_action() at ??:?\n18: coro_t::run() at ??:?
2015-09-13T15:28:20.248535860 198.572963s error: Exiting.
Another example:
2015-09-13T15:50:39.321047629 1298.095524s error: Error in src/buffer_cache/alt.cc at line 718:
2015-09-13T15:50:39.360596908 1298.135073s error: Guarantee failed: [ret != repli_timestamp_t::invalid]
2015-09-13T15:50:39.398113181 1298.172589s error: Backtrace:
2015-09-13T15:50:40.017122232 1298.791599s error: Sun Sep 13 15:50:39 2015\n\n1: backtrace_t::backtrace_t() at ??:?\n2: format_backtrace(bool) at ??:?\n3: report_fatal_error(char const*, int, char const*, ...) at ??:?\n4: buf_lock_t::get_recency() const at ??:?\n5: btree_depth_first_traversal(counted_t<counted_buf_lock_and_read_t>, key_range_t const&, depth_first_traversal_callback_t*, access_t, direction_t, btree_key_t const*, btree_key_t const*, signal_t*) at ??:?\n6: btree_depth_first_traversal(counted_t<counted_buf_lock_and_read_t>, key_range_t const&, depth_first_traversal_callback_t*, access_t, direction_t, btree_key_t const*, btree_key_t const*, signal_t*) at ??:?\n7: btree_depth_first_traversal(counted_t<counted_buf_lock_and_read_t>, key_range_t const&, depth_first_traversal_callback_t*, access_t, direction_t, btree_key_t const*, btree_key_t const*, signal_t*) at ??:?\n8: btree_depth_first_traversal(counted_t<counted_buf_lock_and_read_t>, key_range_t const&, depth_first_traversal_callback_t*, access_t, direction_t, btree_key_t const*, btree_key_t const*, signal_t*) at ??:?\n9: btree_depth_first_traversal(superblock_t*, key_range_t const&, depth_first_traversal_callback_t*, access_t, direction_t, release_superblock_t, signal_t*) at ??:?\n10: btree_concurrent_traversal(superblock_t*, key_range_t const&, concurrent_traversal_callback_t*, direction_t, release_superblock_t) at ??:?\n11: rdb_rget_slice(btree_slice_t*, key_range_t const&, superblock_t*, ql::env_t*, ql::batchspec_t const&, std::vector<boost::variant<ql::map_wire_func_t, ql::group_wire_func_t, ql::filter_wire_func_t, ql::concatmap_wire_func_t, ql::distinct_wire_func_t, ql::zip_wire_func_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::allocator<boost::variant<ql::map_wire_func_t, ql::group_wire_func_t, ql::filter_wire_func_t, ql::concatmap_wire_func_t, ql::distinct_wire_func_t, ql::zip_wire_func_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > const&, boost::optional<boost::variant<ql::count_wire_func_t, ql::sum_wire_func_t, ql::avg_wire_func_t, ql::min_wire_func_t, ql::max_wire_func_t, ql::reduce_wire_func_t, ql::limit_read_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > const&, sorting_t, rget_read_response_t*, release_superblock_t) at ??:?\n12: do_read(ql::env_t*, store_t*, btree_slice_t*, real_superblock_t*, rget_read_t const&, rget_read_response_t*, release_superblock_t) at ??:?\n13: 
rdb_read_visitor_t::operator()(rget_read_t const&) at ??:?\n14: store_t::protocol_read(read_t const&, read_response_t*, real_superblock_t*, signal_t*) at ??:?\n15: store_t::read(read_t const&, read_response_t*, read_token_t*, signal_t*) at ??:?\n16: replica_t::do_read(read_t const&, state_timestamp_t, signal_t*, read_response_t*) at ??:?\n17: primary_dispatcher_t::read(read_t const&, fifo_enforcer_sink_t::exit_read_t*, order_token_t, signal_t*, read_response_t*) at ??:?\n18: primary_execution_t::on_read(read_t const&, fifo_enforcer_sink_t::exit_read_t*, order_token_t, signal_t*, read_response_t*, admin_err_t*) at ??:?\n19: primary_query_server_t::client_t::perform_request(boost::variant<primary_query_bcard_t::read_request_t, primary_query_bcard_t::write_request_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, signal_t*) at ??:?\n20: multi_client_server_t<boost::variant<primary_query_bcard_t::read_request_t, primary_query_bcard_t::write_request_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, primary_query_server_t*, primary_query_server_t::client_t>::client_t::on_request(signal_t*, boost::variant<primary_query_bcard_t::read_request_t, primary_query_bcard_t::write_request_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) at ??:?\n21: mailbox_t<void (boost::variant<primary_query_bcard_t::read_request_t, primary_query_bcard_t::write_request_t, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>)>::read_impl_t::read(read_stream_t*, signal_t*) at ??:?\n22: /usr/bin/rethinkdb() 
[0x944a02] at 0x944a02 ()\n23: coro_t::run() at ??:?
2015-09-13T15:50:40.069366756 1298.843842s error: Exiting.
r.db('test').table('maxMindBlocksIPv4').config():
{
"db": "test" ,
"durability": "hard" ,
"id": "5f369b90-ab69-4b8e-b2e1-a8d4209477c8" ,
"indexes": [
"toDelete"
] ,
"name": "maxMindBlocksIPv4" ,
"primary_key": "id" ,
"shards": [
{
"nonvoting_replicas": [ ],
"primary_replica": "s1" ,
"replicas": [
"s1"
]
}
] ,
"write_acks": "majority"
}
rethinkdb --version
rethinkdb 2.1.3~0trusty (GCC 4.8.2)
I run the server on Linux 3.16.3, Ubuntu 14.04.1, and installed RethinkDB from the deb packages.
If you need more information or a copy of the table, just let me know.
Thanks, Adrien
Hi Adrien (@Bacto), thanks for the report. Did you have this table before you upgraded to 2.1.3? If that's the case, the crashes are probably consistent with the sort of corruption that this bug could have caused. The corruption would probably have happened before the upgrade, and only showed up when you tried to delete all documents after upgrading to 2.1.3.
Could you make a copy of the rethinkdb data directory on the server that's experiencing the crash, after you've shut down the rethinkdb process on that server? I'd like to take a look at the data file to confirm, if possible, that this isn't a different bug. I can send you instructions on how to upload the data. Once you've made a copy, I recommend dropping the table and recreating it from scratch in order to avoid the crash in the future.
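The drop-and-recreate step could look roughly like this (a minimal sketch using the db, table, and index names that appear in the config output above; run the commands one at a time and adjust them to your schema):

r.db('test').tableDrop('maxMindBlocksIPv4')
r.db('test').tableCreate('maxMindBlocksIPv4')
r.db('test').table('maxMindBlocksIPv4').indexCreate('toDelete')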
@Bacto One more question: did the first crash leave a core file behind by any chance?
Hi Daniel (@danielmewes),
Thank you for your fast reply.
I deleted the original table yesterday and recreated it (on RethinkDB 2.1.3). Unfortunately I still have the problem. I've tried deleting data and that seems to work:
r.db('test').table('maxMindBlocksIPv4').delete()
But it crashed with:
r.db('test').table('maxMindBlocksIPv4').update({ toDelete: true }).run({durability: 'soft'})
2015-09-15T09:52:37.580596923 151295.742160s error: Error in src/rdb_protocol/lazy_json.cc at line 22:
2015-09-15T09:52:37.620582133 151295.782145s error: Guarantee failed: [res == archive_result_t::SUCCESS] Deserialization of rdb value failed with error archive_result_t::RANGE_ERROR.
2015-09-15T09:52:37.646836021 151295.808399s error: Backtrace:
2015-09-15T09:52:38.551113880 151296.712678s error: Tue Sep 15 09:52:37 2015\n\n1: backtrace_t::backtrace_t() at ??:?\n2: format_backtrace(bool) at ??:?\n3: report_fatal_error(char const*, int, char const*, ...) at ??:?\n4: get_data(rdb_value_t const*, buf_parent_t) at ??:?\n5: lazy_json_t::get() const at ??:?\n6: rget_cb_t::handle_pair(scoped_key_value_t&&, concurrent_traversal_fifo_enforcer_signal_t) at ??:?\n7: concurrent_traversal_adapter_t::handle_pair_coro(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t) at ??:?\n8: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (concurrent_traversal_adapter_t::*)(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> (concurrent_traversal_adapter_t*, scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> >::run_action() at ??:?\n9: coro_t::run() at ??:?
I don't have a core file, but I can try to get one if you explain how to generate it. I can send you a copy of the table too, just tell me how :)
Thanks, I'm going to send you an email in a moment with instructions on how to upload the data file.
It also sounds like you're able to reproduce the problem on a fresh table relatively quickly (given that you started with a new table just yesterday). What types of write operations other than the full-table delete() do you run on the maxMindBlocksIPv4 table?
Re-opening this, since there still seems to be an issue.
@Bacto actually I don't have your email address. Would you mind dropping me a quick message to daniel@rethinkdb.com ? I will reply with upload instructions.
Thanks @danielmewes, I've just sent you an email.
You're lucky, I don't do a lot of operations on this table.
The script does this:
Flag all entries as "toDelete":
r.table('maxMindBlocksIPv4')
.update({ toDelete: true })
.run({durability: 'soft'});
Insert new entries in groups of 100,000, for a total of ~2.5M entries at the end:
r.table('maxMindBlocksIPv4')
.insert(toInsert)
.run({durability: 'soft'});
Delete all entries that were flagged as "toDelete":
r.table('maxMindBlocksIPv4')
.getAll(true, { index: 'toDelete' })
.delete()
.run({durability: 'soft'});
For information, I use rethinkdbdash on Node.js. I have only one secondary index, on "toDelete".
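Put together, the reported cycle looks roughly like this (a condensed sketch only, assuming rethinkdbdash with the 'test' database; toInsert is a placeholder for a batch of documents, not the reporter's actual script):

// Condensed sketch of the workload described above.
var r = require('rethinkdbdash')({db: 'test'});
var toInsert = [ /* a batch of up to 100,000 documents */ ];

// 1. Flag every existing entry for deletion.
r.table('maxMindBlocksIPv4').update({ toDelete: true }).run({durability: 'soft'})
  // 2. Insert the new entries (repeated per batch until ~2.5M documents are in).
  .then(function () {
    return r.table('maxMindBlocksIPv4').insert(toInsert).run({durability: 'soft'});
  })
  // 3. Delete everything that is still flagged, via the 'toDelete' secondary index.
  .then(function () {
    return r.table('maxMindBlocksIPv4')
      .getAll(true, { index: 'toDelete' })
      .delete()
      .run({durability: 'soft'});
  });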
Thanks for the description of your workload, @Bacto. Approximately how large are the documents? Do you have an example document that you can share?
Ok I've been able to reproduce this. So there's definitely a second issue in the btree leaf node code. Have to investigate more to find the exact cause of the corruption.
Update: Testing a fix now. We should be able to get a bug fix release out in the next few days.
The fix is now in v2.1.x and next. It will ship with RethinkDB 2.1.4.
The internal code review number was 3226.
@Bacto We just pushed RethinkDB 2.1.4 to the repositories. Please let us know in case you still encounter any issues after the update.
Hi @danielmewes, I have just upgraded to 2.1.4 and I still have the issue :(
2015-09-17T09:34:21.922830350 320.014454s error: Error in src/rdb_protocol/lazy_json.cc at line 22:
2015-09-17T09:34:22.028834984 320.120459s error: Guarantee failed: [res == archive_result_t::SUCCESS] Deserialization of rdb value failed with error archive_result_t::RANGE_ERROR.
2015-09-17T09:34:22.067034201 320.158659s error: Backtrace:
2015-09-17T09:34:22.367373396 320.458999s error: Thu Sep 17 09:34:22 2015\n\n1: backtrace_t::backtrace_t() at ??:?\n2: format_backtrace(bool) at ??:?\n3: report_fatal_error(char const*, int, char const*, ...) at ??:?\n4: get_data(rdb_value_t const*, buf_parent_t) at ??:?\n5: lazy_json_t::get() const at ??:?\n6: rget_cb_t::handle_pair(scoped_key_value_t&&, concurrent_traversal_fifo_enforcer_signal_t) at ??:?\n7: concurrent_traversal_adapter_t::handle_pair_coro(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t) at ??:?\n8: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (concurrent_traversal_adapter_t::*)(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> (concurrent_traversal_adapter_t*, scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> >::run_action() at ??:?\n9: coro_t::run() at ??:?
I'll send you the table in a few minutes.
I've sent you the table. For information, it crashed too when I tried to dump the table with:
rethinkdb dump -e test.maxMindBlocksIPv4
2015-09-17T09:38:34.331210534 213.217966s error: Error in src/rdb_protocol/lazy_json.cc at line 22:
2015-09-17T09:38:34.362081851 213.248837s error: Guarantee failed: [res == archive_result_t::SUCCESS] Deserialization of rdb value failed with error archive_result_t::RANGE_ERROR.
2015-09-17T09:38:34.389654434 213.276410s error: Backtrace:
2015-09-17T09:38:34.764616226 213.651373s error: Thu Sep 17 09:38:34 2015\n\n1: backtrace_t::backtrace_t() at ??:?\n2: format_backtrace(bool) at ??:?\n3: report_fatal_error(char const*, int, char const*, ...) at ??:?\n4: get_data(rdb_value_t const*, buf_parent_t) at ??:?\n5: lazy_json_t::get() const at ??:?\n6: rget_cb_t::handle_pair(scoped_key_value_t&&, concurrent_traversal_fifo_enforcer_signal_t) at ??:?\n7: concurrent_traversal_adapter_t::handle_pair_coro(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t) at ??:?\n8: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (concurrent_traversal_adapter_t::*)(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> (concurrent_traversal_adapter_t*, scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> >::run_action() at ??:?\n9: coro_t::run() at ??:?
@Bacto Is that with the existing table? Unfortunately the patch usually isn't able to revert corruption that has already happened and found its way to disk, and it appears that this is the case here. You'll unfortunately have to recreate the table again.
With your data file, we can attempt to recover most of the data (I suspect that only a small number of documents will be corrupted). Please let me know if you had important data in the table that you would like to have recovered, and I'll see to it that you get the intact data by tomorrow.
My apologies again for all the inconvenience.
Oh ok, I thought that the patch would repair the table.
I don't care about the table data. I will destroy the table, recreate it, and tell you if it's working now :)
I'm also facing the segmentation fault here, with both 2.1.1 and 2.1.4 on OS X, on fresh data.
To reproduce this, create a table 'users', then:
r.table('users').indexCreate('time', r.row('createdAt'))
r.table('users').insert({name: 'aaa', createdAt: r.now()})
r.table('users').insert({name: 'bbb', createdAt: r.now()})
r.table('users').orderBy({index: r.desc('time')}).limit(2).changes()
r.table('users').filter({name: 'bbb'}).update({name: 'zzz'}) // this will produce the crash
here is the error message I got after the crash:
error: Error in src/arch/runtime/thread_pool.cc at line 359:
error: Segmentation fault from reading the address 0x8.
error: Backtrace:
error: Thu Sep 17 19:27:47 2015
error: Exiting.
EDIT:
I just downloaded 2.0.5 and did the above steps again; it works without problems.
Thanks!
@danielmewes I don't have the problem after destroying and recreating the table. Thanks a lot! :-)
@oldlam Thanks for mentioning that and for the steps to reproduce. On Linux I'm not getting the seg fault with those queries, but the changefeed isn't working properly either. It looks like there's a bug that yields different types of failures on different systems.
I think this is a separate bug though. I copied your report over to https://github.com/rethinkdb/rethinkdb/issues/4850
After issuing the following query `r.db('db').table('table').delete()`, the RethinkDB server crashed almost instantly.
Log file records:
Single server setup. Server info: