Can't reproduce the crash with 800k rows on nemea with GCC 8.2, neither in debug nor in release. ASan is unhappy, though:
==25203==WARNING: AddressSanitizer failed to allocate 0x4a817c80000 bytes
[...]
#6 0x7fce02b60b2a in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedb2a)
#7 0x55a1da1310d8 in std::vector<opossum::RowID, boost::container::pmr::polymorphic_allocator<opossum::RowID> >::reserve(unsigned long) (/home/Markus.Dreseler/hyrise/build-release-asan/hyrisePlayground+0xe7c80d8)
#8 0x55a1dbbdc39e in opossum::JoinIndex::_perform_join() (/home/Markus.Dreseler/hyrise/build-release-asan/hyrisePlayground+0x1027339e)
#9 0x55a1dbbe9150 in opossum::JoinIndex::_on_execute() (/home/Markus.Dreseler/hyrise/build-release-asan/hyrisePlayground+0x10280150)
#10 0x55a1da0d3a3a in opossum::AbstractReadOnlyOperator::_on_execute(std::shared_ptr<opossum::TransactionContext>) (/home/Markus.Dreseler/hyrise/build-release-asan/hyrisePlayground+0xe76aa3a)
#11 0x55a1da0bdbf9 in opossum::AbstractOperator::execute() (/home/Markus.Dreseler/hyrise/build-release-asan/hyrisePlayground+0xe754bf9)
#12 0x55a1d6b17bff in main (/home/Markus.Dreseler/hyrise/build-release-asan/hyrisePlayground+0xb1aebff)
#13 0x7fcdffc51b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#14 0x55a1d7ca1c19 in _start (/home/Markus.Dreseler/hyrise/build-release-asan/hyrisePlayground+0xc338c19)
Why is ASan so unreliable? Your ASan issue with the sort-merge join was only found on a Mac.
I would expect this to be a libstdc++ / libc++ issue.
Sven: is it reproducible and have you seen this issue with other indices as well?
Here is ASan on a debug build:
#14 0x561553d0a0a6 in opossum::JoinIndex::_perform_join() ../src/lib/operators/join_index.cpp:73
Mmmmmh...
71 size_t worst_case = input_table_left()->row_count() * input_table_right()->row_count();
72
73 _pos_list_left->reserve(worst_case);
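For context, the size of that reservation can be reproduced outside of Hyrise. Here is a minimal standalone sketch (the 8-byte RowID layout, names, and numbers are assumptions based on the report above, not code from the operator):

// Standalone sketch: reserving the cross-product of both row counts up front
// asks the allocator for one huge contiguous block.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <new>
#include <vector>

// Assumed layout: two 32-bit IDs per entry, i.e. 8 bytes per RowID.
struct RowID {
  std::uint32_t chunk_id;
  std::uint32_t chunk_offset;
};

int main() {
  const std::size_t worst_case = 800'000ull * 800'000ull;  // 6.4e11 entries

  // 6.4e11 entries * 8 bytes = 5.12e12 bytes = 0x4a817c80000 -- the exact
  // allocation size ASan reports as failed above.
  std::printf("requested bytes: %zu\n", worst_case * sizeof(RowID));

  try {
    std::vector<RowID> pos_list;
    pos_list.reserve(worst_case);  // requests ~5 TB at once and fails on most machines
  } catch (const std::bad_alloc&) {
    std::puts("reserve() failed as expected");
  }
}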
@Bouncner: You have some experience with estimating join results. Could you come up with something that is more sane?
800,000 tuples on both sides means we reserve a worst case of roughly 5 TB (800,000 × 800,000 × 8 bytes = 5.12 × 10^12 bytes) if I am not mistaken.
I'd rather vote for something like max(N, min(left_table_size, right_table_size)), assuming that each tuple of the smaller table has one single matching tuple. Just to avoid too many early reallocations, N could be something like 100.
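A minimal sketch of that heuristic, with hypothetical names (the constant and the helper are placeholders, not the actual patch):

// Hypothetical sketch of the proposed reservation size: assume each tuple of
// the smaller input has (at most) one match, but never reserve fewer than N
// entries to avoid too many early reallocations.
#include <algorithm>
#include <cstddef>

constexpr std::size_t kMinReservation = 100;  // the "N" from the proposal above

std::size_t estimated_pos_list_size(std::size_t left_row_count, std::size_t right_row_count) {
  return std::max(kMinReservation, std::min(left_row_count, right_row_count));
}

// The operator would then call something like
//   _pos_list_left->reserve(estimated_pos_list_size(left_rows, right_rows));
// instead of reserving the full cross-product.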
Shall I create a PR with something like that?
Sounds good.
Sven: is it reproducible and have you seen this issue with other indices as well?
I don't think I've checked this with other indices.
There is a bug in the JoinIndex operator that occurs with GCC, but not with Clang.
Consider the following playground:
The same test runs fine with smaller tables, e.g., 500'000 rows, but also fails for larger tables, e.g., 700'000 rows. It seems like the relevant criterion is the number of join matches, which here is 600'000.
The stack trace looks as follows:
I have not yet spent time narrowing the problem down any further.