protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65.22k stars 15.44k forks source link

non-deterministic MapImplTest.RandomOrdering test failure #8645

Open Apteryks opened 3 years ago

Apteryks commented 3 years ago

What version of protobuf and what language are you using? Version: 3.17.0 Language: C++

What operating system (Linux, Windows, ...) and version? GNU Guix at commit bb325c5 branch with protobuf updated to 3.17.0.

What runtime / compiler are you using (e.g., python version or gcc version) dependencies: zlib@1.2.11, GCC 7.5.0 What did you do? Steps to reproduce the behavior:

  1. Do a build of protobuf (configure, make)
  2. Run the test suite (make check)
  3. Repeat 1. and 2. a couple times

Alternatively, via Guix (which can be installed on any GNU/Linux distribution):

guix build protobuf --with-source=protobuf="https://github.com/google/protobuf/releases/download/v3.17.0/protobuf-cpp-3.17.0.tar.gz" --rounds=20

The occurrence is rather low, I had a faulty build out of 10.

What did you expect to see The test suite passes all the time. What did you see instead?

[----------] Global test environment tear-down
[==========] 2263 tests from 200 test suites ran. (19714 ms total)
[  PASSED  ] 2262 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] MapImplTest.RandomOrdering

 1 FAILED TEST
FAIL protobuf-test (exit status: 1)

The full log is attached. log.txt

Anything else we should know about your project / environment GNU Guix does the build in a minimal Linux container. The test suite was run in parallel with 24 jobs (make -j24).

fowles commented 3 years ago

I don't know anything about GNU guix, but is it possible that it does not support RDTSC and ASLR? In particular, we use these to try and add a bit of randomness to the hash table.

https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/map.h#L1081-L1093

Another way to improve this test is to pull the inner map outside the loop so that pointers will change more.

https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/map_test.cc#L1085

Would you mind testing a change for that?

bool MapOrderingIsRandom(int a, int b) {
  bool saw_a_first = false;
  bool saw_b_first = false;
  std::vector<Map<int32, int32>> v;
  v.reserve(50);
  for (int i = 0; i < 50; ++i) {
    Map<int32, int32>& m = v.emplace_back();
    m[a] = 0;
    m[b] = 0;
    int32 first_element = m.begin()->first;
    if (first_element == a) saw_a_first = true;
    if (first_element == b) saw_b_first = true;
    if (saw_a_first && saw_b_first) {
      return true;
    }
  }
  return false;
}
Apteryks commented 3 years ago

@fowles Hi, and thanks for the reply.

What would be a good way to determine if RDTSC/ASLR is supported by my system? Any output to look for in the build log of protobuf?

I wouldn't mind testing the change, but the problem is that it seems very difficult to trigger in the first place (it appears to be a rare, non-deterministic issue). Not the best kind to debug.

h-vetinari commented 7 months ago

I regularly observe this failure on windows in our conda-forge infrastructure. Perhaps every 1-in-3 or 1-in-4 runs. Looks like:

[ RUN      ] MapImplTest.RandomOrdering
D:\bld\libprotobuf-split_1708422242031\work\src\google/protobuf/map_test.inc(1350): error: Value of: MapOrderingIsRandom(i, j)
  Actual: false
Expected: true
Map with keys 1 and 9 has deterministic ordering

[  FAILED  ] MapImplTest.RandomOrdering (0 ms)
[ RUN      ] MapImplTest.TransparentLookupForString

Edit: of course, now that I guesstimate a frequency, I get it 4 out of 4 times in a row... 🤦

adamnovak commented 6 months ago

It sounds like the test for whether nondeterministic ordering is observed might be failing in environments where the goal is to have a deterministic build and prevent nondeterministic behavior. The clock in particular might be stubbed out.

The fix might be to make the seed function rely on something other than the clock and the object address hash? Maybe the C random number API?

There should be 50 different addresses being used in the test, and they shouldn't all roll the same bit as to how to order the keys, though. So maybe something is wrong with the pointer hash function from Abseil?

github-actions[bot] commented 3 months ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

h-vetinari commented 3 months ago

Not stale

github-actions[bot] commented 3 weeks ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

h-vetinari commented 3 weeks ago

Not stale