openbmc / phosphor-dbus-interfaces

YAML descriptors of standard dbus interfaces
Apache License 2.0
39 stars 65 forks source link

LinkTimeOptimization (-flto) causes unit tests issues #2

Closed mine260309 closed 6 years ago

mine260309 commented 6 years ago

With LinkTimeOptimization enabled in this repo, it's found that the unit test in phosphor-time-manager is getting trouble, with different behaviors on different machines.

  1. On CI, the unit test fails to build, with below error:
    09:24:33 /usr/bin/x86_64-linux-gnu-ld: test-TestBmcEpoch.o: undefined reference to symbol '_ZN7testing8internal18g_linked_ptr_mutexE'

    see https://gerrit.openbmc-project.xyz/#/c/openbmc/phosphor-time-manager/+/12340/ for details.

  2. On local x86 machine, the unit test is OK to build, but gets test error on case TestUtil.modeToStr(), which expects an exception to be thrown when calling with an invalid enum value. A bit debugging shows the problem is in below function:
    1 std::string convertForMessage(Synchronization::Method v)
    2 {
    3     auto i = std::find_if(
    4             std::begin(mappingSynchronizationMethod),
    5             std::end(mappingSynchronizationMethod),
    6             [v](auto& e){ return v == std::get<1>(e); });
    7     return std::get<0>(*i);
    8 }
    • When v is invalid, the expected behavior is it throws std::logic_error on std::get<0>(*i) where i is the end() of mappingSynchronizationMethod, and thus it tries to construct a std::string from nullptr. (basic_string::_M_construct null not valid).
    • However, on my local x86 machine, i becomes an iterator pointing to the begin() of mappingOwnerOwners, and thus it returns xyz.openbmc_project.Time.Owner.Owners.BMC!
      (gdb) p i
      $2 = (const std::tuple *) 0x7ffff7dcefa0 <sdbusplus::xyz::openbmc_project::Time::server::(anonymous namespace)::mappingOwnerOwners>

      After turning off -flto in phosphor-dbus-interfaces, the issue is gone.

geissonator commented 6 years ago

@edtanous you see anything like this when you got bmcweb to work with -flto?

edtanous commented 6 years ago

No, we didn't see this, but technically you're relying on undefined behavior in the code snippet. Secure coding guidelines and undefined behavior checks would say that you need to have a check like: if (i == std::end(mappginSyncronizationMethod)){ throw std::logic_error; }

Because you don't have that check before dereferencing i, technically the compiler is allowed to do whatever it wants, up to and including just returning the first string in the list. The fact that on x86 std::end happens to throw on access, and on Arm std::end gets you a strange case is expected, especially at high optimization levels.

The link time error looks like you're trying to link against the version of gtest which is compiled on your system without LTO. The symbol it's missing looks like this: https://github.com/google/googletest/blob/master/googletest/include/gtest/internal/gtest-linked_ptr.h#L80

I suspect the magic you're missing is here: https://github.com/openbmc/bmcweb/blob/master/CMakeLists.txt#L69

Essentially, this is swapping out AR with gcc-ar, which is a magic wrapper around ar that.... does some things that I don't fully understand, but it seems to work.

Similar stack overflow question: https://stackoverflow.com/questions/25878407/how-can-i-use-lto-with-static-libraries

mine260309 commented 6 years ago

@edtanous Thanks for the information, it helps a lot. Btw, could you kindly help to check why another build error occurs for https://gerrit.openbmc-project.xyz/#/c/openbmc/phosphor-time-manager/+/12379/1 (log at https://openpower.xyz/job/openbmc-repository-ci/16086/console)? It's indicating it's trying to link a constexpr phosphor::time::Manager::PROPERTY_TIME_MODE, which is ODR-used, so it looks weird.

So to enable LTO in my case:

  1. gtest/gmock should be compiled with LTO;
  2. The code in convertForMessage() shall be fixed;
  3. The gcc-ar and gcc-randlib related issue only occurs locally, but not on CI. So it's ok for now, but it's better to be fixed later.

The main problem is for convertForMessage(), which is generated by sdbusplus, so we need to fit it there.

edtanous commented 6 years ago

I'm not certain if #1 is required, but #2 is for sure. For #3, what distro are you compiling with? If you have an old enough compiler, you might be hitting other issues.

mine260309 commented 6 years ago

For 1, the issue probably is related to the incorrect flags, it is fixed by https://gerrit.openbmc-project.xyz/#/c/openbmc/phosphor-time-manager/+/12612/ For 2, it is already merged at https://github.com/openbmc/sdbusplus/commit/b641d103ad1b8b2fc946625307674600d44d08f6 For 3, CI does not have the issue, ignore for now.

The remaining issue is that time-manager gets link issue with constexpr, but it's not related to phosphor-dbus-interfaces anymore.

So this issue could be closed.