sccn / liblsl

C++ lsl library for multi-modal time-synched data transmission over the local network
Other
108 stars 63 forks source link

Add github actions build file for MinGW-W64 #102

Closed tobiasherzke closed 3 years ago

tobiasherzke commented 3 years ago

FYI, this is how we build liblsl on windows for the MinGW compiler. As liblsl is a C++ library, users who write applications that use liblsl need to have their liblsl compiled with the same compiler as their application.

The build file is derived from your cross-platform build file. Compare against that file to find the differences. I was not sure how to integrate the different default shell settings required for this build (msys2) and all other platforms (bash) in github actions.

I have disabled examples and unit-tests because I get a linker error in one of them. I do not remember which, I remember that it is possible to fix the linker command manually quite easily, but I do not know enough cmake to fix it in the build system.

This pull request is mainly FYI. I for one would very much like if you accept it or merge into your cross-platform build file, but your priorities may lie elsewhere.

tstenner commented 3 years ago

Thanks for the PR! Compatibility with MinGW is a good thing and once I've figured out how to mark CI targets as optional (i.e., only run them when explicitly requested) I'll add it.

As it is, it shouldn't be needed. Even though liblsl is a C++ library, the entire API/ABI has only plain C types / functions and the C++ include header calls the C functions behind the scenes so liblsl compiled with MSVC should be loadable from a program compiled with MinGW and vice versa.

I have disabled examples and unit-tests because I get a linker error in one of them.

Fixed in 7fcc1bfc8acbb14fa5ac7979a5c7f4f0fecdbacb.

I was not sure how to integrate the different default shell settings required for this build (msys2) and all other platforms (bash) in github actions.

For entire steps, you can add conditions, e.g. if: startsWith(matrix.config.os, 'ubuntu-') (see https://github.com/labstreaminglayer/AppTemplate_cpp_qt/blob/3b5eed70d10f3c45af3ce5a9645685e6ecfd81d4/.github/workflows/cppcmake.yml#L38 ), for short command blocks something like this: https://github.com/sccn/liblsl/blob/7fcc1bfc8acbb14fa5ac7979a5c7f4f0fecdbacb/.github/workflows/cppcmake.yml#L63-L67

tstenner commented 3 years ago

I tried integrating your changes into the main CI config, but the compiler / shell has an issue with a string definition passed via the CLI:

/C/ProgramData/chocolatey/bin/g++.exe -DBOOST_ALL_NO_LIB -DBOOST_ASIO_STANDALONE -DBOOST_THREAD_DONT_PROVIDE_INTERRUPTIONS -DLIBLSL_EXPORTS -DLSLNOAUTOLINK -D_WIN32_WINNT=0x0601 -Dlsl_EXPORTS -DLSL_LIBRARY_INFO_STR=\"git:c744dc95c47d60a9d8356ad4fc31f0126f6f44e6/branch:refs/heads/githubmingw/build:Release/compiler:GNU-8.1.0/link:SHARED\" @CMakeFiles/lsl.dir/includes_CXX.rsp -O3 -DNDEBUG -fvisibility=hidden -fno-keep-inline-dllexport -MD -MT CMakeFiles/lsl.dir/src/buildinfo.cpp.obj -MF CMakeFiles/lsl.dir/src/buildinfo.cpp.obj.d -o CMakeFiles/lsl.dir/src/buildinfo.cpp.obj -c /D/a/liblsl/liblsl/src/buildinfo.cpp
<command-line>: error: too many decimal points in number
D:/a/liblsl/liblsl/src/buildinfo.cpp:6:9: note: in expansion of macro 'LSL_LIBRARY_INFO_STR'
  return LSL_LIBRARY_INFO_STR;
         ^~~~~~~~~~~~~~~~~~~~
D:/a/liblsl/liblsl/src/buildinfo.cpp: In function 'const char* lsl_library_info()':
<command-line>: error: found ':' in nested-name-specifier, expected '::'
D:/a/liblsl/liblsl/src/buildinfo.cpp:6:9: note: in expansion of macro 'LSL_LIBRARY_INFO_STR'
  return LSL_LIBRARY_INFO_STR;
         ^~~~~~~~~~~~~~~~~~~~

Neither MSVC on Windows nor g++ on Linux have a problem with -DLSL_LIBRARY_INFO_STR=\"git:c744dc95c47d60a9d8356ad4fc31f0126f6f44e6/branch:refs/heads/githubmingw/build:Release/compiler:GNU-8.1.0/link:SHARED\". CMake 3.21 might have a fix for it, so I'll wait for that and try again.

tobiasherzke commented 3 years ago

Interesting! Can you point me to the build file and an github actions log, I would be interested in finding the cause.

tstenner commented 3 years ago

Sure, the log is here and the commit here. Just in case you can't access the build log, I've also uploaded it: buildlog.txt.gz

I think the main issue is that the build steps are running in the bash shell, not the msys shell, but I'd rather avoid changing the shell for every step.

tobiasherzke commented 3 years ago

Thanks for the pointers. I still do not understand how this double-decimal-point error occurs.

But I saw that your build was using an older compiler installed by chocolatey and the bash shell from the git installation instead of from msys2.

I agree that it is undesirable to extend every step with a shell setting. Alternative suggestion here: https://github.com/tobiasherzke/liblsl/commit/13dac557fc4dcc77b75ebbbfc670714b84a3ec47 the shell is again specified only once in the file, but in a different place to allow a different shell for MinGW.

This change builds the liblsl DLL. The unit test stage still fails, but it is one step further.

tstenner commented 3 years ago

I still do not understand how this double-decimal-point error occurs.

Without the quotes, the compiler tries to make sense of git:c744dc95c47d60a9d8356ad4fc31f0126f6f44e6/branch:refs/heads/githubmingw/build:Release/compiler:GNU-8.1.0 and for some reason the 8.1.0 is the first thing it trips over.

I agree that it is undesirable to extend every step with a shell setting. Alternative suggestion here: tobiasherzke@13dac55 the shell is again specified only once in the file, but in a different place to allow a different shell for MinGW.

Nice, I didn't know you could put the defaults: in the build: section. The shell: parameter could be even shorter: shell: ${{ matrix.config.name == 'windows-mingw' && 'msys2 {0}' || 'bash' }}

This change builds the liblsl DLL. The unit test stage still fails, but it is one step further.

A segfault shouldn't happen, especially when testing the exported methods. The segfault seems to happen in the multithreaded lsl_last_error test, so maybe the warning about the ignored thread attribute should've been an error instead. I don't have a MinGW setup at hand right now, so it'd be great if you could take a look at this.

tobiasherzke commented 3 years ago

Reproducing on a local windows installation:

In order to reproduce the steps from the build file, I need

When executing the steps from the Makefile locally, I can reproduce the segmentation fault.

When reproducing the segmentation fault in the debugger, it does not give me any information in the backtrace.

In order to recompile with debug info, I have changed "Release" to "Debug" in the two cmake invocations that had "Release".

After doing this, there is no directory "install", but I can find the two test executables in build/testing.

The internal tests succeed as before.

The exported tests fail, but without producing the segmentation fault. The output is:

user@DESKTOP MINGW64 ~/liblsl/build
$ ./testing/lsl_test_exported.exe --order rand --wait-for-keypress never --durations yes
2021-06-27 16:53:59.845 (   0.003s) [        A9E48EB7]         api_config.cpp:231   INFO| Loaded default config
2021-06-27 16:53:59.846 (   0.003s) [        A9E48EB7]             common.cpp:64    INFO| git:lslgitrevision/branch:lslgitbranch/build:Debug/compiler:GNU-10.3.0/link:SHARED
1.064 s: datatransfer - int32_t
1.625 s: resolve multiple streams
1.055 s: datatransfer - char
2021-06-27 16:54:04.106 (   4.263s) [IO_movetest     ]   inlet_connection.cpp:45    WARN| The stream named 'movetest' can't be recovered automatically if its provider crashes because it doesn't have a unique source ID
3.074 s: move C++ API types
0.543 s: fullinfo
0.001 s: multithreaded lsl_last_error

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lsl_test_exported.exe is a Catch v2.13.6 host application.
Run with -? for options

-------------------------------------------------------------------------------
lsl_last_error size
-------------------------------------------------------------------------------
C:/msys64/home/user/liblsl/testing/test_ext_streaminfo.cpp:22
...............................................................................

C:/msys64/home/user/liblsl/testing/test_ext_streaminfo.cpp:25: FAILED:
  REQUIRE( lsl_last_error()[512] == 0 )
with expansion:
  6 == 0

0.000 s: lsl_last_error size
1.054 s: TypeConversion
1.063 s: datatransfer - float
1.614 s: timeouts
2021-06-27 16:54:11.453 (  11.611s) [IO_timesync     ]   inlet_connection.cpp:45    WARN| The stream named 'timesync' can't be recovered automatically if its provider crashes because it doesn't have a unique source ID
1.704 s: simple timesync
3.243 s: Flush
0.002 s: Invalid queries are caught before sending the query
0.037 s: resolve from streaminfo
1.079 s: datatransfer - int64_t
1.070 s: datatransfer - double
1.069 s: datatransfer - int16_t
1.107 s: data datatransfer
0.000 s: streaminfo
===============================================================================
test cases:  19 |  18 passed | 1 failed
assertions: 722 | 721 passed | 1 failed

but the diagnostic output is not stable: in the next run, it is

C:/msys64/home/user/liblsl/testing/test_ext_streaminfo.cpp:25: FAILED:
  REQUIRE( lsl_last_error()[512] == 0 )
with expansion:
  'o' == 0

and in the next

C:/msys64/home/user/liblsl/testing/test_ext_streaminfo.cpp:25: FAILED:
  REQUIRE( lsl_last_error()[512] == 0 )
with expansion:
  '▒' == 0

Hope this helps and you can make sense of it. My first guess when seeing this would be usage of uninitialized memory. (In that case, running the same test on Linux with valgrind would reliably diagnose it as such. It could also explain a segfault.)

tobiasherzke commented 3 years ago

Valgrind does not report uninitialized memory on Linux (see https://github.com/tobiasherzke/liblsl/commit/af99b67582149c430ca696fe452e9194be80aabf#diff-71cc08be7d433f3916799b252b7ae38be718a330946ec1399e8361276cd6bb40) which makes this explanation unlikely. Something else indeterministic must be going on.

tstenner commented 3 years ago

Hope this helps and you can make sense of it.

As always I appreciate your thoroughness, and while typing out why this shouldn't happen and MinGW is to blame, I found the error but it still doesn't fix the issue.

The last_error variable is thread_local (see the PR https://github.com/sccn/liblsl/pull/75), so each thread's last_error has to be zero-initialized at start of each thread (common.cpp:19: thread_local char last_error[512] = {0};). It's only written in two places: lsl_c_api_helpers.hpp:10: strncpy(last_error, e.what(), sizeof(last_error) - 1);, and in both places with strncpy so the last element should always be zero.

That being said, it's time to quote Phil Karlton:

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

The last element of char last_error[512] is, of course last_error[511].

After fixing it, it still segfaults. Maybe MinGW really is to blame?

tobiasherzke commented 3 years ago

Maybe MinGW really is to blame?

Initially, I found this hard to believe, because it's usually not the compiler's fault. But now, after extensive recompilations, printfs and retryings, I think you are right. The current MinGW-W64 compiler 10.3 (as packaged by msys2) somehow messes up the thread-local storage when that TLS is defined in a DLL, and optimization is -O3.

I'll try to update this pull request with a working modification of cppcmake.yml. Because I do not know cmake very well, I'll simply use --target=Debug for MinGW to disable -O3 in the proposed update. The additional -g does no harm IMO.

What my proposed change does not fix is the thread-local-storage in the loguru library that you use by source inclusion. I believe you will be able to fix that by replacing declspec(thread) with thread_local, but I have not tested this on all platforms.

tobiasherzke commented 3 years ago

What my proposed change does not fix is the thread-local-storage in the loguru library

0bd89ad demonstrates this problem by adding a test which exposes it.

tobiasherzke commented 3 years ago

e0fce48 suggests a fix.

tstenner commented 3 years ago

Maybe MinGW really is to blame?

Initially, I found this hard to believe, because it's usually not the compiler's fault.

Normally I'd agree, but when the compiler has an internal error for code MSVC, GCC and Clang accept without any warnings it's reason to get suspicious.

The current MinGW-W64 compiler 10.3 (as packaged by msys2) somehow messes up the thread-local storage when that TLS is defined in a DLL, and optimization is -O3. I'll try to update this pull request with a working modification of cppcmake.yml. Because I do not know cmake very well, I'll simply use --target=Debug for MinGW to disable -O3 in the proposed update. The additional -g does no harm IMO.

Debug builds will be far slower, so before disabling optimizations completely we should test if -Og, -O1 or -O2 will also work. This is best done by overriding CMAKE_CXX_FLAGS_RELEASE (as done here: https://github.com/tstenner/liblsl/commit/1041509e10572ec97d6d05810d2e5ba1009bf495).

Building against all optimization levels -O2, -O1, and -Og also crash, so the faulty optimization pass is one of

-fauto-inc-dec -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fdce -fdefer-pop -fforward-propagate -fguess-branch-probability
           -fipa-profile -fipa-pure-const -fipa-reference -fipa-reference-addressable -fmerge-constants  -fomit-frame-pointer -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types -fssa-backprop 
           -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-ter -funit-at-a-time

e0fce48 suggests a fix.

Looks good to me. We (and for authorship credits, I mean you :smile:) should submit a PR to the upstream repository (https://github.com/emilk/loguru). thread_local is plain C++11 so even on Windows it should be supported by all major compilers.

tstenner commented 3 years ago

And it's fixed: https://github.com/tstenner/liblsl/commit/a1f8679c95f0b4a0e297619b5d0e9b61a69ba349

tstenner commented 3 years ago

I have pushed the changes that allow MinGW to build liblsl even with optimizations. There's one issue: as liblsl is compiler-agnostic, the packages are named after the platform and architecture and the windows package names are already taken by the MSVC packages. I can think of two options:

  1. build with MinGW, but skip uploading packages so they have to be downloaded manually from the Github Actions page
  2. merge all changes except the entry in matrix.config. That way you can just carry a single (smaller) commit that removes some targets and adds the MinGW target.
tobiasherzke commented 3 years ago

I have pushed the changes that allow MinGW to build liblsl even with optimizations.

Thank you!

There's one issue: as liblsl is compiler-agnostic, the packages are named after the platform and architecture and the windows package names are already taken by the MSVC packages. I can think of two options:

1. build with MinGW, but skip uploading packages so they have to be downloaded manually from the Github Actions page

2. merge all changes except the entry in `matrix.config`. That way you can just carry a single (smaller) commit that removes some targets and adds the MinGW target.

You need to decide this, obviously this is not my decision. I would prefer 1) because then github actions will inform you if any changes that you make to liblsl causes problems with MinGW. This can be a liability because, as we have seen here, the MinGW compiler has some peculiarities. But it is also a chance for you to detect more possible problems in the code because one more compiler will mean more/different code analysis during compilation. The number of places in the code that were causing problems were very few, after all. And if in future the liability aspect causes too much work, you can always disable the MinGW build then. Especially if you do not publish it with your releases, then you will also not grow any significant user base that would depend on it.