multiscale / muscle3

The third major version of the MUltiScale Coupling Library and Environment

Build issue on MARCONI #38

Closed. olivhoenen closed this issue 4 years ago

olivhoenen commented 4 years ago

Using GCC/6.1.0, openmpi/1.10.3, python/3.6.4, following install steps from https://muscle3.readthedocs.io/en/latest/installing.html:

Checking for protobuf >= 3.7.1...
Checking for grpc >= 1.17.1...
Checking for msgpack >= 3.1.0...
Checking for googletest >= 1.8.1...
Building local protobuf...
make -C protobuf
make[3]: Entering directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/protobuf'
make[3]: Nothing to be done for `all'.
make[3]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/protobuf'
Building local grpc...
make -C grpc
Checking for c-ares >= 1.11.0...
Checking for zlib >= 1.2...
Checking for openssl >= 1.0.2...
Building local c-ares...
make -C c-ares
make[4]: Entering directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/c-ares'
make[4]: Nothing to be done for `all'.
make[4]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/c-ares'
Not building zlib, it was already available.
Not building openssl, it was already available.
make[3]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc'
Building muscle manager protocol...
make -j 2 -C muscle_manager_protocol
make[3]: Entering directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/muscle_manager_protocol'
g++ -std=c++14 -I/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/muscle_manager_protocol/../../src -fPIC -pthread -I/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/protobuf/protobuf/include -I/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/c-ares/c-ares/include -I/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc/include -c -o muscle_manager_protocol.grpc.pb.o /marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/muscle_manager_protocol/../../src/muscle_manager_protocol/muscle_manager_protocol.grpc.pb.cc
In file included from /marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/muscle_manager_protocol/../../src/muscle_manager_protocol/muscle_manager_protocol.grpc.pb.cc:6:0:
/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/muscle_manager_protocol/../../src/muscle_manager_protocol/muscle_manager_protocol.grpc.pb.h:10:55: fatal error: grpcpp/impl/codegen/async_generic_service.h: No such file or directory
 #include <grpcpp/impl/codegen/async_generic_service.h>
                                                       ^
compilation terminated.
make[3]: *** [muscle_manager_protocol.grpc.pb.o] Error 1
make[3]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/muscle_manager_protocol'
make[2]: *** [muscle_manager_protocol] Error 2
make[2]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp'
make: *** [cpp] Error 2

LourensVeen commented 4 years ago

Oh hey, that's weird. That looks like a bug in the build system. It needs to build gRPC, so it starts doing so, checks gRPC's dependencies, finds zlib and OpenSSL but not c-ares, builds c-ares but not zlib or OpenSSL (so far so good), and then completely skips building gRPC. Or did you cut that out of the log?

LourensVeen commented 4 years ago

I think what we're looking at here is a partial build, right? Could you do a fresh one from scratch, save the output to a file, and attach that here (or mail it to me)? Something like make >make.log 2>&1 should do the trick. The above doesn't give me enough information to see what's going on I'm afraid.
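A minimal sketch of capturing a complete build log as requested above. The helper name run_logged is hypothetical (it is not part of MUSCLE3); it only wraps the same >make.log 2>&1 redirection so that stdout and stderr end up in one file.

```shell
# Run a command and capture both stdout and stderr in a single log file.
# Usage: run_logged <logfile> <command> [args...]
# (run_logged is an illustrative helper, not something MUSCLE3 provides.)
run_logged() {
    log_file=$1
    shift
    # Same redirection as `make >make.log 2>&1`; the command's exit
    # status is returned to the caller.
    "$@" >"$log_file" 2>&1
}

# Example, assuming the current directory is the top of the muscle3 source tree:
#   run_logged make.log make
```

The resulting make.log can then be attached to the issue as a single file.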

olivhoenen commented 4 years ago

Here is the full log from a fresh start: make.log

There do indeed seem to be issues in the gRPC step, during its install step.

LourensVeen commented 4 years ago

That's weird. It's supposed to be installing gRPC to a staging directory inside of the build dir here.

That error seems to be because it's running ldconfig, which is what you'd do if you install something to a system directory and want the dynamic linker to update its cache so that it will be able to find the newly-installed library when you start an application linked against it. You need root rights to update the cache, so that's what it's complaining about.

But it doesn't make sense to run ldconfig if you're not installing into a system directory, and there's the make[4]: execvp: /bin/sh: Argument list too long part as well, which suggests that there may be something else going on that may be causing this. I need to dive into the gRPC build system a bit to figure this one out. I have a deadline tomorrow, so I have to drop this for a moment, but I'll get back tomorrow and see if I can figure this out.
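As a rough illustration of why a long build directory can lead to "Argument list too long": every object file on a link or archive command line repeats the absolute path prefix, so the command grows with the prefix length times the number of files. The sketch below only does that arithmetic; the function name, the object count and the average file name length are made-up illustration values, not measurements from this build, and whether this is the actual trigger here is an assumption at this point in the thread.

```shell
# Estimate the size in bytes of a command line that lists `file_count`
# files, each as "<prefix><relative name>" followed by a separating space.
# (estimate_cmdline_bytes is a hypothetical helper for illustration.)
estimate_cmdline_bytes() {
    prefix=$1
    file_count=$2
    avg_rel_len=$3
    echo $(( file_count * (${#prefix} + avg_rel_len + 1) ))
}

# Example with assumed numbers (1500 objects, 40-character relative names):
#   estimate_cmdline_bytes "$PWD/libmuscle/cpp/build/grpc/grpc-1.17.1/" 1500 40
# Compare the result with the system limits:
#   getconf ARG_MAX        # total bytes allowed for arguments plus environment
```

On Linux a single argument is also limited to 128 KiB (with 4 KiB pages), which matters because make hands a recipe to /bin/sh -c as one string.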

One thing you could try in the meantime is a newer version of gRPC. If you do make distclean to make sure we're not getting things confused, then edit libmuscle/cpp/build/grpc/Makefile and change the line dep_version := 1.17.1 to dep_version := 1.28.1, it will auto-install the latest version instead. That's a total guess at this point, though.
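The version bump described above could be scripted roughly as follows. This is a sketch that assumes the Makefile contains a line starting with exactly "dep_version := " as quoted in the comment; set_dep_version is a hypothetical helper name.

```shell
# Rewrite the `dep_version := ...` line of a dependency Makefile.
# Usage: set_dep_version <makefile> <new-version>
# (Illustrative helper; assumes the line format quoted in the thread.)
set_dep_version() {
    makefile=$1
    new_version=$2
    tmp_file=$(mktemp) || return 1
    # Edit via a temporary file (portable; avoids non-standard `sed -i`),
    # then copy the contents back so the file's permissions are preserved.
    sed "s/^dep_version := .*/dep_version := ${new_version}/" "$makefile" >"$tmp_file" &&
        cat "$tmp_file" >"$makefile"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# Example, from the top of the muscle3 source tree:
#   make distclean
#   set_dep_version libmuscle/cpp/build/grpc/Makefile 1.28.1
#   make >make.log 2>&1
```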

olivhoenen commented 4 years ago

gRPC 1.28.1 is not working either, but for different reasons:

Building grpc...
cd grpc-1.28.1 && export prefix=/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc && export PKG_CONFIG_PATH=/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/protobuf/protobuf/lib/pkgconfig:/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/c-ares/c-ares/lib/pkgconfig:/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/openssl/openssl/lib/pkgconfig:/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/zlib/zlib/lib/pkgconfig:/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc/lib/pkgconfig:/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/msgpack/msgpack/lib/pkgconfig && export PATH=/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/protobuf/protobuf/bin:/cineca/prod/opt/libraries/zlib/1.2.8/gnu--6.1.0/bin:/cineca/prod/opt/tools/cmake/3.12.0/none/bin:/cineca/prod/opt/compilers/openmpi/1-10.3/gnu--6.1.0/bin:/cineca/prod/opt/compilers/gnu/6.1.0/none/bin:/cineca/prod/opt/compilers/python/3.6.4/none/bin:/cineca/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/marconi/home/userexternal/ohoenen0/.local/bin:/marconi/home/userexternal/ohoenen0/bin && export LDFLAGS=-L/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/protobuf/protobuf/lib && export LD_LIBRARY_PATH=/cineca/prod/opt/libraries/zlib/1.2.8/gnu--6.1.0/lib:/cineca/prod/opt/compilers/openmpi/1-10.3/gnu--6.1.0/lib:/cineca/prod/opt/compilers/gnu/6.1.0/none/lib64:/cineca/prod/opt/compilers/python/3.6.4/none/lib:/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/protobuf/protobuf/lib:/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/c-ares/c-ares/lib:/usr/lib:/usr/lib && export CXXFLAGS='-Wno-error' && make -j 2 && make install
make[4]: Entering directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc-1.28.1'
[MAKE]    Generating /marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc-1.28.1/libs/opt/pkgconfig/grpc.pc
[MAKE]    Generating /marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc-1.28.1/libs/opt/pkgconfig/gpr.pc
[MAKE]    Generating /marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc-1.28.1/libs/opt/pkgconfig/grpc_unsecure.pc
[MAKE]    Generating cache.mk
[C]       Compiling third_party/address_sorting/address_sorting.c
[C]       Compiling third_party/address_sorting/address_sorting_posix.c
third_party/address_sorting/address_sorting_posix.c: In function 'posix_source_addr_factory_get_source_addr':
third_party/address_sorting/address_sorting_posix.c:54:42: warning: unused parameter 'factory' [-Wunused-parameter]
     address_sorting_source_addr_factory* factory,
                                          ^~~~~~~
[C]       Compiling third_party/address_sorting/address_sorting_windows.c
[C]       Compiling third_party/upb/upb/decode.c
[C]       Compiling third_party/upb/upb/encode.c
third_party/upb/upb/decode.c: In function 'upb_skip_unknowngroup':
third_party/upb/upb/decode.c:164:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   CHK(d->end_group == field_number);
                    ^
third_party/upb/upb/decode.c:48:22: note: in definition of macro 'CHK'
 #define CHK(x) if (!(x)) { return 0; }
                      ^
third_party/upb/upb/decode.c: In function 'upb_decode_groupfield':
third_party/upb/upb/decode.c:325:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   CHK(d->end_group == field_number);
                    ^
third_party/upb/upb/decode.c:48:22: note: in definition of macro 'CHK'
 #define CHK(x) if (!(x)) { return 0; }
                      ^
[C]       Compiling third_party/upb/upb/msg.c
[C]       Compiling third_party/upb/upb/port.c
[C]       Compiling third_party/upb/upb/table.c
[C]       Compiling third_party/upb/upb/upb.c
make[4]: *** No rule to make target `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc-1.28.1/objs/opt/third_party/abseil-cpp/absl/base/dynamic_annotations.o', needed by `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc-1.28.1/libs/opt/libgrpc_abseil.a'.  Stop.
make[4]: *** Waiting for unfinished jobs....
make[4]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc/grpc-1.28.1'
make[3]: *** [grpc] Error 2
make[3]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build/grpc'
make[2]: *** [grpc] Error 2
make[2]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp/build'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/marconi/home/userexternal/ohoenen0/muscle3_source/muscle3-0.3.0/libmuscle/cpp'
make: *** [cpp] Error 2

olivhoenen commented 4 years ago

Of possible interest: https://github.com/grpc/grpc/issues/14844. Applying the last patch found in that issue allows me to pass this install step successfully. The issue is marked as stale, so I'm not sure later versions will have the necessary changes to avoid this problem. I'm still a bit puzzled about how to resume a build, so I had to hack libmuscle/cpp/build/grpc/Makefile to make sure the patch gets applied when starting from scratch.

olivhoenen commented 4 years ago

The next issue is linked to the MPI build: make4.log

I will try without MPI support tomorrow (as I'm not sure what exactly to expect from it in MUSCLE3 at this stage).

LourensVeen commented 4 years ago

Getting gRPC to compile turns out to be a non-trivial undertaking; it seems to have many issues (also see https://github.com/multiscale/muscle3/issues/39). I've been working through a bunch of them with Hamid. I'm going to set up a better testing system to try to catch these before they hit the users in the future (https://github.com/multiscale/muscle3/issues/41).

I think the cause is probably that Google have their own internal company-wide build system (Blaze), so as long as that works, the maintainers won't notice immediately when the other builds are broken.

Resuming builds is admittedly a bit messy, especially when you have external dependencies in the mix. In this case, doing a make clean in libmuscle/cpp/build/grpc/grpc-1.x should leave the patch but clean up the partially failed build, and then if you do a make from the top directory it should try to build gRPC again. Except if it got to the installation stage previously, then you have to remove libmuscle/cpp/build/grpc/grpc, otherwise it will think that it's already done.
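The resume procedure described above, sketched as a small shell helper. The function name reset_grpc_build is hypothetical, and the gRPC source directory name (written grpc-1.x above) has to be supplied by the caller since it depends on the version that was downloaded.

```shell
# Reset a partially failed gRPC build so that a top-level `make` retries it.
# Usage: reset_grpc_build <grpc-build-root> <grpc-source-dir-name>
# (Illustrative helper, not part of the MUSCLE3 build system.)
reset_grpc_build() {
    grpc_root=$1     # e.g. libmuscle/cpp/build/grpc
    grpc_src=$2      # the unpacked source directory under grpc_root

    # Clean the partial build but keep the unpacked (and patched) sources.
    if [ -f "$grpc_root/$grpc_src/Makefile" ]; then
        make -C "$grpc_root/$grpc_src" clean
    fi

    # Remove the staging directory, so the build system no longer thinks
    # gRPC has already been installed.
    rm -rf "$grpc_root/grpc"
}

# Example, from the top of the muscle3 source tree (directory name assumed):
#   reset_grpc_build libmuscle/cpp/build/grpc grpc-1.28.1
#   make >make.log 2>&1
```

Removing the staging directory is harmless if the build never reached the installation stage, which is why the sketch does it unconditionally.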

The main thing MPI support does for you currently is that it implements a non-spinloop barrier when receiving data. If you have an MPI submodel then I think you should be able to link it to the non-MPI version of libmuscle, but you'd have to use an MPI barrier to keep your processes waiting for a message to arrive, which eats up CPU cores that you may want to share between a macro and a micro model for instance. libmuscle-MPI will also let you get model settings in any MPI process, which is convenient because it saves you from broadcasting them. The way you create the Instance object is slightly different in the MPI version; I think that's the only API difference though, and it's minor.

LourensVeen commented 4 years ago

This issue 14844 seems to be the culprit indeed. It would be good to try to get that patch into the gRPC upstream, but I have to get a CLA signed first to be able to submit stuff.

LourensVeen commented 4 years ago

The MPI problem turns out to be a problem on my side: the libmuscle/cpp/build/libmuscle/libmuscle_mpi.version file is missing because of a mistake in the .gitignore file, so it was never checked in. The CI doesn't build with MPI enabled, so it was overlooked.

The file is here: libmuscle_mpi.zip (zipped, because GitHub gets confused by the extension).

olivhoenen commented 4 years ago

OK, continuing with the missing file in place, the last issue is the name of the Fortran compiler for MPI code. In libmuscle/fortran/build/libmuscle/Makefile it is set to mpi$(FC), which gives mpigfortran in this case. Replacing it with mpif90 allows the compilation to finish properly (maybe using $(MPIFC) there would be best?).

LourensVeen commented 4 years ago

I really have no idea whether there is a standard environment variable or a standard name for this. I've seen mpif77 and mpiifort. It's probably one of those things that's different everywhere, but I'll check the MPI spec to see if it says anything. Letting the user override it is probably a good idea anyway.

LourensVeen commented 4 years ago

Ah, it looks like the initial gRPC build issue may have been due to #40. So that should be fixed in develop.

LourensVeen commented 4 years ago

The build system now checks mpi$(FC), and falls back to mpifort, mpif90 and mpif77, which is what seems to be available with OpenMPI on my system. If that doesn't work, you can set MPIFC yourself, and the make output and the documentation now make clear that that's possible and how to do it.
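The lookup order described above can be sketched in shell as follows. This is not the actual Makefile logic from the MUSCLE3 build system; find_mpi_fc is a hypothetical function, and letting a user-set MPIFC take precedence over the automatic search is an assumption of this sketch.

```shell
# Print the name of an MPI Fortran compiler wrapper, or fail if none is found.
# Order: user-provided MPIFC (assumed to win), then mpi$FC, then the
# common fallbacks mpifort, mpif90 and mpif77.
find_mpi_fc() {
    if [ -n "${MPIFC:-}" ]; then
        printf '%s\n' "$MPIFC"
        return 0
    fi
    # ${FC:+...} adds the mpi$FC candidate only when FC is set and non-empty.
    for candidate in ${FC:+"mpi$FC"} mpifort mpif90 mpif77; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example:
#   mpi_fc=$(find_mpi_fc) || echo "No MPI Fortran compiler found; set MPIFC" >&2
```

With FC=gfortran on a system that only provides mpif90, as on MARCONI above, this search would skip mpigfortran and mpifort and settle on mpif90.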

LourensVeen commented 4 years ago

I've reproduced the long-pathname issue locally, and I've added a patch for 14844 to the libmuscle build system for now that fixes that and allows building in a directory with a long path name.

LourensVeen commented 4 years ago

Fix released with 0.3.1, and confirmed to work on MARCONI.