Open Aetf opened 3 years ago
@Aetf Thanks for your interest. Maybe try CentOS?
Are CUDA 8.0 and cuDNN 5 the correct versions to use?
I'm still getting the same error on CentOS 7 (specifically inside the nvidia/cuda:8.0-cudnn5-devel-centos7 Docker image).
I changed cavs/util/mpi_types.h as shown in the diff below; otherwise there's an error about ‘constexpr’ needed for in-class initialization of static data member ‘ompi_datatype_t* const DataTypeToMPIType<float>::value’ of non-integral type [-fpermissive]
diff --git a/cavs/util/mpi_types.h b/cavs/util/mpi_types.h
index 71f8786..c39c6fa 100644
--- a/cavs/util/mpi_types.h
+++ b/cavs/util/mpi_types.h
@@ -11,7 +11,7 @@ struct DataTypeToMPIType {
#define MATCH_TYPE_TO_MPI_TYPE(TYPE, ENUM) \
template <> \
struct DataTypeToMPIType<TYPE> { \
- static const MPI_Datatype value = ENUM; \
+ constexpr static const MPI_Datatype value = ENUM; \
}
MATCH_TYPE_TO_MPI_TYPE(float, MPI_FLOAT);
After that, the C++ files build fine, but the CUDA files fail with the following errors:
[ 47%] Building NVCC (Device) object cavs/CMakeFiles/cavs_cuda.dir/backend/cavs_cuda_generated_op_impl_variable.cu.o
/root/.conan/data/protobuf/3.9.1/_/_/package/e5ac722d270cf7c45ba6c1301f2e878770b1eea3/include/google/protobuf/generated_message_table_driven.h(210): error: static assertion failed with ""
/root/.conan/data/gflags/2.2.2/_/_/package/eba3a7291a32f6bd003594aa6a9cdd2641a3dac2/include/gflags/gflags.h(226): warning: attribute "visibility" does not apply here
/workspaces/cavs/Cavs/cavs/util/mpi_types.h(17): error: expression must have a constant value
/workspaces/cavs/Cavs/cavs/util/mpi_types.h(18): error: expression must have a constant value
/root/.conan/data/boost/1.75.0/_/_/package/a0d4506c66082ed792ced118b38c1e3c29fc5335/include/boost/core/noncopyable.hpp(42): error: defaulted default constructor cannot be constexpr because the corresponding implicitly declared default constructor would not be constexpr
/root/.conan/data/boost/1.75.0/_/_/package/a0d4506c66082ed792ced118b38c1e3c29fc5335/include/boost/random/linear_congruential.hpp(138): warning: pointless comparison of unsigned integer with zero
detected during instantiation of "void boost::random::linear_congruential_engine<IntType, a, c, m>::seed(const IntType &) [with IntType=uint64_t, a=25214903917UL, c=11UL, m=281474976710656UL]"
(391): here
/root/.conan/data/boost/1.75.0/_/_/package/a0d4506c66082ed792ced118b38c1e3c29fc5335/include/boost/random/linear_congruential.hpp(145): warning: pointless comparison of unsigned integer with zero
detected during instantiation of "void boost::random::linear_congruential_engine<IntType, a, c, m>::seed(const IntType &) [with IntType=uint64_t, a=25214903917UL, c=11UL, m=281474976710656UL]"
(391): here
4 errors detected in the compilation of "/tmp/tmpxft_00007cf0_00000000-9_op_impl_variable.compute_52.cpp1.ii".
CMake Error at cavs_cuda_generated_op_impl_variable.cu.o.Debug.cmake:276 (message):
Error generating file
/workspaces/cavs/build/cavs/CMakeFiles/cavs_cuda.dir/backend/./cavs_cuda_generated_op_impl_variable.cu.o
make[2]: *** [cavs/CMakeFiles/cavs_cuda.dir/backend/cavs_cuda_generated_op_impl_variable.cu.o] Error 1
As you can see, there are multiple errors going on here.
Btw, boost is a required dependency but is not listed in CMakeLists.txt.
I have run into this constexpr error before. I guess it is an MPI version issue. You should avoid using boost-MPI or the system's built-in MPI headers. Try MPICH or Intel MPI. Once you fix the MPI issue, everything CUDA should build fine.
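One way to read the MPI suggestion above: the error message shows that with Open MPI, MPI_Datatype is a pointer (ompi_datatype_t*), so a predefined handle such as MPI_FLOAT is not an integral constant and cannot initialize a static const member in-class. MPICH, as far as I know, uses an integer handle, which would explain why the original code compiles there. Below is a minimal sketch of a workaround that does not depend on the MPI implementation, assuming call sites can be changed from ::value to ::value(). The FakeDatatype names are hypothetical stand-ins so the snippet compiles without MPI; they are not part of Cavs or any MPI library.

```cpp
#include <cassert>

// Hypothetical stand-ins for an Open MPI-style handle, so this compiles
// without <mpi.h>: the datatype is a pointer to a global object.
struct fake_datatype_t { int id; };
typedef fake_datatype_t* FakeDatatype;
static fake_datatype_t fake_mpi_float = {1};
static fake_datatype_t fake_mpi_double = {2};
#define FAKE_MPI_FLOAT (static_cast<FakeDatatype>(&fake_mpi_float))
#define FAKE_MPI_DOUBLE (static_cast<FakeDatatype>(&fake_mpi_double))

// Primary template is only declared; unsupported types fail at compile time.
template <typename T>
struct DataTypeToFakeMPIType;

// A static function instead of a static data member: the handle is
// evaluated at run time, so it does not have to be a constant expression.
#define MATCH_TYPE_TO_FAKE_MPI_TYPE(TYPE, HANDLE)   \
  template <>                                       \
  struct DataTypeToFakeMPIType<TYPE> {              \
    static FakeDatatype value() { return HANDLE; }  \
  }

MATCH_TYPE_TO_FAKE_MPI_TYPE(float, FAKE_MPI_FLOAT);
MATCH_TYPE_TO_FAKE_MPI_TYPE(double, FAKE_MPI_DOUBLE);

// Usage check: each C++ type maps to its own handle.
inline void example_usage() {
  assert(DataTypeToFakeMPIType<float>::value() == &fake_mpi_float);
  assert(DataTypeToFakeMPIType<double>::value()->id == 2);
}
```

With real MPI, the same macro shape would return MPI_FLOAT and friends. Because the body is ordinary run-time code, neither gcc nor nvcc needs to treat the handle as a compile-time constant.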
As for the protobuf issue, I have never seen it before.
Hmm, I'm using OpenMPI 4.1 compiled from the source. Let me try the other ones.
It seems to me the CUDA compiler is complaining about some modern C++ constructs, like those used in boost or protobuf. Anyway, boost is needed because of this file, which gets pulled in by cavs/backend/functor_filler.cuh.
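If the problem really is nvcc choking on host-only headers, one common way around it is to keep those headers out of anything a .cu file includes. nvcc defines the __CUDACC__ macro while compiling CUDA sources, so the heavy include can be hidden behind it and only a plain declaration exposed. The sketch below is an illustration, not the Cavs code: fill_uniform is a hypothetical name, and the standard <random> header stands in for the boost header.

```cpp
#include <cstddef>
#include <cstdint>

// Plain declaration, safe to include from .cu files.
// fill_uniform is a hypothetical helper, not a function from Cavs.
void fill_uniform(float* out, std::size_t n, std::uint64_t seed);

#ifndef __CUDACC__
// Host-only part: nvcc never parses this include or the definition.
// In a real project this would live in a separate .cpp file.
#include <random>

void fill_uniform(float* out, std::size_t n, std::uint64_t seed) {
  std::mt19937_64 gen(seed);  // deterministic for a given seed
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = dist(gen);
  }
}
#endif
```

The .cu side then only sees the declaration and links against the object file built by the host compiler.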
What version of boost & protobuf do you use?
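For comparing setups, a small report like the following can help pin down which versions a build actually sees. It is a rough sketch: toolchain_report is a hypothetical helper name, the boost line appears only if boost/version.hpp is found on the include path, and the nvcc line appears only when compiled by nvcc.

```cpp
#include <sstream>
#include <string>

#if defined(__has_include)
#  if __has_include(<boost/version.hpp>)
#    include <boost/version.hpp>  // defines BOOST_VERSION
#  endif
#endif

// Hypothetical helper: collects the version macros visible to this
// translation unit into a short text report.
inline std::string toolchain_report() {
  std::ostringstream out;
  out << "__cplusplus=" << __cplusplus << "\n";
#ifdef __CUDACC_VER_MAJOR__
  out << "nvcc=" << __CUDACC_VER_MAJOR__ << "." << __CUDACC_VER_MINOR__ << "\n";
#endif
#ifdef BOOST_VERSION
  // BOOST_VERSION encodes major * 100000 + minor * 100 + patch.
  out << "boost=" << BOOST_VERSION / 100000 << "."
      << BOOST_VERSION / 100 % 1000 << "." << BOOST_VERSION % 100 << "\n";
#endif
  return out.str();
}
```

Printing the result from both a .cpp file and a .cu file shows whether the two compilers agree on the language standard and the headers they pick up.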
I'll need to check my old CMU cluster later; I can maybe get back to you tomorrow.
Thanks, that'd be really helpful!
I'm finding it easier to port to the latest C++ and cuDNN than to figure this out...
Oops, @Aetf. That's wonderful... Would you mind submitting a PR to master if things work well?
BTW, I am not sure how performance will change if you switch to latest cudnn. I guess you will see some performance boost on both Cavs and CUDNN baselines.
@zhisbug Sure. It's mostly quick and dirty hacks to get things to compile, especially regarding dependency handling. I can prepare a PR once I get time to clean that up.
I'm having trouble building from the source. What are the version requirements for the project? My build on Ubuntu 16.04 with the following
fails with multiple errors related to protobuf: generated_message_table_driven.h(210): error: static assertion failed with "", as well as several other constexpr-related issues, when those files are included from *.cu files and built by nvcc.