zhisbug / Cavs

Cavs: An Efficient Runtime System for Dynamic Neural Networks
https://github.com/zhisbug/Cavs
Apache License 2.0

Version of the dependencies? #3

Open Aetf opened 3 years ago

Aetf commented 3 years ago

I'm having trouble building from source. What are the version requirements for the project? My build on Ubuntu 16.04 with the following

fails with multiple errors related to protobuf: generated_message_table_driven.h(210): error: static assertion failed with "", as well as several other constexpr-related errors, when those headers are included from *.cu files and compiled by nvcc.

zhisbug commented 3 years ago

@Aetf Thanks for your interest. Maybe try CentOS?

Aetf commented 3 years ago

Are CUDA 8.0 and cuDNN 5 the correct versions to use?

zhisbug commented 3 years ago

Yes; see here: https://github.com/zhisbug/Cavs/blob/master/CMakeLists.txt#L20 https://github.com/zhisbug/Cavs/blob/master/CMakeLists.txt#L50

Aetf commented 3 years ago

I'm still getting the same error on CentOS7 (specifically inside the nvidia/cuda:8.0-cudnn5-devel-centos7 docker).

I changed cavs/util/mpi_types.h, because otherwise there's an error: ‘constexpr’ needed for in-class initialization of static data member ‘ompi_datatype_t* const DataTypeToMPIType<float>::value’ of non-integral type [-fpermissive]

diff --git a/cavs/util/mpi_types.h b/cavs/util/mpi_types.h
index 71f8786..c39c6fa 100644
--- a/cavs/util/mpi_types.h
+++ b/cavs/util/mpi_types.h
@@ -11,7 +11,7 @@ struct DataTypeToMPIType {
 #define MATCH_TYPE_TO_MPI_TYPE(TYPE, ENUM)  \
   template <>                               \
   struct DataTypeToMPIType<TYPE> {          \
-    static const MPI_Datatype value = ENUM; \
+    constexpr static const MPI_Datatype value = ENUM; \
   }

 MATCH_TYPE_TO_MPI_TYPE(float,  MPI_FLOAT);

After that, the C++ files build fine, but the CUDA files fail with the following error:

[ 47%] Building NVCC (Device) object cavs/CMakeFiles/cavs_cuda.dir/backend/cavs_cuda_generated_op_impl_variable.cu.o
/root/.conan/data/protobuf/3.9.1/_/_/package/e5ac722d270cf7c45ba6c1301f2e878770b1eea3/include/google/protobuf/generated_message_table_driven.h(210): error: static assertion failed with ""

/root/.conan/data/gflags/2.2.2/_/_/package/eba3a7291a32f6bd003594aa6a9cdd2641a3dac2/include/gflags/gflags.h(226): warning: attribute "visibility" does not apply here

/workspaces/cavs/Cavs/cavs/util/mpi_types.h(17): error: expression must have a constant value

/workspaces/cavs/Cavs/cavs/util/mpi_types.h(18): error: expression must have a constant value

/root/.conan/data/boost/1.75.0/_/_/package/a0d4506c66082ed792ced118b38c1e3c29fc5335/include/boost/core/noncopyable.hpp(42): error: defaulted default constructor cannot be constexpr because the corresponding implicitly declared default constructor would not be constexpr

/root/.conan/data/boost/1.75.0/_/_/package/a0d4506c66082ed792ced118b38c1e3c29fc5335/include/boost/random/linear_congruential.hpp(138): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "void boost::random::linear_congruential_engine<IntType, a, c, m>::seed(const IntType &) [with IntType=uint64_t, a=25214903917UL, c=11UL, m=281474976710656UL]" 
(391): here

/root/.conan/data/boost/1.75.0/_/_/package/a0d4506c66082ed792ced118b38c1e3c29fc5335/include/boost/random/linear_congruential.hpp(145): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "void boost::random::linear_congruential_engine<IntType, a, c, m>::seed(const IntType &) [with IntType=uint64_t, a=25214903917UL, c=11UL, m=281474976710656UL]" 
(391): here

4 errors detected in the compilation of "/tmp/tmpxft_00007cf0_00000000-9_op_impl_variable.compute_52.cpp1.ii".
CMake Error at cavs_cuda_generated_op_impl_variable.cu.o.Debug.cmake:276 (message):
  Error generating file
  /workspaces/cavs/build/cavs/CMakeFiles/cavs_cuda.dir/backend/./cavs_cuda_generated_op_impl_variable.cu.o

make[2]: *** [cavs/CMakeFiles/cavs_cuda.dir/backend/cavs_cuda_generated_op_impl_variable.cu.o] Error 1

Aetf commented 3 years ago

As you can see, there are multiple errors going on here:

Btw, boost is a required dependency but not listed in CMakeLists.txt.

zhisbug commented 3 years ago

I've run into this constexpr error before. I guess it is an MPI version issue. You should avoid using boost-MPI or the system's built-in MPI headers. Try MPICH or Intel MPI. Once you fix the MPI issue, the CUDA files should build fine.
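For context on why the MPI flavor matters here: Open MPI declares MPI_Datatype as a pointer (the error above mentions ompi_datatype_t*), while MPICH uses plain int handles, so an in-class `static const` initializer only works as-is with the latter. A minimal sketch of one way to make the trait independent of that difference is to return the handle from a static function instead of storing it in a data member. This is not the Cavs code: the `Fake*` names below are hypothetical stand-ins for the real <mpi.h> declarations so the snippet compiles without MPI, and the primary template is only declared here because its real definition is not shown in the diff.

```cpp
// Stand-ins for an Open MPI-style <mpi.h> (hypothetical names, not the real API):
// the datatype handle is a pointer, and each predefined type is a macro that
// casts the address of a global object.
struct fake_datatype_t { int id; };
typedef fake_datatype_t* FakeMPI_Datatype;
static fake_datatype_t fake_mpi_float{1};
static fake_datatype_t fake_mpi_double{2};
#define FAKE_MPI_FLOAT  ((FakeMPI_Datatype)((void*)&fake_mpi_float))
#define FAKE_MPI_DOUBLE ((FakeMPI_Datatype)((void*)&fake_mpi_double))

// Primary template is only declared: unsupported types fail at compile time.
template <typename T>
struct DataTypeToMPIType;

// The handle is produced by a function call at run time, so the compiler never
// needs the macro's expansion to be a constant expression.
#define MATCH_TYPE_TO_MPI_TYPE(TYPE, ENUM)             \
  template <>                                          \
  struct DataTypeToMPIType<TYPE> {                     \
    static FakeMPI_Datatype value() { return ENUM; }   \
  }

MATCH_TYPE_TO_MPI_TYPE(float, FAKE_MPI_FLOAT);
MATCH_TYPE_TO_MPI_TYPE(double, FAKE_MPI_DOUBLE);
```

With this shape, call sites would change from `DataTypeToMPIType<float>::value` to `DataTypeToMPIType<float>::value()`. Whether nvcc 8.0 accepts the rest of the headers after such a change is untested here.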

As for the protobuf issue, I have never seen it before.

Aetf commented 3 years ago

Hmm, I'm using OpenMPI 4.1 compiled from the source. Let me try the other ones.

It seems to me the CUDA compiler is complaining about some modern C++ constructs, like those used in boost or protobuf. Anyway, boost is needed because of this file, which gets pulled in by cavs/backend/functor_filler.cuh.
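A common way to keep such headers away from nvcc is to guard host-only includes with the `__CUDACC__` macro, which nvcc defines while compiling CUDA sources. The sketch below only illustrates the pattern: `<random>` stands in for a header the CUDA front end cannot parse, and `HostOnlyDraw` and `ScaleBy` are hypothetical names. It helps only when the code in the *.cu file does not actually need the guarded declarations, which may not hold for functor_filler.cuh.

```cpp
#include <cstdint>

// Host-only section: nvcc defines __CUDACC__ while compiling CUDA sources, so
// everything inside this guard is seen only by the plain host compiler.
#ifndef __CUDACC__
#include <random>  // stand-in for a header that nvcc cannot parse

// Hypothetical helper that depends on the host-only header.
inline std::uint64_t HostOnlyDraw(std::uint64_t seed) {
  std::mt19937_64 gen(seed);
  return gen();
}
#endif  // __CUDACC__

// Shared section: simple code that both the host compiler and nvcc accept.
inline std::uint64_t ScaleBy(std::uint64_t x, std::uint64_t k) { return x * k; }
```

If the device-side code does depend on the guarded declarations, the alternative is to move that logic into a separate .cpp file and expose only a small, plain interface to the .cu files.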

What version of boost & protobuf do you use?

zhisbug commented 3 years ago

I'll need to check my old CMU cluster later. I can probably get back to you tomorrow.

Aetf commented 3 years ago

Thanks, that'd be really helpful!

Aetf commented 3 years ago

I find it easier to port to the latest C++ and cuDNN than to figure this out...

zhisbug commented 3 years ago

Oops, @Aetf. That's wonderful... Would you mind submitting a PR to master if things work well?

BTW, I am not sure how performance will change if you switch to the latest cuDNN. I guess you will see some performance boost on both Cavs and the cuDNN baselines.

Aetf commented 3 years ago

@zhisbug Sure. It's mostly quick and dirty hacks to get things to compile, especially around dependency handling. I can prepare a PR once I get time to clean that up.