torrvision / crfasrnn

This repository contains the source code for the semantic image segmentation method described in the ICCV 2015 paper: Conditional Random Fields as Recurrent Neural Networks. http://crfasrnn.torr.vision/

Compatibility with CUDA 8.0 + cuDNN 5.1? #125

Closed nathanin closed 7 years ago

nathanin commented 7 years ago

It's done. I've done it. Anyone who happens upon this please learn from my mistakes and read all the issues before posting your own. And just use git's own tools instead of ever merging a codebase by hand.

// old

I've merged the necessary .cpp, .hpp and .cu files into my own Caffe, which has other layers that depend on cuDNN v5.1.

My errors are different from those in #97. I did see that issue before posting, but the rest of the layers support cuDNN 5.1, and besides, make all completes without problems.

I solved one redefinition error (below) by adding "_cpu" to the offending lines in modified_permutohedral.cpp, and everything compiles. But I get a never-ending stream of errors in make runtest, all to do with CUDA.

77 was one issue, but fixing it has exposed some more problems. One concerns a call to cudaMemset(). Another is in cudaFree():

The third common error is a failed check, cudaSuccess (77 vs. 0), inside hash_table.hpp (line 45).

Other tests, e.g. CuDNNConvolutionLayerTest, run and pass without problems. The failures are always in the CUDA-related files copied from this repo.

[----------] 1 test from MultiStageMeanfieldLayerTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN      ] MultiStageMeanfieldLayerTest/0.TestGradient
F0508 14:28:11.484486 12110 modified_permutohedral.hpp:60] Check failed: error == cudaSuccess (17 vs. 0)  invalid device pointer
*** Check failure stack trace: ***
    @     0x7f74ad9a05cd  google::LogMessage::Fail()
    @     0x7f74ad9a2433  google::LogMessage::SendToLog()
    @     0x7f74ad9a015b  google::LogMessage::Flush()
    @     0x7f74ad9a2e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f74ab8fe8c6  caffe::ModifiedPermutohedral::~ModifiedPermutohedral()
    @     0x7f74ab8fe972  boost::detail::sp_counted_impl_p<>::dispose()
    @     0x7f74ab8fdada  boost::detail::sp_counted_base::release()
    @     0x7f74ab900e55  caffe::MultiStageMeanfieldLayer<>::Forward_cpu()
    @           0x47b912  caffe::Layer<>::Forward()
    @           0x4a3789  caffe::GradientChecker<>::CheckGradientSingle()
    @           0x4a4963  caffe::GradientChecker<>::CheckGradientExhaustive()
    @           0x5a43b8  caffe::MultiStageMeanfieldLayerTest_TestGradient_Test<>::TestBody()
    @           0x916ac3  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x9106aa  testing::Test::Run()
    @           0x9107f8  testing::TestInfo::Run()
    @           0x9108d5  testing::TestCase::Run()
    @           0x910b8f  testing::internal::UnitTestImpl::RunAllTests()
    @           0x910ec3  testing::UnitTest::Run()
    @           0x46effd  main
    @     0x7f74aab5f830  __libc_start_main
    @           0x476b19  _start
    @              (nil)  (unknown)
Makefile:526: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)
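
The trace above shows the check failing inside ~ModifiedPermutohedral() while Forward_cpu runs, i.e. in a CPU-mode test. One possible reading (an assumption, not confirmed against the source) is that the destructor frees a device pointer that the CPU path never allocated or never initialized. A minimal sketch of guarding against that follows. DeviceBuffer is a hypothetical class, and fake_device_alloc / fake_device_free are stand-ins for cudaMalloc / cudaFree so the sketch builds without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-ins for cudaMalloc/cudaFree (assumption: used only so this sketch
// builds without the CUDA toolkit). g_live_allocs counts outstanding buffers.
static int g_live_allocs = 0;

static void* fake_device_alloc(std::size_t bytes) {
  void* p = std::malloc(bytes);
  if (p != nullptr) ++g_live_allocs;
  return p;
}

static void fake_device_free(void* p) {
  if (p != nullptr) --g_live_allocs;
  std::free(p);
}

// Hypothetical owner of one device buffer. The pointer starts as nullptr and
// the destructor frees it only if the GPU path actually allocated it.
class DeviceBuffer {
 public:
  DeviceBuffer() = default;
  DeviceBuffer(const DeviceBuffer&) = delete;             // avoid double free
  DeviceBuffer& operator=(const DeviceBuffer&) = delete;
  ~DeviceBuffer() { release(); }

  // GPU path: the only place that allocates device memory.
  void init_gpu(std::size_t bytes) {
    assert(bytes > 0);
    release();  // drop any previous buffer before allocating a new one
    ptr_ = fake_device_alloc(bytes);
  }

  // CPU path: never touches ptr_, so there is nothing to free later.
  void init_cpu() {}

  bool owns_device_memory() const { return ptr_ != nullptr; }

 private:
  void release() {
    if (ptr_ != nullptr) {
      fake_device_free(ptr_);
      ptr_ = nullptr;
    }
  }

  void* ptr_ = nullptr;  // never left uninitialized
};
```

With this layout, an object that only went through init_cpu() is destroyed without any free call, which is the situation a CPU-only gradient test would be in.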

I use CUDA 8 and cuDNN 5.1 on Ubuntu 16.04, but the errors are the same on OS X with:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Oct_30_22:18:43_CDT_2016
Cuda compilation tools, release 8.0, V8.0.54

$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR      5
#define CUDNN_MINOR      1
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

// Original post

Hi, I am trying to transfer the layers from this repo into another modified Caffe version. I run make all and it completes with some warnings. But I get the attached error when running make test.

By examining https://github.com/torrvision/caffe/blob/crfrnn/src/caffe/util/modified_permutohedral.cpp and https://github.com/torrvision/caffe/blob/crfrnn/include/caffe/util/modified_permutohedral.hpp I see that init and compute are defined in both. I am new to C++ and am not sure what to modify to make the test build.

My OS is OS X El Capitan, and I'm building with cuDNN v5 support ON. Any advice appreciated :) (Will also try on Ubuntu 16.04 machine at a later date)

...
CXX src/caffe/util/modified_permutohedral.cpp
src/caffe/util/modified_permutohedral.cpp:108:29: error: redefinition of 'init'
void ModifiedPermutohedral::init(const float* features, int num_dimensions, int num_points)
                            ^
./include/caffe/util/modified_permutohedral.hpp:89:8: note: previous definition is here
  void init (const float* features, int num_dimensions, int num_pixels){
       ^
src/caffe/util/modified_permutohedral.cpp:728:29: error: redefinition of 'compute'
void ModifiedPermutohedral::compute (float* out, const float* in, int value_size, bool reverse, bool add) const
                            ^
./include/caffe/util/modified_permutohedral.hpp:102:8: note: previous definition is here
  void compute(float* out, const float* in, int value_size, bool reverse = false, bool add = false) const{
       ^
src/caffe/util/modified_permutohedral.cpp:736:29: error: redefinition of 'compute'
void ModifiedPermutohedral::compute (double* out, const double* in, int value_size, bool reverse, bool add) const
                            ^
./include/caffe/util/modified_permutohedral.hpp:117:8: note: previous definition is here
  void compute(double* out, const double* in, int value_size, bool reverse = false, bool add = false) const{
       ^
3 errors generated.
make: *** [.build_release/src/caffe/util/modified_permutohedral.o] Error 1
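
For anyone hitting the same redefinition errors: the log above shows init and compute with a body in modified_permutohedral.hpp and a second body in modified_permutohedral.cpp, which C++ rejects because a member function may have only one definition. Below is a minimal sketch of one layout that compiles. It assumes the header's inline methods forward to "_cpu" variants that are defined once in the .cpp, which is consistent with the "_cpu" fix mentioned earlier but has not been checked against the repo. The class name Lattice and the placeholder bodies are hypothetical and do no real lattice filtering.

```cpp
#include <cassert>

// ---- header part (would live in the .hpp) ----
// Hypothetical stand-in for ModifiedPermutohedral.
class Lattice {
 public:
  // Defined inline here, so the .cpp must NOT define init() again.
  void init(const float* features, int num_dimensions, int num_points) {
    init_cpu(features, num_dimensions, num_points);
  }

  // Default arguments go on this declaration only.
  void compute(float* out, const float* in, int value_size,
               bool reverse = false, bool add = false) const {
    compute_cpu(out, in, value_size, reverse, add);
  }

  int num_points() const { return num_points_; }

 private:
  // Declared only; the single definition of each lives in the .cpp part.
  void init_cpu(const float* features, int num_dimensions, int num_points);
  void compute_cpu(float* out, const float* in, int value_size,
                   bool reverse, bool add) const;

  int num_points_ = 0;
};

// ---- source part (would live in the .cpp) ----
// Note the "_cpu" suffix: these names do not clash with the inline
// init()/compute() already defined in the header.
void Lattice::init_cpu(const float* features, int num_dimensions,
                       int num_points) {
  (void)features;        // placeholder: a real lattice would be built here
  (void)num_dimensions;
  assert(num_points >= 0);
  num_points_ = num_points;
}

void Lattice::compute_cpu(float* out, const float* in, int value_size,
                          bool reverse, bool add) const {
  (void)reverse;  // placeholder: direction is ignored in this sketch
  const int n = num_points_ * value_size;
  for (int i = 0; i < n; ++i) {
    // Placeholder "filter": copy the input, or accumulate it when add is set.
    out[i] = add ? out[i] + in[i] : in[i];
  }
}
```

Renaming the definitions in the .cpp only works if the header declares those names, as the private section does here; otherwise the compiler reports an out-of-line definition that matches no declaration.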

yaoyz96 commented 5 years ago

I have the same problem and still can't solve it. How did you solve it? Thanks :)