naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/

about 64bit and 32bit #16

Closed zif520 closed 8 years ago

zif520 commented 8 years ago

hi @naibaf7 There are some problems on Android, so I want to compile it on Ubuntu 64-bit. When I change int_tp to int, there are some problems in greentea_math_functions.cpp and ViennaCL. If Ubuntu 64-bit does not support 32-bit, will a 32-bit gcc on Ubuntu fix it?

```
CXX src/caffe/greentea/greentea_math_functions.cpp
In file included from ./include/caffe/greentea/greentea.hpp:38:0,
                 from ./include/caffe/common.hpp:25,
                 from src/caffe/greentea/greentea_math_functions.cpp:8:
../ViennaCL/viennacl/vector.hpp: In instantiation of ‘viennacl::vector_base<NumericT, SizeT, DistanceT>& viennacl::vector_base<SCALARTYPE, SizeType, DistanceType>::operator=(float) [with NumericT = float; SizeT = unsigned int; DistanceT = int]’:
src/caffe/greentea/greentea_math_functions.cpp:579:8:   required from ‘void caffe::greentea_gpu_scal(int32_t, int32_t, Dtype, cl_mem, int32_t) [with Dtype = float; int32_t = int; cl_mem = _cl_mem*]’
src/caffe/greentea/greentea_math_functions.cpp:597:57:   required from here
../ViennaCL/viennacl/vector.hpp:669:63: error: no matching function for call to ‘av(viennacl::vector_base<float, unsigned int, int>&, viennacl::vector_base<float, unsigned int, int>&, float, int, bool, bool)’
   *this, NumericT(val), 1, false, false);
                                        ^
../ViennaCL/viennacl/vector.hpp:669:63: note: candidate is:
In file included from ../ViennaCL/viennacl/vector.hpp:33:0,
                 from ./include/caffe/greentea/greentea.hpp:38,
                 from ./include/caffe/common.hpp:25,
                 from src/caffe/greentea/greentea_math_functions.cpp:8:
../ViennaCL/viennacl/linalg/vector_operations.hpp:78:10: note: template<class T, class ScalarType1> void viennacl::linalg::av(viennacl::vector_base&, const viennacl::vector_base&, const ScalarType1&, viennacl::vcl_size_t, bool, bool)
     void av(vector_base & vec1
```

naibaf7 commented 8 years ago

@zif520 Thanks yeah, I'm working on it :)

zif520 commented 8 years ago

hi @naibaf7 I found a problem. I use 32-bit and the compiler is 32-bit (armeabi-v7a with NEON), and I had set Definitions.hpp and the Header.cl to define int_tp as int32_t.

When I call the "fillbuffer" function via `viennacl::ocl::enqueue(oclk_fill(...))`, the GPU only accepts 64-bit arguments. It works when ViennaCL reports "ViennaCL: Setting long precision kernel argument", but it fails when it reports "ViennaCL: Setting int precision kernel argument". So I forcibly convert 32-bit to 64-bit, e.g. `oclk_fill(static_cast<int64_t>(N), ...)`.

I have tried a lot of ways to switch the GPU side to 32-bit, but failed. Do you know how to define 32-bit on the GPU?

naibaf7 commented 8 years ago

@zif520 I am working on an update that will fix this, it will be available shortly. I'll announce it here.

naibaf7 commented 8 years ago

@zif520 OK, 32 bit should work. Let me know if there are more issues. You should not need to re-configure anything, it should work out of the box.

zif520 commented 8 years ago

@naibaf7 :+1: It is great! I can run it on Android with 64-bit arguments, and I will change it to 32-bit referring to your code. Thanks a lot, and happy new year! : )

naibaf7 commented 8 years ago

@zif520 In a new update I also reduced queue level concurrency from 8 to 1 because it does not help much on desktop devices but may prevent networks with a batch size > 1 from running on mobile devices. Check out the latest update.

Happy new year ;)

zif520 commented 8 years ago

There is a bug in im2col.cu with CUDA: the call to `min()` is not matched in `const int_tp w_col_end = min(w_im / stride_w + 1, width_col);`. I replaced it with `const int_tp w_col_end = (w_im / stride_w + 1) < width_col ? (w_im / stride_w + 1) : width_col;`.

naibaf7 commented 8 years ago

really? Which version of CUDA are you using? Seems to work fine for me.

zif520 commented 8 years ago

CUDA 7.0, Ubuntu 14.04. `min(long, int64_t)` can't be matched.

List of `min()` overloads: `min(int, int)`, `min(unsigned int, unsigned int)`, `min(long long, long long)`, and so on.

naibaf7 commented 8 years ago

OK, try CUDA 7.5

zif520 commented 8 years ago

It was only the `min()` call — now it works OK!