naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/

When run in arm for plenty of pictures #40

Closed: zazd closed this issue 8 years ago

zazd commented 8 years ago

I use caffe-opencl on ARM (Odroid, Mali-T628). When I run a single picture, I get the correct output. That is, I run the project once, get the correct output, and close it down. But when I run many pictures, only the first picture gets the correct output; for the 2nd, 3rd, ... pictures, the outputs are the same as for the first picture. The way I use this project is: float *input_data = input_layer->mutable_cpu_data(); and then I let the pointer (input_data) point to the picture. After initializing the Net, I run Net->Forward(). That works. But if I use for(each picture in pictures){ input_data point to picture; Net->Forward(); } I always get the first picture's output no matter which picture is input.

However, when I run this project on a PC, everything is OK and I do not get this problem. It seems that something is wrong on the ARM side, or that the OpenCL setup behaves differently between PC and ARM.

PS: I printed the data before calling greentea_conv_im2col_gpu() in base_conv_layer: different pictures have different data, but after greentea_conv_im2col_gpu() the data become identical.

zazd commented 8 years ago

@naibaf7 I need your help, thank you!

naibaf7 commented 8 years ago

@zazd Ok, the issue here is that Caffe does not know that you switched the picture data. It only copies the data from CPU to GPU once. You need to make sure that the data gets copied again.

Can you show me the whole code so that I can propose a fix? One possible fix would be to do this in a loop:

for(each picture in pictures){
    // Do THIS with EACH iteration: it invalidates the GPU data and lets Caffe know that the CPU data is updated.
    float *input_data = input_layer->mutable_cpu_data();
    input_data point to picture;
    Net->Forward();
}

But I think you're supposed to copy the pictures TO input_data, and not direct the pointer to the image.
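To illustrate the "copy the pictures TO input_data" suggestion, here is a minimal sketch. The helper copy_image_to_input is hypothetical and not part of Caffe; it assumes the image has already been preprocessed into a flat float array with the same element count as the input blob. In real code, input_data would be the pointer returned by input_layer->mutable_cpu_data(), fetched again for every picture.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Caffe API): copies one preprocessed image into
// the network's input buffer instead of redirecting the pointer.
// `input_data` stands in for the pointer returned by
// input_layer->mutable_cpu_data(); `input_count` is the blob's element count.
// Returns false if the pointer is null or the sizes do not match.
bool copy_image_to_input(const std::vector<float>& image,
                         float* input_data,
                         std::size_t input_count) {
  if (input_data == nullptr || image.size() != input_count) {
    return false;
  }
  // Copy the pixel values into the buffer that Caffe owns.
  std::copy(image.begin(), image.end(), input_data);
  return true;
}
```

Inside the loop, the call would sit between fetching mutable_cpu_data() and Net->Forward(), so that Caffe's own buffer holds the new picture before the forward pass.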

zazd commented 8 years ago

OK. Wait a moment.

But every time Net->Forward() runs, it calls Forward_gpu in conv_layer.cu. In that function, "const Dtype* bottom_data = bottom[i]->gpu_data();" is executed. So it will copy the data from CPU to GPU, right?

naibaf7 commented 8 years ago

@zazd Only if Caffe knows that the GPU data is not up to date, and right now it thinks it is. Only ->mutable_cpu_data() will invalidate the GPU buffer and lead to a copy the next time ->gpu_data() is used.
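The invalidation rule described above can be sketched with a toy buffer. This is not Caffe's SyncedMemory; the class ToySyncedBuffer and all of its members are hypothetical, and the "GPU" side is just a second host vector. It only shows why a write through an old pointer goes unnoticed, while a fresh mutable_cpu_data() call triggers a new upload.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a CPU/GPU synchronized buffer (hypothetical, not Caffe code).
class ToySyncedBuffer {
 public:
  explicit ToySyncedBuffer(std::size_t n)
      : cpu_(n, 0.0f), gpu_(n, 0.0f), cpu_is_newer_(true), upload_count_(0) {}

  // Write access to the host copy: marks the device copy as stale.
  float* mutable_cpu_data() {
    cpu_is_newer_ = true;
    return cpu_.data();
  }

  // Read access to the device copy: uploads only if it is marked stale.
  const float* gpu_data() {
    if (cpu_is_newer_) {
      gpu_ = cpu_;  // stands in for the host-to-device copy
      cpu_is_newer_ = false;
      ++upload_count_;
    }
    return gpu_.data();
  }

  int upload_count() const { return upload_count_; }

 private:
  std::vector<float> cpu_;
  std::vector<float> gpu_;
  bool cpu_is_newer_;
  int upload_count_;
};
```

In this model, writing through a pointer obtained before the last gpu_data() call changes the host copy but leaves the flag untouched, so the next gpu_data() returns the old values.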

zazd commented 8 years ago

Sorry, it may take me a long time to copy the whole code for you. So you can use https://github.com/sh1r0/caffe/tree/2daa41445e8b445848422835abb737191060044b/android

Copy caffe_mobile.cpp to caffe/tools and caffe_mobile.hpp to include/caffe.

In caffe_mobile.cpp, use:

Caffe::set_mode(Caffe::GPU);
Caffe::SetDevice(0);
net_.reset(new Net(model_path, caffe::TEST, Caffe::GetDefaultDevice()));

Then build it.

Then, in main() in caffe_mobile.cpp:

for(picture_string in picture_strings){ vector top_3 = caffe_mobile->PredictTopK(picture_string, number); }

You can test it. If this is not useful for you, I will give you a simplified version of my code tomorrow, as I have to leave the office now.

Thank you for your help.

PS: maybe it is in to_gpu() in syncedmem.cpp

naibaf7 commented 8 years ago

Hm okay, the code actually looks right. Not sure if using OpenCV is a great idea, it can be a bit tricky at times, but yeah.

Is the first image correct no matter which image you test with? Do you use the CPU or GPU on OpenCL with the ARM?

The only issue I could think of now is that we are using unified memory between the CPU and GPU on the ARM and that something goes wrong there (because actually there is no separation between CPU and GPU memory if the memory can be used unified).

Which computer/desktop GPU did you test this code with?

zazd commented 8 years ago

The first image is correct no matter which image I test, and the others give the same output as the first one. Would you tell me which ARM device and GPU you use? I use a Mali-T628.

Maybe I have found the reason. It actually does not copy the data from CPU to GPU except for the first picture. In to_gpu(), on the PC everything is OK because ZEROCOPY_SUPPORTED(device, cpuptr, size_) always returns false, so the flag own_zero_copydata is always false. So if (!own_zero_copydata) greentea_gpumemcpy(size, cpuptr, (cl_mem) gpuptr, 0, &ctx); is called, and the data are always copied from CPU to GPU.

However, on ARM (Mali-T628), ZEROCOPY_SUPPORTED(device, cpuptr, size_) always returns true, so own_zero_copydata is true and the data are not copied from CPU to GPU after the first picture.

Thank you for your help.
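The branch described in this comment can be modeled with a small toy. It is only a sketch of the reported symptom, not the actual syncedmem.cpp code: ToyDeviceBuffer and toy_to_gpu are hypothetical names, the device memory is simulated by a host vector, and the thread does not establish why the device failed to see the updated host memory on the Mali-T628.

```cpp
#include <vector>

// Toy stand-in for a device-side buffer (hypothetical, not Caffe code).
struct ToyDeviceBuffer {
  std::vector<float> device_view;   // what the "GPU" sees
  bool own_zero_copy_data = false;  // set on first use if zero-copy is reported
  bool initialized = false;
};

// Models the decision described above: the first call always makes the data
// visible to the device; later calls copy only when the zero-copy flag is not set.
void toy_to_gpu(ToyDeviceBuffer& buf,
                const std::vector<float>& host,
                bool zero_copy_supported) {
  if (!buf.initialized) {
    buf.device_view = host;
    buf.own_zero_copy_data = zero_copy_supported;
    buf.initialized = true;
    return;
  }
  if (!buf.own_zero_copy_data) {
    buf.device_view = host;  // stands in for the explicit host-to-device copy
  }
  // With the flag set, the explicit copy is skipped, as in the reported behavior.
}
```

With zero_copy_supported set to false (the PC case), a second call with a different image updates device_view. With it set to true (the reported ARM case), device_view keeps the first image, which matches the identical outputs seen for every picture after the first.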

naibaf7 commented 8 years ago

@zazd Can you try changing the code to force ZEROCOPY_SUPPORTED to FALSE and see if that fixes it?

zazd commented 8 years ago

Yes, I deleted own_zero_copydata = true, and that fixed it. I also found that the newest CLBlast fixes this problem, too.

Thank you.