gpufit / Gpufit

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA
MIT License

output_parameters are inexplicably all zero when n_fits is set large enough #31

Closed: caowencai closed this issue 6 years ago

caowencai commented 6 years ago

To avoid stack overflow, I allocate all my Gpufit arrays on the heap, like this:

float * initial_parameters = new float[n_fits * n_model_parameters]();

However, I found that n_fits seems to be limited: when n_fits is large enough (10,000,000 in my case), the output_parameters returned by gpufit are all zero, but everything works as expected with n_fits = 1,000,000. I can't figure out why. By the way, there is enough memory on the PC.
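For reference, a minimal sketch of the same heap allocation using std::vector, which zero-initializes its elements and frees them automatically, so no matching delete[] is needed. The struct FitBuffers and its member names are hypothetical and not part of Gpufit; a C-style interface expecting a float * can be given `initial_parameters.data()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Gpufit API): heap-allocated, zero-initialized
// buffers for a batch of fits. std::vector releases its memory on destruction,
// unlike new float[...](), which needs an explicit delete[].
struct FitBuffers {
    std::vector<float> initial_parameters;  // n_fits * n_parameters values
    std::vector<float> data;                // n_fits * n_points Y values
    std::vector<float> user_info;           // n_fits * n_points X values

    FitBuffers(std::size_t n_fits, std::size_t n_points, std::size_t n_parameters)
        : initial_parameters(n_fits * n_parameters),  // elements value-initialized to 0.0f
          data(n_fits * n_points),
          user_info(n_fits * n_points) {
        assert(n_points > 0 && n_parameters > 0);
    }
};
```

Using std::size_t for the counts keeps the element-count products in an unsigned type as wide as the platform's object sizes.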

superchromix commented 6 years ago

Please post a sample program which reproduces the error.

caowencai commented 6 years ago

Finally figured it out. Cause: the available GPU memory is set too small, so the error "maximum user info size exceeded" is easily triggered. It should have been reported explicitly, since it comes from `throw std::runtime_error("maximum user info size exceeded")`. Solution: increase the fraction in `void Info::get_gpu_properties()` (info.cu, line 14), `available_gpumemory = std::size_t(double(free_bytes) * 0.1);`, for example to `double(free_bytes) * 0.5`.
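A minimal sketch of why such an error can look like silent all-zero output, assuming the library catches the exception and reports it through a return code instead of letting it propagate: if the caller never checks that code, the zero-initialized output arrays are simply left as they were. The Gpufit examples check the value returned by gpufit() and read the message with gpufit_get_last_error(). The names below (Status, run_fit, check_user_info_size, last_error) are hypothetical and only imitate that pattern; this is not Gpufit's source.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical status codes mimicking a C-style "OK / ERROR" return value.
enum class Status { ok = 0, error = -1 };

// Message of the most recent failure (hypothetical stand-in for a get_last_error()).
static std::string last_error;

// Throws if the whole user_info block does not fit into the GPU memory budget.
void check_user_info_size(std::size_t user_info_bytes, std::size_t available_gpu_memory) {
    if (user_info_bytes > available_gpu_memory)
        throw std::runtime_error("maximum user info size exceeded");
}

// C-style entry point: converts exceptions into a status code plus a stored message.
Status run_fit(std::size_t user_info_bytes, std::size_t available_gpu_memory) {
    try {
        check_user_info_size(user_info_bytes, available_gpu_memory);
        // ... the fit itself would run here and write the output arrays ...
        last_error.clear();
        return Status::ok;
    } catch (std::exception const & e) {
        last_error = e.what();
        return Status::error;  // outputs untouched, e.g. still all zero
    }
}
```

The caller-side lesson is the same as in the Gpufit examples: compare the returned status against the success value and print the stored message when it differs.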

jkfindeisen commented 6 years ago

@adrianjp88 Should this happen? I thought it would then just use smaller chunks of fits, so as long as the data of at least one fit fits into the available GPU memory, it should run fine, shouldn't it?
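A minimal sketch of the chunking described here, under one stated assumption: the per-fit arrays (data, parameters, outputs) are split into chunks, but user_info is copied to the GPU in one piece. Under that assumption, shrinking the chunk size cannot help once user_info alone exceeds the budget, which would be consistent with the "maximum user info size exceeded" error. The function name and parameters are hypothetical, not Gpufit's Info code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical sketch: how many fits can be processed per chunk when the
// per-fit buffers are chunked but user_info must stay resident in full.
std::size_t max_chunk_size(std::size_t available_gpu_memory,
                           std::size_t bytes_per_fit,
                           std::size_t user_info_bytes,
                           std::size_t n_fits) {
    assert(bytes_per_fit > 0);
    // user_info is not chunked (assumption), so it must fit on its own.
    if (user_info_bytes >= available_gpu_memory)
        throw std::runtime_error("maximum user info size exceeded");
    // Memory left for per-fit buffers once user_info is on the GPU.
    std::size_t const remaining = available_gpu_memory - user_info_bytes;
    std::size_t const chunk = remaining / bytes_per_fit;
    if (chunk == 0)
        throw std::runtime_error("not enough GPU memory for a single fit");
    return chunk < n_fits ? chunk : n_fits;  // never larger than the whole job
}
```

With this design, the number of fits per chunk shrinks as user_info grows, and the hard failure happens only when user_info by itself no longer fits.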

superchromix commented 6 years ago

@Gittry We cannot understand your issue without a complete example program that reproduces the problem.

caowencai commented 6 years ago

The example is in pull request https://github.com/gpufit/Gpufit/pull/33. The user_info data (488 MB) is at https://drive.google.com/open?id=1M4TnXf3TQex3LFeEkXArlTB5GYvDzIKY, because it is too large to include in the pull request.

In info.cu, line 14, changing the factor 0.1 to 0.7 in available_gpumemory = std::size_t(double(free_bytes) * 0.1) solved my problem.
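A worked check of the budget arithmetic, using the 488 MB user_info file mentioned above (taken as 488,000,000 bytes) and a hypothetical 4 GiB (4,294,967,296 bytes) of free GPU memory; the actual free memory on the reporter's GPU is not stated in the thread. The budget expression mirrors the one quoted from info.cu; treating "fits" as user_info_bytes <= budget is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Budget as in the quoted info.cu line: a fixed fraction of the free GPU memory.
std::size_t gpu_budget(std::size_t free_bytes, double fraction) {
    assert(fraction > 0.0 && fraction <= 1.0);
    return static_cast<std::size_t>(static_cast<double>(free_bytes) * fraction);
}

// Assumed acceptance rule: user_info must fit entirely within the budget.
bool user_info_fits(std::size_t user_info_bytes, std::size_t free_bytes, double fraction) {
    return user_info_bytes <= gpu_budget(free_bytes, fraction);
}
```

With 4 GiB free, a factor of 0.1 gives a budget of 429,496,729 bytes, below 488,000,000, while 0.7 gives 3,006,477,107 bytes, which is why raising the factor makes the error disappear in this scenario.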

mscipio commented 6 years ago

Maybe it's not relevant, but am I wrong in saying that you use user_info to pass your data to the kernel, and data to pass the time vector (the same for all your measurements)? Inside the kernel you then keep using user_info as the independent variable (in the if-else block at the beginning) and data as the values you want to fit...

mscipio commented 6 years ago

I have a question that is somewhat related to this issue... let me know if you prefer that I open a new issue.

I succeeded in implementing my compartmental model as per issues #27 and #30, and now I am experimenting with increasing the number of parallel fits. I discovered that if I use an n_fits greater than my max_chunk_size, I get the following error when the library tries to allocate GPU memory for the second chunk:

terminate called after throwing an instance of 'std::runtime_error'
  what():  invalid argument
Aborted

It is thrown in void GPUData::init at:

write(
        parameters_,
        &initial_parameters[chunk_index_*info_.max_chunk_size_*info_.n_parameters_],
        chunk_size_ * info_.n_parameters_);

Surprisingly, the same write() call a few lines above, which copies data_ to GPU memory, completes just fine.

I checked, and this error happens both with my new model and with the original models you implemented. Could it be an issue with my GPU? I would find that hard to believe, but I have spent the entire morning trying to find a solution, so far without success...
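A minimal sketch of the offset arithmetic behind the write() call quoted above, with hypothetical names (ChunkSlice, parameter_slices); it is not GPUData::init. It shows the invariant every chunk's copy must satisfy: offset = chunk_index * max_chunk_size * n_parameters, count = chunk_size * n_parameters, the last chunk may be smaller, and offset + count must never exceed n_fits * n_parameters. Checking this invariant is one way to audit a modified copy of the code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One chunk's slice of a host array laid out fit by fit.
struct ChunkSlice {
    std::size_t offset;  // index of the first element of this chunk
    std::size_t count;   // number of elements to copy for this chunk
};

// Computes the slice of initial_parameters that each chunk would copy.
std::vector<ChunkSlice> parameter_slices(std::size_t n_fits,
                                         std::size_t max_chunk_size,
                                         std::size_t n_parameters) {
    assert(max_chunk_size > 0);
    std::vector<ChunkSlice> slices;
    for (std::size_t chunk_index = 0; chunk_index * max_chunk_size < n_fits; ++chunk_index) {
        std::size_t const first_fit = chunk_index * max_chunk_size;
        // The last chunk holds only the remaining fits.
        std::size_t const chunk_size = std::min(max_chunk_size, n_fits - first_fit);
        ChunkSlice const s{first_fit * n_parameters, chunk_size * n_parameters};
        // Every slice must stay inside the host array of n_fits * n_parameters floats.
        assert(s.offset + s.count <= n_fits * n_parameters);
        slices.push_back(s);
    }
    return slices;
}
```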

superchromix commented 6 years ago

@mscipio Please post an example code which reproduces the error.

mscipio commented 6 years ago

That's not going to be easy: I am working on Linux, so I made some changes to the code to make it compile, and it is no longer compatible with the version in this repo.

If your version does not have an issue like this, I will try (T.T) to trace back all the differences, hoping to find MY mistake along the way.

You don't need example code from me to test this with your code: just pick one of the examples (like Linear_Regression_Example.cpp) and increase n_fits A LOT, so that the problem doesn't fit on your GPU in one chunk.

EDIT: I just saw that you merged your version with @jkfindeisen's, so maybe I really should revert to a compatible version to stay in line with you. Does the current version compile under Linux?

superchromix commented 6 years ago

@mscipio It is not clear to me what error you are reporting. In the manuscript we tested Gpufit with up to 10^8 fits per function call. This is significantly larger than the maximum number of fits that can be processed simultaneously on the GPU.

You are making modifications to the core of the Gpufit code, so introducing changes there could easily lead to bugs. Why do you need to make changes to Gpudata::Init?

mscipio commented 6 years ago

@superchromix Yes, you are obviously right; sorry for the bother.

The changes I made (I wasn't trying to modify Gpudata::Init, anyway) were meant to debug my new kernel (a C++ implementation alone was not enough). I have now cloned your current version of the library again and will continue working with it. I just checked, and I don't have that issue with Linear_Regression_Example.cpp, so I guess something I changed caused the error.

I will test my new model on the current branch as soon as possible and may open a pull request if you are interested in having it. Thanks.

superchromix commented 6 years ago

@Gittry It looks like you are using the user_info incorrectly. Your experimental data should not be passed to Gpufit through the user_info parameter. The user_info should be used to store independent variables.

caowencai commented 6 years ago

@superchromix I followed the documentation, which says:

custom x positions for the data points of every fit, stored in user_info

The data I load is just the unique X coordinate values for each fit, stored as floats.

Then how do you explain that I get correct results when n_fits or available_gpumemory is within a certain range, while otherwise the output is all zero?

superchromix commented 6 years ago

@Gittry What data are you fitting? In your code, I see some values loaded from a file, and some that are set manually (hard coded) in the program. What is being loaded from the file? What are you storing in user_info?

caowencai commented 6 years ago

@superchromix The loaded data serves as the X coordinates, which are stored in user_info; the data parameter in the code, set manually, holds the Y coordinates. Both are sampled data: each fit has 8 points and 4 parameters, and the curve is y = a e^(bx) + c e^(dx), where a, b, c and d are the parameters to fit.
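A CPU reference sketch of this setup, consistent with the documentation quoted above (custom x positions for every fit stored in user_info, measured Y values in data). It assumes user_info holds n_points x values per fit, stored fit by fit, and that parameters holds (a, b, c, d) per fit; the function name double_exponential is hypothetical and this is not Gpufit's CUDA model-function signature.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Evaluates y = a*exp(b*x) + c*exp(d*x) for one point of one fit.
// Layout assumption: user_info[fit_index * n_points + point_index] is the x value,
// parameters[fit_index * 4 + k] for k = 0..3 are a, b, c, d.
float double_exponential(std::vector<float> const & user_info,
                         std::vector<float> const & parameters,
                         std::size_t n_points,
                         std::size_t fit_index,
                         std::size_t point_index) {
    assert(point_index < n_points);
    float const x = user_info[fit_index * n_points + point_index];
    float const * p = &parameters[fit_index * 4];
    return p[0] * std::exp(p[1] * x) + p[2] * std::exp(p[3] * x);
}
```

With the setup described here (n_points = 8, x stored as 4-byte floats), user_info for 10,000,000 fits needs 10,000,000 × 8 × 4 = 320,000,000 bytes, all of which is independent-variable data rather than the measured Y values.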

As for my issue: line 14 of info.cu originally reads available_gpu_memory_ = std::size_t(double(free_bytes) * 0.1); changing 0.1 to 0.7 solved my problem. Otherwise, all output_parameters are zero.

superchromix commented 6 years ago

We have updated the GPU memory management in the latest versions of Gpufit to allow larger user_info sizes. This should address this issue.