tingliu / randomforest-matlab

Automatically exported from code.google.com/p/randomforest-matlab

Segfault when calling training a lot of times #21

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I call RF training more than 10,000 times, consecutively or in parallel. At some random iteration around 10,000 it always fails with a segfault.

To reproduce, try executing:
parfor i = 1:10000, classRF_train(features, cols, 20, 3); end

I'm not sure whether the specific input matters, but it failed for various inputs (all of which were quite large). One of them is attached.

The program leaks a bit, so it looks like it is running out of memory, but that is not the case (there are still 5 GB of free memory when it fails). Perhaps an x64 issue?

Windows 7, MATLAB 7.12, 12 GB of RAM.
The dump file is attached.

Original issue reported on code.google.com by shapoval...@gmail.com on 10 Nov 2011 at 5:09


GoogleCodeExporter commented 8 years ago
I think I know why this error happens. I am mixing some of MATLAB's MEX memory allocations with C library allocations; MEX code usually works fine until some allocations are not freed correctly, and after a while MATLAB will just segfault. I think I have some allocations that are not deallocated correctly.

I will take a look at it and fix the bug. I also found the solution to the bug from the other issue and will post that too.

Thanks a lot.

Original comment by abhirana on 11 Nov 2011 at 8:03

GoogleCodeExporter commented 8 years ago
Hi,

Can you check out the SVN source? It has the fix for the memory issues.

I ran a few thousand iterations (parfor) with your dataset and parameters and it did not crash, so I think it should be fine now. I compiled with Visual Studio 2010 on 64-bit Windows 7 and ran MATLAB 7.12 with 16 GB of RAM.

Thanks

Original comment by abhirana on 11 Nov 2011 at 10:13

GoogleCodeExporter commented 8 years ago
Hi, I checked 15K iterations and it worked well. Thanks!
If it doesn't work in my application, I'll let you know.

Original comment by shapoval...@gmail.com on 14 Nov 2011 at 8:19

GoogleCodeExporter commented 8 years ago
Hi,

I can't compile this new version (the one using mxCalloc and mxFree) on a 64-bit Linux system.
Here is the error:
g++ -fpic -O2 -funroll-loops -msse3  -Wall -c src/classTree.cpp -o 
tempbuild/classTree.o 
src/classTree.cpp:42:20: error: matrix.h: No such file or directory
src/classTree.cpp: In function ‘void catmax_(double*, double*, double*, int*, 
int*, int*, double*, int*, int*, int*, int*)’:
src/classTree.cpp:102: error: ‘mxCalloc’ was not declared in this scope
src/classTree.cpp:145: error: ‘mxFree’ was not declared in this scope
src/classTree.cpp: In function ‘void predictClassTree(double*, int, int, 
int*, int*, double*, int*, int*, int, int*, int, int*, int*, int)’:
src/classTree.cpp:226: error: ‘mxCalloc’ was not declared in this scope
src/classTree.cpp:258: error: ‘mxFree’ was not declared in this scope

Thanks,
MJ

Original comment by m.se...@gmail.com on 14 Mar 2012 at 8:44

GoogleCodeExporter commented 8 years ago
Hi m.seyed,

Could you try out the latest source? I fixed the makefile.

Do tell me if you still have any issues.

Thanks for letting me know about the bug.

Original comment by abhirana on 14 Mar 2012 at 9:16

GoogleCodeExporter commented 8 years ago
Hi Abhirana,

Thanks for the quick response. It is now working.

Best,
MJ

Original comment by m.se...@gmail.com on 14 Mar 2012 at 9:21

GoogleCodeExporter commented 8 years ago
Hi,

I still get a segmentation fault in MATLAB when I try to train a random forest model on a 2,000,000 by 2,000 matrix. I just thought I would report the problem here.
My inputs are integers (I converted them to double before passing them to the random forest) and their values range from 0 to 3500.

Best,
MJ

Original comment by m.se...@gmail.com on 14 Mar 2012 at 10:22

GoogleCodeExporter commented 8 years ago
Hi MJ,

Do you have enough RAM? I am guessing your dataset requires at least 32 GB just to store the array, and another 32-50 GB for the internal workings of RF (considering 8 bytes per double and 4 billion matrix elements).

Also, this might just be too large a dataset for RF to handle and finish in a reasonable time.

Original comment by abhirana on 14 Mar 2012 at 10:34
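As a rough check of the 32 GB figure above, here is the back-of-the-envelope arithmetic for a 2,000,000 by 2,000 double matrix (just an illustrative sketch, not something the package computes):

N = 2e6;   % examples
D = 2e3;   % features
data_gib = N * D * 8 / 2^30   % 8 bytes per double -> roughly 30 GiB just to hold the matrix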

GoogleCodeExporter commented 8 years ago
Yes, I have 2 TB of RAM; otherwise I would get a memory error from MATLAB.
I agree that this is a huge dataset, but I was curious about the performance of RF with only a few trees (say, 5).

Anyway, thanks for your code. I tried it on other, smaller datasets and it worked fine.

Best,
MJ

Original comment by m.se...@gmail.com on 14 Mar 2012 at 10:38

GoogleCodeExporter commented 8 years ago
@MJ

Sorry, I missed your post.

Yup, RF would work, but I don't think this package will finish in any appreciable time; it is currently non-threaded at both the tree level and the node level, and multi-threading is on my to-do list. Maybe a version of RF threaded at the node level would be able to scale to your dataset, and maybe it is not that bad with only a few trees. For example, Kinect uses a version of RF threaded at the node level:
http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf

Regards

Original comment by abhirana on 20 Mar 2012 at 11:17

GoogleCodeExporter commented 8 years ago
This is an interesting discussion. I am wondering, is there a rule of thumb for how much memory the MATLAB function allocates for its internal needs?

That is, how much memory is required as a function of the number of samples and their dimensionality?

Thanks for your work and this great package!

Cheers!

Original comment by vladisla...@gmail.com on 28 Mar 2012 at 10:31

GoogleCodeExporter commented 8 years ago
Hi vladislavs,

At least twice the size of the training data, plus slightly more for some temporary variables. There are also 6 more variables that store the tree hierarchy, and those may consume more space than necessary: each of these 6 variables is of size ntree x nrnodes, where nrnodes = 2*N + 1. So the total memory requirement is roughly 2 x N x D elements for the data plus ntree x (2*N + 1) elements for each of those variables, where N = number of examples and D = number of features.

For regression, the data is bagged, which creates a shadow copy of the training data; sorting is then done for each feature, so regression scales as N log(N) in compute time. For classification, a presorting step is done for each feature, which helps classification scale as N, but it still creates a shadow copy of the training data. A to-do item is to make regression use that presorting step so that its execution time also scales as N.

Original comment by abhirana on 29 Mar 2012 at 8:42
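To make the rule of thumb above concrete, here is a small MATLAB sketch of the estimate (my own illustration based on the figures in the previous comment; the six tree-hierarchy arrays and nrnodes = 2*N + 1 come from there, while the example values of N, D, and ntree are arbitrary):

N = 100000;  D = 50;  ntree = 500;       % example values only
nrnodes    = 2*N + 1;                    % maximum nodes per tree
data_elems = 2*N*D;                      % training data plus its shadow copy
tree_elems = 6*ntree*nrnodes;            % six tree-hierarchy arrays, each ntree x nrnodes
est_gib    = (data_elems + tree_elems) * 8 / 2^30   % assuming 8-byte doubles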

GoogleCodeExporter commented 8 years ago
Hi Abhirana,

Thanks for your answer; it clears things up! This should be noted somewhere!

Keep up the good work!

Original comment by vladisla...@gmail.com on 29 Mar 2012 at 9:43

GoogleCodeExporter commented 8 years ago
Abhishek,
Do you have plans to release the fixed code?

Thanks, 
—R

Original comment by romashap...@gmail.com on 5 Mar 2013 at 10:46

GoogleCodeExporter commented 8 years ago
@romashapovalov

The code in the SVN repository is the latest code; I just haven't put up a download link.
A zip file is available here:
https://code.google.com/p/randomforest-matlab/issues/detail?id=41#c8

Original comment by abhirana on 9 Mar 2013 at 7:15