janelia-flyem / NeuroProof

Tool for graph-based image segmentation and analysis

segfault on agglomeration classifier training with multiple iterations #4

Open michielkleinnijenhuis opened 8 years ago

michielkleinnijenhuis commented 8 years ago

Hi there,

I'm confronted with a segfault when training the agglomeration classifier with multiple iterations (see output below). It occurs with the example dataset as well as with my own data. Both <--strategy-type 1> and <--strategy-type 2 --num-iterations 1> run fine. I'm on OSX 10.10.5, but the behaviour on Linux 2.6.32-279.5.1.el6.x86_64 is identical. Could you please have a look at why this occurs?

Thanks, Michiel

neuroproof_graph_learn \
training_sample2/oversegmented_stack_labels.h5 \
training_sample2/boundary_prediction.h5 \
training_sample2/groundtruth.h5 \
--classifier-name training_sample2/classifier_str2.xml \
--strategy-type 2 --num-iterations 2

ignore features: 

 ** Learning iteration 1  **

Learn edge classifier ...
Building RAG ...done with 3051 nodes
Inclusion removal ...done with 3051 nodes
gt label counting
computed contingency table
gt label determined for 3051 nodes
ignore features:0 39 40 49 55 95 110 140 141 149 150 158 159 165 185 190 
Features generated
Number of samples and dimensions: 13643, 175
Number of merge: 4840
Time required to learn RF: 53.00 sec
with training set accuracy :99.054
Classifier learned
accuracy = 99.0545
done with 3051 nodes

 ** Learning iteration 2  **

Learn edge classifier ...
cumulative learning, all

Building RAG ...done with 3051 nodes
Inclusion removal ...done with 3051 nodes
gt label counting
computed contingency table
gt label determined for 3051 nodes
Segmentation fault: 11

paragt commented 8 years ago

Hi Michiel,

Thanks for using NeuroProof. From a quick look, it seems the learning function is trying to determine which features are not very informative using the feature value distribution. We are not using this trick and commented out the line invoking this function in the newer versions. Could you please check whether the following line is commented out or deleted in your BioPriors/StackLearnAlgs.cpp?

// if (prune_feature) feature_mgr->find_useless_features(all_features);

Please delete any txt file that was created by this command as well.

Let me know if the error persists after removing this line. Note also that, if you are not using a mitochondria channel in your pixelwise prediction, the use_mito bool value should be false.

Thanks

--Toufiq

michielkleinnijenhuis commented 8 years ago

Hi Toufiq,

Thanks for your fast response. I hope the following gives you a bit more info…

Thanks, Michiel

(neuroproof)ws133:envs michielk$ conda list

packages in environment at /Users/michielk/anaconda/envs/neuroproof:

#
boost            1.55.0       4              flyem
cloog            0.18.0       0              defaults
curl             7.43.0       1              defaults
fftw             3.3.4        1              flyem
freetype         2.5.2        2              http://repo.continuum.io/pkgs/free/osx-64/freetype-2.5.2-2.tar.bz2
gcc              4.8.2        5              defaults
gmp              5.1.2        6              defaults
hdf5             1.8.14       0              defaults
isl              0.12.2       1              defaults
jpeg             8d           1
jsoncpp          1.6.2        1              flyem
krb5             1.13.2       0              defaults
libdvid-cpp      0.1          np19py27_5     flyem
libgcc           4.8.4        1              flyem
libpng           1.6.17       0              http://repo.continuum.io/pkgs/free/osx-64/libpng-1.6.17-0.tar.bz2
libtiff          4.0.2        1
libxml2          2.9.2        0              http://repo.continuum.io/pkgs/free/osx-64/libxml2-2.9.2-0.tar.bz2
lz4              128          1              flyem
mpc              1.0.1        0              defaults
mpfr             3.1.2        0              defaults
neuroproof       1.1          py27_9         flyem
nose             1.3.7        py27_0         http://repo.continuum.io/pkgs/free/osx-64/nose-1.3.7-py27_0.tar.bz2
numpy            1.9.2        py27_0         http://repo.continuum.io/pkgs/free/osx-64/numpy-1.9.2-py27_0.tar.bz2
opencv           2.4.10.1     1              flyem
openssl          1.0.1k       1              http://repo.continuum.io/pkgs/free/osx-64/openssl-1.0.1k-1.tar.bz2
pip              7.1.0        py27_0         defaults
python           2.7.10       0              http://repo.continuum.io/pkgs/free/osx-64/python-2.7.10-0.tar.bz2
qt               4.8.6.99     1              flyem
readline         6.2          2
setuptools       18.0.1       py27_0         defaults
sqlite           3.8.4.1      1              http://repo.continuum.io/pkgs/free/osx-64/sqlite-3.8.4.1-1.tar.bz2
tk               8.5.18       0              http://repo.continuum.io/pkgs/free/osx-64/tk-8.5.18-0.tar.bz2
vigra            1.10         8_5dde887      flyem
vtk              5.10.1.99    with_pyqt_5    flyem
zlib             1.2.8        0              http://repo.continuum.io/pkgs/free/osx-64/zlib-1.2.8-0.tar.bz2

paragt commented 8 years ago

Hi Michiel,

Glad to help with your project, and thanks for the info you provided. I forgot about the new conda installation. I made some changes to NeuroProof after I received your email and ran into some problems in make test; we are working to fix those now. Please bug me in a couple of days if I haven't gotten back to you by then.

Thanks

--Toufiq

michielkleinnijenhuis commented 8 years ago

Okay, thanks. Your help is much appreciated.

Michiel

paragt commented 8 years ago

Hi Michiel,

I think the problem is fixed now. You can follow the instructions below to check out the git repo and build it yourself with conda. Please let me know if you have any problems with this.

Thanks

--Toufiq

PS: conda info --root prints your conda root folder; the environments are saved in its envs subfolder.

# Set up a conda environment with all dependencies
conda create -n myenv -c flyem neuroproof
source activate myenv
PREFIX=$(conda info --root)/envs/myenv
export LD_LIBRARY_PATH=${PREFIX}/lib # Linux
export DYLD_FALLBACK_LIBRARY_PATH=${PREFIX}/lib # Mac

# Discard the downloaded binary; we'll build our own.
conda remove neuroproof

# Clone and build
git clone https://github.com/janelia-flyem/neuroproof
cd neuroproof
./configure-for-conda.sh ${PREFIX}
cd build
make -j4
make install
make test

(Edited to correct PREFIX as mentioned below.)

michielkleinnijenhuis commented 8 years ago

Hi Toufiq,

Regarding the install: all tests passed. However, to get it going:

1) Your line in the email below should probably read (this is also in the README)
PREFIX=$(conda info --root)/envs/myenv
instead of
PREFIX=$(conda info --root)/myenv

2) I had to use an adapted build.sh, as I'm on Xcode 7, which does not include the requested MacOSX10.10.sdk (see output NP-bug_make_before-adapted-buildscript.output attached). May I suggest that you add the following two lines to your build.sh (as found in https://forums.developer.apple.com/thread/17334)? That worked for me.
-DCMAKE_OSX_DEPLOYMENT_TARGET:STRING="" \
-DCMAKE_OSX_SYSROOT:STRING=/ \

best wishes, Michiel

On 1 Mar 2016, at 18:38, paragt notifications@github.com<mailto:notifications@github.com> wrote:

Hi Michiel,

I think the problem is fixed now. The instruction you can follow to check out the git repo and build yourself with conda are as follows. Please let me know if you have problem with this.

Thanks

--Toufiq PS: conda info --root is the folder where you save the environments within your conda folder.

# Set up a conda environment with all dependencies
conda create -n myenv -c flyem neuroproof
source activate myenv
PREFIX=$(conda info --root)/myenv
export LD_LIBRARY_PATH=${PREFIX}/lib # Linux
export DYLD_FALLBACK_LIBRARY_PATH=${PREFIX}/lib # Mac

# Discard the downloaded binary; we'll build our own.
conda remove neuroproof

# Clone and build
git clone https://github.com/janelia-flyem/neuroproof
cd neuroproof
./configure-for-conda.sh ${PREFIX}
cd build
make -j4
make install
make test

Reply to this email directly or view it on GitHub: https://github.com/janelia-flyem/NeuroProof/issues/4#issuecomment-190845809

thouis commented 8 years ago

We're seeing the same thing on Linux, at commit 23992fa424.

LeeKamentsky commented 7 years ago

I think the problem occurs because a deleted edge is being revisited here: https://github.com/janelia-flyem/NeuroProof/blob/master/src/Algorithms/FeatureJoinAlgs.h#L127

I patched the code with the following test at that line and it ran to completion past the segfault:

        if (((*iter)->get_node1() == node_remove) ||
            ((*iter)->get_node2() == node_remove))
            continue;

That should test for the edge that's been merged out and should avoid reinserting it into the queue and trying to use the deleted node_cache for node_remove.
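
For readers who want to see the idea in isolation, here is a minimal, self-contained sketch of the same guard in a toy priority-queue agglomeration. It is not NeuroProof code: Edge, LighterFirst, and agglomerate are hypothetical names invented for this illustration, and the merged edges simply keep their old weight instead of being re-scored by a classifier. The point is only that edges touching node_remove are skipped before anything is pushed back onto the queue.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a RAG edge; NOT NeuroProof's RagEdge class.
struct Edge {
    int node1;
    int node2;
    double weight;  // lower weight = merge earlier
};

// Orders the priority queue so the lightest edge is on top.
struct LighterFirst {
    bool operator()(const Edge& a, const Edge& b) const {
        return a.weight > b.weight;
    }
};

// Greedy agglomeration: merges nodes joined by edges with weight <= threshold
// and returns, for every node, the label of the node it was merged into.
std::vector<int> agglomerate(int num_nodes, const std::vector<Edge>& edges,
                             double threshold) {
    std::vector<bool> removed(num_nodes, false);
    std::vector<int> label(num_nodes);
    for (int i = 0; i < num_nodes; ++i) label[i] = i;

    std::vector<std::vector<Edge>> incident(num_nodes);
    std::priority_queue<Edge, std::vector<Edge>, LighterFirst> queue;
    for (const Edge& e : edges) {
        assert(e.node1 != e.node2);  // self-loops are not supported here
        incident[e.node1].push_back(e);
        incident[e.node2].push_back(e);
        queue.push(e);
    }

    while (!queue.empty()) {
        Edge top = queue.top();
        queue.pop();
        if (top.weight > threshold) break;
        // Stale queue entry: one endpoint was merged away earlier.
        if (removed[top.node1] || removed[top.node2]) continue;

        const int node_keep = top.node1;
        const int node_remove = top.node2;
        removed[node_remove] = true;
        for (int& l : label) {
            if (l == node_remove) l = node_keep;
        }

        // Re-route the live edges of node_remove onto node_keep.
        for (const Edge& e : incident[node_remove]) {
            const int other = (e.node1 == node_remove) ? e.node2 : e.node1;
            if (other == node_keep || removed[other]) continue;
            const Edge moved{node_keep, other, e.weight};
            incident[node_keep].push_back(moved);
            incident[other].push_back(moved);
        }

        // Re-queue the edges around node_keep. The first test mirrors the
        // guard from the patch: never re-insert an edge that still refers
        // to the node that was just merged out.
        for (const Edge& e : incident[node_keep]) {
            if (e.node1 == node_remove || e.node2 == node_remove) continue;
            if (removed[e.node1] || removed[e.node2]) continue;
            queue.push(e);
        }
    }
    return label;
}
```

Calling agglomerate(4, {{0, 1, 0.1}, {1, 2, 0.2}, {2, 3, 0.9}}, 0.5) merges nodes 1 and 2 into node 0 and leaves node 3 alone. Without the two continue guards, the edge (0, 1) would be pushed back after node 1 is gone, which is the toy analogue of revisiting a deleted edge.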

stephenplaza commented 7 years ago

Thanks for your contribution, Lee! If you want, please create a pull request and, assuming the integration tests pass, I will accept it.