nipy / mindboggle

Automated anatomical brain label/shape analysis software (+ website)
http://mindboggle.info

nipype pipeline repeats steps when re-running mindboggle #38

Closed binarybottle closed 10 years ago

binarybottle commented 10 years ago

Arno Klein: after running the mindboggle nipype pipeline on the 101 brains, I found that re-running the nipype pipeline takes a long time. It skips a lot of steps, as it should, but when it gets to the labeling and table-generating steps it slows down as it repeats steps it already ran earlier.

Satrajit Ghosh: set `workflow.config['execution']['stop_on_first_rerun'] = True` and then do `workflow.run()`. The node that reruns will have a diff: nipype creates a diff of the node's JSON hash file to tell you what aspect changed (normally the contents of a node's working directory are obliterated before a run). This is how to debug your situation. One likely cause is that the output of some node is overwriting the input to another node.
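For readers unfamiliar with the mechanism Satra describes: nipype decides whether to rerun a node by hashing the node's inputs and comparing against the hash saved from the previous run. The following is a pure-Python sketch of that idea, not nipype's actual implementation; the function names `inputs_hash` and `needs_rerun` are illustrative only.

```python
import hashlib
import json

def inputs_hash(inputs):
    """Digest of a node's inputs: serialize to JSON with sorted keys, then hash."""
    blob = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.md5(blob).hexdigest()

def needs_rerun(old_inputs, new_inputs):
    """A node reruns iff its input hash changed; the dict diff shows
    which input is responsible (conceptually, the diff file nipype
    writes next to the node's JSON hash file)."""
    if inputs_hash(old_inputs) == inputs_hash(new_inputs):
        return False, {}
    diff = {k: (old_inputs.get(k), new_inputs.get(k))
            for k in set(old_inputs) | set(new_inputs)
            if old_inputs.get(k) != new_inputs.get(k)}
    return True, diff

# If an upstream node overwrites this node's input file, the file's
# recorded checksum changes, forcing an unwanted rerun even though
# nothing else about the node changed:
old = {"input_file": "001.mgz", "checksum": "aaa"}
new = {"input_file": "001.mgz", "checksum": "bbb"}
rerun, diff = needs_rerun(old, new)
```

With `stop_on_first_rerun` set, the first node whose hash comparison comes back "changed" aborts the run instead of silently recomputing, which is what surfaces the diff for inspection.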

binarybottle commented 10 years ago

```
dhcp149114:mindboggle arno$ nipype_display_crash /Users/arno/Projects/Mindboggle/mindboggle/mindboggle/crash-20140205-020714-arno-mgh_to_nifti.a0.npz
```

File: /Users/arno/Projects/Mindboggle/mindboggle/mindboggle/crash-20140205-020714-arno-mgh_to_nifti.a0.npz
Node: Mindboggle.mgh_to_nifti.a0
Working directory: /Users/arno/mindboggle_working/Mindboggle/_subject_OASIS-TRT-20-1/mgh_to_nifti

Node inputs:

function_str:

```python
def convert_mgh_to_native_nifti(input_file, reference_file, output_file='',
                                interp='nearest'):
    """
    Convert volume from FreeSurfer 'unconformed' to original space
    in nifti format using FreeSurfer's mri_vol2vol.

    Parameters
    ----------
    input_file : string
        input file name
    reference_file : string
        file in original space
    output_file : string
        name of output file
    interp : string
        interpolation method {trilin, nearest}

    Returns
    -------
    output_file : string
        name of output file

    """
    import os

    from mindboggle.utils.utils import execute

    # Convert volume from FreeSurfer to original space:
    print("Convert volume from FreeSurfer 'unconformed' to original space...")

    if not output_file:
        output_file = os.path.join(os.getcwd(),
            os.path.basename(input_file).split('mgz')[0] + 'nii.gz')

    cmd = ['mri_vol2vol',
           '--mov', input_file,
           '--targ', reference_file,
           '--interp', interp,
           '--regheader --o', output_file]
    execute(cmd)
    if not os.path.exists(output_file):
        raise(IOError(output_file + " not found"))
    output_file = output_file

    if not os.path.exists(output_file):
        raise(IOError(output_file + " not found"))

    return output_file
```

```
ignore_exception = False
input_file = /appsdir/freesurfer/subjects/OASIS-TRT-20-1/mri/orig/001.mgz
interp = trilin
output_file =
reference_file = /appsdir/freesurfer/subjects/OASIS-TRT-20-1/mri/orig/001.mgz
```

Traceback:

```
Traceback (most recent call last):
  File "//anaconda/lib/python2.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 18, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "//anaconda/lib/python2.7/site-packages/nipype/pipeline/engine.py", line 1357, in run
    raise Exception(("Cannot rerun when 'stop_on_first_rerun' "
Exception: Cannot rerun when 'stop_on_first_rerun' is set to True
```

satra commented 10 years ago

there should be two json files in this directory: `/Users/arno/mindboggle_working/Mindboggle/_subject_OASIS-TRT-20-1/mgh_to_nifti`

one of them is going to be a diff.

binarybottle commented 10 years ago

There's only one json file in that directory, and it isn't a diff file. What should I do?

binarybottle commented 10 years ago

I started from scratch and found that it re-runs the final step (creating the shape tables), but I still don't know why:

```
140206-02:51:53,225 workflow ERROR: ['Node Shape_tables.a0.a0 failed to run on host 1.0.0.127.in-addr.arpa.']
140206-02:51:53,226 workflow INFO: Saving crash info to /Users/arno/Projects/Mindboggle/mindboggle/mindboggle/crash-20140206-025153-arno-Shape_tables.a0.a0.npz
140206-02:51:53,226 workflow INFO: Traceback (most recent call last):
  File "//anaconda/lib/python2.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 18, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "//anaconda/lib/python2.7/site-packages/nipype/pipeline/engine.py", line 1357, in run
    raise Exception(("Cannot rerun when 'stop_on_first_rerun' "
Exception: Cannot rerun when 'stop_on_first_rerun' is set to True
```

binarybottle commented 10 years ago

When I either run OASIS-TRT-20-3 from scratch or repeat the run, I get the above `stop_on_first_rerun` error at the function `fetch_ants_data`.

satra commented 10 years ago

could you reproduce this in the nipype vagrant VM, and then box the VM? Might be a good thing to do with @nicholsn.

nicholsn commented 10 years ago

Happy to help if needed... I've been working on provisioning the past few days, so just let me know.

binarybottle commented 10 years ago

Thank you both. Curiously, after rerunning 94 of the 101 brains, only two nodes repeat. I test code on these two routinely -- is it possible that something else was stored or lost from the use of earlier versions of nipype, from repeated testing, etc.? I worked on a Vagrantfile-generating script yesterday, and I can't seem to "make" the C++ code, either from the script or after ssh'ing in:

```
[ 53%] Building CXX object travel_depth/CMakeFiles/TravelDepthMain.dir/TravelDepthMain.cpp.o
Linking CXX executable TravelDepthMain
/usr/bin/ld: warning: libGL.so.1, needed by /home/vagrant/anaconda/lib/libvtkHybrid.so.5.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libXt.so.6, needed by /home/vagrant/anaconda/lib/libvtkRendering.so.5.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libSM.so.6, needed by /home/vagrant/anaconda/lib/libvtkRendering.so.5.10.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libICE.so.6, needed by /home/vagrant/anaconda/lib/libvtkRendering.so.5.10.1, not found (try using -rpath or -rpath-link)
/home/vagrant/anaconda/lib/libvtkRendering.so.5.10.1: undefined reference to `glColorMaterial'
/home/vagrant/anaconda/lib/libvtkRendering.so.5.10.1: undefined reference to `glXMakeCurrent'
/home/vagrant/anaconda/lib/libvtkRendering.so.5.10.1: undefined reference to `glDeleteLists'
...
collect2: ld returned 1 exit status
make[2]: *** [travel_depth/TravelDepthMain] Error 1
make[1]: *** [travel_depth/CMakeFiles/TravelDepthMain.dir/all] Error 2
make: *** [all] Error 2
```

nicholsn commented 10 years ago

it looks like the lib*.so files are from X Windows... my guess is that the vagrant VM is a stripped-down ubuntu server edition without X Windows, and vtk needs access to a screen for rendering. From http://askubuntu.com/questions/213678/how-to-install-x11-xorg, ssh into the box and run:

```
sudo apt-get install xorg openbox
```

binarybottle commented 10 years ago

I am sure you are right -- I was using this base box:

```
config.vm.box_url = "http://files.vagrantup.com/precise64.box"
```

I tried `sudo apt-get install xorg openbox`, but I keep getting the error `Temporary failure resolving 'us.archive.ubuntu.com'`. Should I be adding mirrors to some file in the VM?

satra commented 10 years ago

are you bridging the network to the outside world? the name of the external interface will vary depending on the machine you are on.

nicholsn commented 10 years ago

did you "sudo apt-get update" first? If that fails and the vm is offline, then it might be the network interface as satra suggests...


binarybottle commented 10 years ago

I uncommented `config.vm.network :public_network` in the Vagrantfile, and now I am installing xorg and openbox...

binarybottle commented 10 years ago

When I 'vagrant up', it seems to install everything, but when I 'vagrant ssh', only some things are actually installed -- even pip and cmake disappear! Why would this be?

nicholsn commented 10 years ago

Hmmm... not sure. Can you send me a link to the Vagrantfile?


binarybottle commented 10 years ago

TEST: I ran two subjects, s1 and s2, through mindboggle, and there were no errors. Specifically, I ran s1, then ran s1 again with stop_on_first_rerun, and got no errors; I did the same for s2. However, when I then ran s1 again with stop_on_first_rerun, I got an error. Looking at the nipype working directory, I found five functions that do not have the subject name anywhere in their paths.
All of them are downstream of Fetch_ants_data, which did not specify a subject. To fix this, I will have this function take a subject name as an argument, to force nipype to keep each subject's results separate.
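The failure mode above can be sketched in plain Python. A node's cache location is keyed by a hash of its inputs, so if the subject never appears among those inputs, all subjects share one cache entry and overwrite each other. The helper `node_cache_dir` below is illustrative only, not nipype's actual API.

```python
import hashlib
import json
import os

def node_cache_dir(workdir, node_name, inputs):
    """Illustrative: a node's cache directory keyed by the hash of its
    inputs, as nipype conceptually does."""
    h = hashlib.md5(json.dumps(inputs, sort_keys=True).encode("utf-8")).hexdigest()
    return os.path.join(workdir, node_name, h)

# Without the subject among the inputs, every subject hashes to the
# same cache directory, so subject 2 clobbers subject 1's outputs and
# subject 1 reruns the next time around:
d1 = node_cache_dir("/work", "fetch_ants_data", {"atlas": "OASIS"})
d2 = node_cache_dir("/work", "fetch_ants_data", {"atlas": "OASIS"})
assert d1 == d2

# Passing the subject name as an explicit input changes the hash and
# keeps each subject's results separate:
s1 = node_cache_dir("/work", "fetch_ants_data", {"atlas": "OASIS", "subject": "s1"})
s2 = node_cache_dir("/work", "fetch_ants_data", {"atlas": "OASIS", "subject": "s2"})
assert s1 != s2
```

This is why adding the subject name as an argument to Fetch_ants_data (or isolating the working directory per subject) removes the spurious reruns.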

satra commented 10 years ago

it sounds like you are not using the subject name as an iterable but running the workflow in the same working directory. Using iterables gives you the necessary isolation; alternatively, you can simply name your workflow directory after the subject.

binarybottle commented 10 years ago

I have renamed the working directory and removed subject name as an iterable, and now I have a much better organized working directory (thanks, Satra!). All that's left in this issue is to figure out why vagrant isn't installing everything...

nicholsn commented 10 years ago

it looks like pip is installed, but you had line 12 commented out, where you append to .bashrc... You are also missing an escape in front of the first dollar sign:

```
echo "export PATH=\$HOME/anaconda/bin:\$PATH" >> .bashrc
```

For cmake and other apt-get installs, you need to include the `-y` option to accept that it's OK to install the software:

```
sudo apt-get install -y cmake
```

It's still building right now, but it made it past the previous steps where it failed...


binarybottle commented 10 years ago

Thank you!