sct-pipeline / binder-example

Example of end-to-end processing pipeline: from Dicom data to MR quantitative results.
MIT License
1 stars 1 forks source link

Create a Jupyter notebook #2

Open jcohenadad opened 5 years ago

jcohenadad commented 5 years ago

For integration with Binder, batch_processing.sh should be converted into a Jupyter notebook. A few things to keep in mind while doing that:

mathieuboudreau commented 5 years ago

Create a branch for this: jupyter

And converted your shell script into a notebook, and saved it after running it once: https://mybinder.org/v2/gh/sct-pipeline/binder-example/jupyter?urlpath=lab/tree/batch_processing.ipynb

mathieuboudreau commented 5 years ago

One minor issue I encountered which you should be aware of: I couldn't add MPLBACKEND='Agg' to the environment, because this would overwrite the setting already established for Jupyter/IPython (MPLBACKEND='module://ipykernel.pylab.backend_inline')

This means that if you run sct_check_dependencies, the plot test will fail.

I think if you want to be able to add an SCT plot within the Jupyter environment, you might need to do something this instead? (but I haven't tested this for SCT yet):

import matplotlib 
matplotlib.use('agg')

Do you think that would work as an alternative to setting it in the OS environment for SCT?

mathieuboudreau commented 5 years ago

This link (http://mmcdan.github.io/posts/interacting-with-the-shell-via-jupyter-notebook/) has some good tips on how to pass variables from the python environment to the shell command magics. e.g. !echo {var}, but there's other alternatives too. Not 100% sure these will work for us, but might be worth exploring.

mathieuboudreau commented 5 years ago

Looks like that syntax works for a Jupyter Notebook running the Python kernel, but not the SoS one (not really an issue for us here, they'll just need to be switched in the notebook I saved).

capture d ecran 2019-02-16 a 11 57 31
jcohenadad commented 5 years ago

@mathieuboudreau Awesome progress! Thanks a lot! Regarding the mplbackend stuff, yes i agree about your suggestion. In fact, we already switch to 'agg' within SCT code (e.g. here). It might not be systematic though so let us know when it becomes a problem and we can add it in specific functions.

Regarding the SoS vs. py3k, it seems to work on my end (see below), or am I missing something?

screen shot 2019-02-16 at 11 38 27
mathieuboudreau commented 5 years ago

Yeah sorry, I might not have been clear enough.

Running the script as-is worked fine in SoS (and Python); it's passing Python variables into a shell command within the Jupyter Notebook that only works in Python (and not SoS).

e.g. This works in both:


# Download example data
!sct_download_data -d sct_example_data

# Go to MT folder
os.chdir('sct_example_data/mt/')

But this example where I define a string in a Python variable and pass it to the shell command only work with the Python kernel (see my image above):


var = 't1w.nii.gz'
!sct_deepseg_sc -i {var} -c t1
mathieuboudreau commented 5 years ago

So like, if you want to define output filenames and SCT options in python variables all the way on top of your script (easier for the user to define/change), and then pass them along into your shell commands with the syntax above, you'll need to use the Python kernel. But if you simply want to hardcode all those options in your shell commands (as you've done now), then either kernel works.

mathieuboudreau commented 5 years ago

One more tip.

If you make a change in a Jupyter Notebook (or any file in the repo), download it, commit and push, if you then go to a new MyBinder session, the changes might not be reflected on that session. This is because MyBinder caches the Docker image (i.e. the repo that's in the image was pulled prior to your new changes), and only generates a new one if there has been a change in yoru dockerfile. This is slightly annoying, but you'll have to remember to make a minor change in the step where the repo is pulled in the dockerfile. What I've been doing is simply adding a space before the backslash of this line in all my repos:

https://github.com/sct-pipeline/binder-example/blob/32837829fae414bc31bdddc3ecdd7b5afaa8fb19/Dockerfile#L50

If you commit and push that change to the repo (it's what I do), then the next time you go to MyBinder it will trigger a new build from that RUN command and those that follow it, so the repo with the new commits will get pulled, and you'll see them in your Jupyter session.

perone commented 5 years ago

What you can do is to use sh and a custom class to execute SCT commands, I made an example below:

import os
import sh
from pathlib import Path

class SCT(object):
    def __init__(self, sct_dir=None):
        if sct_dir is None:
            envdir = os.getenv("SCT_DIR")
            if envdir is None:
                raise RuntimeError("You need a proper SCT installation.")
            self.sct_root = Path(envdir)
        else:
            self.sct_root = Path(sct_dir)
        self._populate()

    def _populate(self):
        for fbin in (self.sct_root / "bin").glob("sct_*"):
            cmd = sh.Command(str(fbin.absolute()))
            setattr(self, fbin.name, cmd)

# Example using it
sct = SCT() # This will use the environment variable
output = sct.sct_download_data(d="sct_example_data")

sct = SCT("/mydir/sct") # This will use a custom directory
output = sct.sct_download_data(d="sct_example_data")

This piece of code can also be a separate PyPI module, so anyone who installed SCT can use this later after installing it like pip install sct-interface or something like that, because it is quite common this need to execute SCT tools from inside a Python interpreter.

jcohenadad commented 5 years ago

very elegant solution, thanks a lot @perone!

mathieuboudreau commented 5 years ago

@jcohenadad what's the best way to visualise the output images of your pipeline? Are there SCT python functions that are available to display *.nii images? (I looked for SCT documentation, but couldn't find anything except your github readme and forum). I tested out nilearn; can plot some images, however the function I used overlays your images on a brain, that's why I'm reaching out here to see if you have matplotlib-derived functions available too for this.

jcohenadad commented 5 years ago

yes, absolutely, i would strongly recommend to use the QC module, which produces html page like this one (after download, double-click on "index.html"). Happy to guide you through the use of this module so you can generate mosaic of png images-- it will force us to generate a more thorough doc

mathieuboudreau commented 5 years ago

I tried running the QC module for the first command in your script (running in a Jupyter Notebook),

!sct_deepseg_sc -i t1w.nii.gz -c t1 -qc ./qc

and enocuntered an error:

``` -- Spinal Cord Toolbox (master/fc2fc50f2061b1046ec53376fa371da014b5ac46) Running /home/jovyan/work/binder-example/sct/scripts/sct_deepseg_sc.py -i t1w.nii.gz -c t1 -qc ./qc Method: Centerline algorithm: svm Assumes brain section included in the image: True Dimension of the segmentation kernel convolutions: 2d Creating temporary folder... Reorient the image to RPI, if necessary... Finding the spinal cord centerline... Resample the image to 0.5 mm isotropic resolution... /home/jovyan/work/binder-example/sct/python/lib/python2.7/site-packages/nipy/io/files.py:145: FutureWarning: Default `strict` currently False; this will change to True in a future version of nipy ni_img = nipy2nifti(img, data_dtype = io_dtype) Cropping the image around the spinal cord... Normalizing the intensity... Segmenting the spinal cord using deep learning on 2D patches... Reassembling the image... WARNING: To avoid intensity overflow due to convertion to uint8, intensity will be rescaled to the maximum quantization scale. Resampling the segmentation to the original image resolution... Binarizing the segmentation to avoid interpolation effects... Remove temporary files... rm -rf /tmp/sct-20190221184045.940099-h7k6h8 Traceback (most recent call last): File "/home/jovyan/work/binder-example/sct/scripts/sct_deepseg_sc.py", line 196, in main() File "/home/jovyan/work/binder-example/sct/scripts/sct_deepseg_sc.py", line 190, in main generate_qc(im_image, im_seg, args, os.path.abspath(path_qc)) File "/home/jovyan/work/binder-example/sct/scripts/sct_deepseg_sc.py", line 93, in generate_qc import spinalcordtoolbox.reports.qc as qc File "/home/jovyan/work/binder-example/sct/spinalcordtoolbox/reports/qc.py", line 27, in import matplotlib.pyplot as plt File "/home/jovyan/work/binder-example/sct/python/lib/python2.7/site-packages/matplotlib/pyplot.py", line 114, in _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup() File "/home/jovyan/work/binder-example/sct/python/lib/python2.7/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup globals(),locals(),[backend_name],0) ImportError: No module named ipykernel.pylab.backend_inline Note: you can opt out of Sentry reporting by editing the file ${SCT_DIR}/bin/sct_launcher and delete the line starting with "export SENTRY_DSN" Total processing time: 0 min 11 s ```

It appears that the QC module (and/or matplotlib) still somehow "sees" the environment variable `MPLBACKEND` (IPython sets it to `module://ipykernel.pylab.backend_inline`), even though as you say it should already be set to `Agg` [here](https://github.com/neuropoly/spinalcordtoolbox/blob/master/spinalcordtoolbox/reports/qc.py#L24). Any thoughts on a way around this? Should I open an issue on SCT? I tried running `%matplotlib inline`, but I think this only works if python commands are being directly called in the cell (and not commandline calls within a cell using the `!` bash command, like I'm doing). It might not be an SCT problem, but moreso an SCT shell + Jupyter problem. Does anyone in your group have experience with SCT + Jupyter?

mathieuboudreau commented 5 years ago

Another hint: I think all instances that I found online by searching (ImportError: No module named ipykernel.pylab.backend_inline) were using Python 2.7 (and so are we, by the looks of where matplotlib was installed in the error above)

e.g. :

https://github.com/nilmtk/nilmtk/issues/469

https://github.com/Microsoft/PTVS/issues/1598

https://github.com/ipython/ipykernel/issues/173

https://github.com/arq5x/poretools/issues/131

jcohenadad commented 5 years ago

can this be a workaround to the matplotlib problem? https://github.com/sct-pipeline/binder-example/issues/2#issuecomment-464748942

mathieuboudreau commented 5 years ago

I couldn't get that far using that way; I was able to download the data like @perone's example, but got an error running the next command:

capture d ecran 2019-02-21 a 16 10 16 capture d ecran 2019-02-21 a 16 13 46 capture d ecran 2019-02-21 a 16 13 56

mathieuboudreau commented 5 years ago

Ah nevermind my last comment, I forgot to os.chdir in the data's directory, woops.

mathieuboudreau commented 5 years ago

So it works without the qc argument, but breaks with it. @perone Looks like it's giving a --qc flag instead of -qc?

capture d ecran 2019-02-21 a 16 30 01 capture d ecran 2019-02-21 a 16 30 09

p.s. If this is just another way of calling the commandline, how is this any different that doing it the previous way I suggested with the bash magic (`!`)?

perone commented 5 years ago

You can alternatively run as in:

output = sct.sct_download_data("-d", "sct_example_data")

So you can decide if using --d or -d for instance. For more options you can look at the sh documentation, there are many ways to call and interact with these functions. By default, parameters in linux command-line tools with more than one letter are prefixed with two dashes (--), that probably explains why sh is using two for the qc case.

The problem with bash command ! is that it is not Python code, but an extension of IPython and Jupyter.

mathieuboudreau commented 5 years ago

Great, that solved that issue @perone !

I still get the error running it with the QC flag though (I'm think it's the same as before; the error log is cut short because we're calling with sh instead of my ! command):

capture d ecran 2019-02-21 a 16 47 13 capture d ecran 2019-02-21 a 16 47 21
perone commented 5 years ago

The full stderr/stdout, etc, are stored on the exception. If you want to capture the entire output, just handle the exception:

try:
    output = sct.sct_deepseg_sc("-i", "t2.nii.gz", "-c", "t1", "-qc", "./qc")
except sh.ErrorReturnCode as ex:
    print(ex.stderr)

And if you want to do this for all, you can just add this code on a wrapper on the SCT class. Another way is to just monkey patch as well:

sh.ErrorReturnCode.truncate_cap = 2**64

After importing the sh module.

mathieuboudreau commented 5 years ago

yeah I tried that, but it outputs this:

capture d ecran 2019-02-21 a 17 12 11
perone commented 5 years ago

You can just decode the binary data (note the b in the beginning of the output) and it will print a nice string:

print(ex.stderr.decode("utf-8"))
mathieuboudreau commented 5 years ago

Right! Anyways, by reading the screenshot I showed, I still get the same matplotlib backend error as before using this way.

perone commented 5 years ago

Oka, so for the other problem: do you still get the same problem when you add this:

backend : MacOSX

into the ~/.matplotlib/matplotlibrc file ?

perone commented 5 years ago

So, I tested on binder and I was able to make it work by using the environment variable upon calling the process:

sct.sct_deepseg_sc("-i", "sct_example_data/mt/t1w.nii.gz", "-c", "t1", "-qc", "./qc",
                   _env={"MPLBACKEND": "agg"})

You can pass this variable in the SCT class as well to avoid writing it on every call.

mathieuboudreau commented 5 years ago

Amazing - that works for me too @perone ! Thanks for all your help!

mathieuboudreau commented 5 years ago

Just to be a completionist: the command line alternative solution (using the Jupyter ! magic command to call the bash) is:

!MPLBACKEND=agg sct_deepseg_sc -i t1w.nii.gz -c t1 -qc qc

jcohenadad commented 5 years ago

@mathieuboudreau yesterday we merged this PR https://github.com/neuropoly/spinalcordtoolbox/pull/2157, which implements the OO API of matplotlib. Now that the backend is manually selected, you won't need to declare MPLBACKEND anymore

mathieuboudreau commented 5 years ago

Great! I will test that out on the next Binder/Docker build I launch

mathieuboudreau commented 5 years ago

@jcohenadad I've updated the notebook (reminder of the link).

Changes:

At this point, maybe you could give offer guidance on what other plots (if any) you think would be useful to visualise in such a pipeline, how you want to display your final results, and some documentation (before the pipeline, at each step, and/or at the figures/results).

mathieuboudreau commented 5 years ago

@jcohenadad just pinging you to make sure my message above was received

mathieuboudreau commented 5 years ago

@jcohenadad small note: Hardcoding the repository name relative to the notebook in a parameter might not be the solution here:

Capture d’écran 2019-03-20 à 10 06 45

The notebook will break if the user has a different directory tree or forked it with a different name; it just did for me in this Jupyter Book repo (note the change in file location relative to the root of the repo). Maybe the best strategy would be to store the full folder path of notebook in a variable at the start, and use that here instead?

jcohenadad commented 5 years ago

@mathieuboudreau i didn't introduce these lines of code, but my understanding is that they are there to access the qc folder (as done here), not pointing to the repos. In fact, this line of code didn't work for me, which i why i created the if case for 'darwin' platform.

mathieuboudreau commented 5 years ago

Re: your reply, ah ha, ok!

So it looks like when the qc subfolders are created in Jupyter, it actually uses/adds only the parent folder of the Jupyter notebook, not the whole tree until the root of the repo. Compare the error of my attempt to resolve it vs the left panel:

Capture d’écran 2019-03-20 à 10 41 54
jcohenadad commented 5 years ago

i would strongly advise to feed absolute (not relative) path to the -qc flag. You could use e.g. os.join(os.path.abspath(os.getcwd()), 'qc') to declare qc_folder (before the os.chdir())

that would solve all our problems (and get rid of the ugly if case for darwin)

mathieuboudreau commented 5 years ago

I agree - I think I tried doing that at first; I vaguely remember something deterring me away from it but can't remember why. I'll try again and let you know if I encounter an issue.