Open jcohenadad opened 5 years ago
Create a branch for this: jupyter
And converted your shell script into a notebook, and saved it after running it once: https://mybinder.org/v2/gh/sct-pipeline/binder-example/jupyter?urlpath=lab/tree/batch_processing.ipynb
One minor issue I encountered which you should be aware of: I couldn't add MPLBACKEND='Agg'
to the environment, because this would overwrite the setting already established for Jupyter/IPython (MPLBACKEND='module://ipykernel.pylab.backend_inline'
)
This means that if you run sct_check_dependencies
, the plot test will fail.
I think if you want to be able to add an SCT plot within the Jupyter environment, you might need to do something this instead? (but I haven't tested this for SCT yet):
import matplotlib
matplotlib.use('agg')
Do you think that would work as an alternative to setting it in the OS environment for SCT?
This link (http://mmcdan.github.io/posts/interacting-with-the-shell-via-jupyter-notebook/) has some good tips on how to pass variables from the python environment to the shell command magics. e.g. !echo {var}
, but there's other alternatives too. Not 100% sure these will work for us, but might be worth exploring.
Looks like that syntax works for a Jupyter Notebook running the Python kernel, but not the SoS one (not really an issue for us here, they'll just need to be switched in the notebook I saved).
@mathieuboudreau Awesome progress! Thanks a lot! Regarding the mplbackend stuff, yes i agree about your suggestion. In fact, we already switch to 'agg' within SCT code (e.g. here). It might not be systematic though so let us know when it becomes a problem and we can add it in specific functions.
Regarding the SoS vs. py3k, it seems to work on my end (see below), or am I missing something?
Yeah sorry, I might not have been clear enough.
Running the script as-is worked fine in SoS (and Python); it's passing Python variables into a shell command within the Jupyter Notebook that only works in Python (and not SoS).
e.g. This works in both:
# Download example data
!sct_download_data -d sct_example_data
# Go to MT folder
os.chdir('sct_example_data/mt/')
But this example where I define a string in a Python variable and pass it to the shell command only work with the Python kernel (see my image above):
var = 't1w.nii.gz'
!sct_deepseg_sc -i {var} -c t1
So like, if you want to define output filenames and SCT options in python variables all the way on top of your script (easier for the user to define/change), and then pass them along into your shell commands with the syntax above, you'll need to use the Python kernel. But if you simply want to hardcode all those options in your shell commands (as you've done now), then either kernel works.
One more tip.
If you make a change in a Jupyter Notebook (or any file in the repo), download it, commit and push, if you then go to a new MyBinder session, the changes might not be reflected on that session. This is because MyBinder caches the Docker image (i.e. the repo that's in the image was pulled prior to your new changes), and only generates a new one if there has been a change in yoru dockerfile. This is slightly annoying, but you'll have to remember to make a minor change in the step where the repo is pulled in the dockerfile. What I've been doing is simply adding a space before the backslash of this line in all my repos:
If you commit and push that change to the repo (it's what I do), then the next time you go to MyBinder it will trigger a new build from that RUN
command and those that follow it, so the repo with the new commits will get pulled, and you'll see them in your Jupyter session.
What you can do is to use sh
and a custom class to execute SCT commands, I made an example below:
import os
import sh
from pathlib import Path
class SCT(object):
def __init__(self, sct_dir=None):
if sct_dir is None:
envdir = os.getenv("SCT_DIR")
if envdir is None:
raise RuntimeError("You need a proper SCT installation.")
self.sct_root = Path(envdir)
else:
self.sct_root = Path(sct_dir)
self._populate()
def _populate(self):
for fbin in (self.sct_root / "bin").glob("sct_*"):
cmd = sh.Command(str(fbin.absolute()))
setattr(self, fbin.name, cmd)
# Example using it
sct = SCT() # This will use the environment variable
output = sct.sct_download_data(d="sct_example_data")
sct = SCT("/mydir/sct") # This will use a custom directory
output = sct.sct_download_data(d="sct_example_data")
This piece of code can also be a separate PyPI module, so anyone who installed SCT can use this later after installing it like pip install sct-interface
or something like that, because it is quite common this need to execute SCT tools from inside a Python interpreter.
very elegant solution, thanks a lot @perone!
@jcohenadad what's the best way to visualise the output images of your pipeline? Are there SCT python functions that are available to display *.nii images? (I looked for SCT documentation, but couldn't find anything except your github readme and forum). I tested out nilearn; can plot some images, however the function I used overlays your images on a brain, that's why I'm reaching out here to see if you have matplotlib-derived functions available too for this.
yes, absolutely, i would strongly recommend to use the QC module, which produces html page like this one (after download, double-click on "index.html"). Happy to guide you through the use of this module so you can generate mosaic of png images-- it will force us to generate a more thorough doc
I tried running the QC module for the first command in your script (running in a Jupyter Notebook),
!sct_deepseg_sc -i t1w.nii.gz -c t1 -qc ./qc
and enocuntered an error:
It appears that the QC module (and/or matplotlib) still somehow "sees" the environment variable `MPLBACKEND` (IPython sets it to `module://ipykernel.pylab.backend_inline`), even though as you say it should already be set to `Agg` [here](https://github.com/neuropoly/spinalcordtoolbox/blob/master/spinalcordtoolbox/reports/qc.py#L24). Any thoughts on a way around this? Should I open an issue on SCT? I tried running `%matplotlib inline`, but I think this only works if python commands are being directly called in the cell (and not commandline calls within a cell using the `!` bash command, like I'm doing). It might not be an SCT problem, but moreso an SCT shell + Jupyter problem. Does anyone in your group have experience with SCT + Jupyter?
Another hint: I think all instances that I found online by searching (ImportError: No module named ipykernel.pylab.backend_inline
) were using Python 2.7 (and so are we, by the looks of where matplotlib was installed in the error above)
e.g. :
https://github.com/nilmtk/nilmtk/issues/469
https://github.com/Microsoft/PTVS/issues/1598
can this be a workaround to the matplotlib problem? https://github.com/sct-pipeline/binder-example/issues/2#issuecomment-464748942
I couldn't get that far using that way; I was able to download the data like @perone's example, but got an error running the next command:
Ah nevermind my last comment, I forgot to os.chdir in the data's directory, woops.
So it works without the qc
argument, but breaks with it. @perone Looks like it's giving a --qc
flag instead of -qc
?
p.s. If this is just another way of calling the commandline, how is this any different that doing it the previous way I suggested with the bash magic (`!`)?
You can alternatively run as in:
output = sct.sct_download_data("-d", "sct_example_data")
So you can decide if using --d
or -d
for instance. For more options you can look at the sh documentation, there are many ways to call and interact with these functions. By default, parameters in linux command-line tools with more than one letter are prefixed with two dashes (--
), that probably explains why sh
is using two for the qc
case.
The problem with bash command !
is that it is not Python code, but an extension of IPython and Jupyter.
Great, that solved that issue @perone !
I still get the error running it with the QC flag though (I'm think it's the same as before; the error log is cut short because we're calling with sh
instead of my !
command):
The full stderr
/stdout
, etc, are stored on the exception. If you want to capture the entire output, just handle the exception:
try:
output = sct.sct_deepseg_sc("-i", "t2.nii.gz", "-c", "t1", "-qc", "./qc")
except sh.ErrorReturnCode as ex:
print(ex.stderr)
And if you want to do this for all, you can just add this code on a wrapper on the SCT
class. Another way is to just monkey patch as well:
sh.ErrorReturnCode.truncate_cap = 2**64
After importing the sh
module.
yeah I tried that, but it outputs this:
You can just decode the binary data (note the b
in the beginning of the output) and it will print a nice string:
print(ex.stderr.decode("utf-8"))
Right! Anyways, by reading the screenshot I showed, I still get the same matplotlib backend error as before using this way.
Oka, so for the other problem: do you still get the same problem when you add this:
backend : MacOSX
into the ~/.matplotlib/matplotlibrc
file ?
So, I tested on binder and I was able to make it work by using the environment variable upon calling the process:
sct.sct_deepseg_sc("-i", "sct_example_data/mt/t1w.nii.gz", "-c", "t1", "-qc", "./qc",
_env={"MPLBACKEND": "agg"})
You can pass this variable in the SCT class as well to avoid writing it on every call.
Amazing - that works for me too @perone ! Thanks for all your help!
Just to be a completionist: the command line alternative solution (using the Jupyter !
magic command to call the bash) is:
!MPLBACKEND=agg sct_deepseg_sc -i t1w.nii.gz -c t1 -qc qc
@mathieuboudreau yesterday we merged this PR https://github.com/neuropoly/spinalcordtoolbox/pull/2157, which implements the OO API of matplotlib. Now that the backend is manually selected, you won't need to declare MPLBACKEND anymore
Great! I will test that out on the next Binder/Docker build I launch
@jcohenadad I've updated the notebook (reminder of the link).
Changes:
False
clears the output of the cells, except for figures, but currently True
for debugging)At this point, maybe you could give offer guidance on what other plots (if any) you think would be useful to visualise in such a pipeline, how you want to display your final results, and some documentation (before the pipeline, at each step, and/or at the figures/results).
@jcohenadad just pinging you to make sure my message above was received
@jcohenadad small note: Hardcoding the repository name relative to the notebook in a parameter might not be the solution here:
The notebook will break if the user has a different directory tree or forked it with a different name; it just did for me in this Jupyter Book repo (note the change in file location relative to the root of the repo). Maybe the best strategy would be to store the full folder path of notebook in a variable at the start, and use that here instead?
@mathieuboudreau i didn't introduce these lines of code, but my understanding is that they are there to access the qc folder (as done here), not pointing to the repos. In fact, this line of code didn't work for me, which i why i created the if case for 'darwin' platform.
Re: your reply, ah ha, ok!
So it looks like when the qc
subfolders are created in Jupyter, it actually uses/adds only the parent folder of the Jupyter notebook, not the whole tree until the root of the repo. Compare the error of my attempt to resolve it vs the left panel:
i would strongly advise to feed absolute (not relative) path to the -qc
flag. You could use e.g. os.join(os.path.abspath(os.getcwd()), 'qc')
to declare qc_folder
(before the os.chdir()
)
that would solve all our problems (and get rid of the ugly if case for darwin)
I agree - I think I tried doing that at first; I vaguely remember something deterring me away from it but can't remember why. I'll try again and let you know if I encounter an issue.
For integration with Binder,
batch_processing.sh
should be converted into a Jupyter notebook. A few things to keep in mind while doing that:dcm2niix
will be to be called as system UNIX call