General questions

guiwitz commented 2 years ago

I went through the (great!) material a bit, and before trying to improve things here and there, I have a few questions:

Is there a plan of the entire course anywhere (more detailed than what is explained here)? Maybe that could help structure the course a bit better. If I take the example of the segmentation chapter, there's a mix of basic notebooks, notebooks going through a series of functions, and notebooks showing complete workflows. Maybe there could be sub-chapters?
pyclesperanto: I noticed that in most notebooks introducing new concepts (e.g. thresholding) both scikit-image and pyclesperanto examples are given. I think that's great, as it gives people a choice (it also guarantees that people can at least run the non-GPU code in any case). Some more advanced examples, e.g. on nuclei quantification, only exist with pyclesperanto. Would it be accepted (or desired) to have equivalent notebooks using just the classical packages (scikit-image, scipy etc.)?
The blobs picture is used in lots of places. Is that by design or was it just a first step? I think it is generally more interesting to show even simple things like labelling with real images.

A few things I mentioned in other issues, but as they are general questions I am repeating them here for completeness:

What's the plan for pyclesperanto's imshow? I just think it's a bit dangerous to depend on pyclesperanto just for image display (even in notebooks where pyclesperanto is not used). Again, my (biased) opinion is to use microfilm.
The numbering both of chapters and of notebooks inside chapters is often strange. Do we avoid fixing this (so that the _toc.yml doesn't have to change all the time) until things settle down, or should we fix things bit by bit when we encounter them?
Should the data in general be accessed by URL or by path? URLs ensure that it always works, but they require an internet connection. Paths don't need a connection but only work when cloning the repo (tricky e.g. on Colab).
Should we make the course work on Colab and/or on Binder?

haesleinhuepf commented 2 years ago

Hi @guiwitz ,

great questions!

Is there a plan of the entire course anywhere (more detailed than what is explained here)?

Not really (yet). As you correctly saw, it's more a collection of notebooks gathered from other places and put together in a folder. Restructuring makes a lot of sense.

Some more advanced examples, e.g. on nuclei quantification, only exist with pyclesperanto. Would it be accepted (or desired) to have equivalent notebooks using just the classical packages (scikit-image, scipy etc.)?

I'm speculating that these more advanced notebooks use functions that do not exist in scikit-image. I've implemented plenty of stuff in pyclesperanto, but also in libraries such as napari-simpleitk-image-processing and napari-segment-blobs-and-things-with-membranes, because those things were not available in scikit-image. Voronoi-Otsu-Labeling is a famous example. Similar code using scikit-image is copied a billion times in notebooks, and I think we should avoid this here. We'd better improve existing libraries and show how to use them instead of writing complicated notebooks with scikit-image and scipy functions. Anyway, if you have a concrete example in mind, let me know! I'm curious.

The blobs picture is used in lots of places. Is that by design or was it just a first step? I think it is generally more interesting to show even simple things like labelling with real images.

That's a relic of my ImageJ course materials. I'm more and more using CC-BY- and CC0-licensed images from BBBC.

What's the plan for pyclesperanto's imshow? I just think it's a bit dangerous to depend on pyclesperanto just for image display (even in notebooks where pyclesperanto is not used). Again, my (biased) opinion is to use microfilm.

Fully agreed. Microfilm is super cool and I'd like to use it more, too. I think one or two changes to microfilm might be necessary, though. I started this PR exploring microfilm and got stuck (mostly for lack of time on my side).

The numbering both of chapters and of notebooks inside chapters is often strange. Do we avoid fixing this (so that the _toc.yml doesn't have to change all the time)

Not sure what you mean. Whenever a notebook is added, we need to change the toc anyway. We could remove numbers from notebook names, I agree. The chapter numbering is a bit sub-optimal, but without the numbering, the github repository is extremely unstructured. Thus, I'd vote for keeping the numbers in the chapters.

Should the data in general be accessed by URL or by path? URLs ensure that it always works, but they require an internet connection. Paths don't need a connection but only work when cloning the repo (tricky e.g. on Colab).

I think it would be cool to keep relative paths and not switch to URLs. That way, people can work with the notebooks in local copies without needing to re-download things all the time. Is there a way to make relative links work in Colab?
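
One possible middle ground between the two options, sketched here only as an illustration: keep the relative path in the notebook and download the file once if it is missing locally. `BASE_URL`, `resolve_data` and the flat file layout on the server are assumptions made for this sketch; nothing like this exists in the repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical location of the hosted data; replace with the real one.
BASE_URL = "https://example.org/data/"


def resolve_data(relative_path, base_url=BASE_URL, fetch=urlretrieve):
    """Return a local path to the file, downloading it only if it is missing.

    Assumes every file is hosted directly under base_url under its file name.
    """
    local = Path(relative_path)
    if local.exists():
        # Local clone: no network access needed.
        return local
    # No local copy (e.g. a fresh Colab session): fetch it once and keep it.
    local.parent.mkdir(parents=True, exist_ok=True)
    fetch(base_url + local.name, str(local))
    return local
```

A notebook would then call something like `resolve_data("../../data/blobs.tif")` and pass the returned path to its image reader, so local clones never touch the network and hosted sessions download each file only once.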

Should we make the course work on Colab and/or on Binder?

I think the majority of notebooks use pyclesperanto. This works on Colab, but I haven't tried Binder. How complicated are the installation instructions we would have to provide so that people can run the notebooks from Binder and/or Colab?

guiwitz commented 2 years ago

Similar code using scikit-image is copied a billion times in notebooks, and I think we should avoid this here. We'd better improve existing libraries and show how to use them instead of writing complicated notebooks with scikit-image and scipy functions. Anyway, if you have a concrete example in mind, let me know! I'm curious.

After reflection, what you say here makes a lot of sense, and to be frank I didn't consider that perspective before. So I agree, there's no need for long, complicated scikit-image notebooks. I was going to write that maybe one example teasing apart one of your functions would be helpful, but I just saw that you already did that for the Voronoi case using cle. So at most, maybe we could do something similar with one of the napari-segment-blobs-and-things-with-membranes functions?

On a more general note: the functions of napari-segment-blobs-and-things-with-membranes could be useful even for people not using napari. Have you considered pulling those out into a separate package independent of napari?

That's a relic of my ImageJ course materials. I'm more and more using CC-BY- and CC0-licensed images from BBBC.

OK, then I'll try to find alternatives whenever I stumble over the blobs. FYI, I also often find useful images in the Cell Image Library, which has lots of CC-BY and CC0 material.

Not sure what you mean. Whenever a notebook is added, we need to change the toc anyway.

Numbering notebooks is fine. My remark was just about things like chapters numbered 1, 2, 3 and then 12, or notebooks in the segmentation chapter starting their numbering at 6. I just wanted to know if I should fix these or if we leave it for later.
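
To get an overview of such gaps before deciding, a small check along these lines might help. This is only a sketch: `numbering_gaps` is a hypothetical helper, and it assumes chapter folders and notebooks start with a numeric prefix such as `01_` and that numbering is meant to start at 1.

```python
import re


def numbering_gaps(names):
    """Return the numbers missing between 1 and the highest numeric prefix."""
    numbers = set()
    for name in names:
        match = re.match(r"(\d+)", name)  # leading digits only
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    return [n for n in range(1, max(numbers) + 1) if n not in numbers]


# Chapters numbered 1, 2, 3 and then 12:
print(numbering_gaps(["01_intro", "02_python", "03_images", "12_segmentation"]))
# → [4, 5, 6, 7, 8, 9, 10, 11]
```

For a real folder, the names could come from something like `[p.name for p in Path("docs").iterdir()]`, where `docs` is a placeholder for the actual chapter directory.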

I think one or two changes to microfilm might be necessary, though.

OK, very good! FYI, I updated microfilm in main so that single-channel images are rendered in grayscale by default.

Is there a way to make relative links work in Colab?

The only way is to run a little script that clones the repo and puts the images in the right place. At some point I also had an automated script that would go through all notebooks in a repo and change all paths. This is not urgent; I also plan to use your material locally. I'll fix the places where I added links in my PRs.
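
For illustration, a path-rewriting script of that kind could look roughly like the sketch below. It is not the script mentioned above: `rewrite_paths` and the prefixes in the usage note are made up for this example, and it relies only on the fact that `.ipynb` files are JSON documents whose cells carry a `source` field.

```python
import json
from pathlib import Path


def rewrite_paths(notebook_dir, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix in all code cells below notebook_dir.

    Returns the list of notebook files that were modified.
    """
    changed = []
    for nb_path in sorted(Path(notebook_dir).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in nb_path.parts:
            continue  # skip Jupyter's autosave copies
        nb = json.loads(nb_path.read_text(encoding="utf-8"))
        modified = False
        for cell in nb.get("cells", []):
            if cell.get("cell_type") != "code":
                continue  # leave markdown and raw cells untouched
            source = cell.get("source", [])
            # The notebook format stores source as a string or a list of lines.
            if isinstance(source, str):
                new_source = source.replace(old_prefix, new_prefix)
            else:
                new_source = [line.replace(old_prefix, new_prefix) for line in source]
            if new_source != source:
                cell["source"] = new_source
                modified = True
        if modified:
            nb_path.write_text(
                json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8"
            )
            changed.append(nb_path)
    return changed
```

A call such as `rewrite_paths("docs", "../../data/", "https://example.org/data/")` would switch every matching code cell from the relative prefix to the (placeholder) URL prefix. Since it is a plain string replacement, it is worth reviewing the resulting diff before committing.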

This works on Colab, but I haven't tried Binder. How complicated are the installation instructions we would have to provide so that people can run the notebooks from Binder and/or Colab?

Right, there are no GPUs on Binder, so I guess there's no way this will work. FYI, otherwise the whole installation is done for users (using conda, pip, etc., or apt-get for other things), so they normally don't have to worry about installing anything.

haesleinhuepf / BioImageAnalysisNotebooks

General questions #18