scverse / squidpy_notebooks

Tutorials for Squidpy
https://squidpy.readthedocs.io/en/stable/
MIT License
30 stars 17 forks source link

graph tutorials #12

Closed giovp closed 3 years ago

giovp commented 3 years ago

initial draft of the graph tutorials. Content wise is everything there, needs to make couple more changes but otherwise should be good to go as first draft.

Wanted to ask @michalk8 two things about formatting:

Thanks!

michalk8 commented 3 years ago

is it ok to add links like this key? cause I know in .rst it should be done differently right?

In .rst, you can `Link text \<link URL>`_ or :ref:`Name of some section`, see here

as you shared, I'm using this format to point to docs pages :func:squidpy.gr.interaction_matrix . For scanpy pages though, I'm using simply scanpy.pl.spatial, do you think I should add the link explicitly as well?

Yes, please try to link as many functions as possible. Note that for squidpy, the links won't work on this repo, since the API docs are on the main repo. This will work once we have read the docs up.

giovp commented 3 years ago

I add to ignore E501 (line too long) in flake8 cause it was flagging all the text sections

michalk8 commented 3 years ago

I add to ignore E501 (line too long) in flake8 cause it was flagging all the text sections

I must disagree - the lines are super long and it's really hard to read the source code. Why not split the line (visually, it will render the same)? I.e.

# foooooo bar

is the same as:

# foooooo
# bar

If you want to split the lines, you do it like:

# foooooo
#
# bar

(haven't tested this, but am 99% sure this is the case).

giovp commented 3 years ago

yes yes, realized wasn't a sound option. Must say that it's not been very straightforward to write notebooks but having to converting all the time.

Anyway, this should be it. Two reminaing things before merging this:

michalk8 commented 3 years ago

Must say that it's not been very straightforward to write notebooks but having to converting all the time.

I know, but I think it's worth the effor (nbconvert might help a bit)

I've noticed you're using this in the header:

sc.logging.print_header()
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3
sq.__version__

@giovp I can add header/footer cells to the gallery (in the header, I'd keep the config), in the footer the version (I'd only expect scanpy and squidpy to be imported), see here.

Also 1 point: sections/headers in .rst are done as:

# line should be the same length as the text
My title
--------

My subtitle
===========

not sure if e.g. # # Visium (spatial statistics and graph) will render properly or can be linked to (but should be changed nonetheless).

michalk8 commented 3 years ago

examples to be linked somewhere here.

What do you mean?

michalk8 commented 3 years ago

Btw, I'm going to add spellcheck as well (tried it once in the main repo, but was too lazy to configure it, notebook repo will benefit more from this).

giovp commented 3 years ago

not sure if e.g. # # Visium (spatial statistics and graph) will render properly or can be linked to (but should be changed nonetheless).

ah good point, so the format # title -> big title does not work, and should only use underscore rights?

What do you mean?

that sometime it's worthwhile to link to squidpy_notebooks/examples in squidpy_notebooks/tutorials to keep the flow of the explanation. Should be possible right?

hspitzer commented 3 years ago

I added the image analysis part to the visium_hne notebook. I also fixed some formatting etc in this notebook, but it needs some more work.

I also tried to run the docs, and noticed that the notebook is not executed - the docs are generated, and I see the markdown but no results from the code cells. Is that the expected behaviour @michalk8 ?

michalk8 commented 3 years ago

I also tried to run the docs, and noticed that the notebook is not executed - the docs are generated, and I see the markdown but no results from the code cells. Is that the expected behaviour @michalk8 ?

Unfortunately, yes. But, if you think it's important (since notebooks are downloadable), I could add a step in the CI to run the notebooks. 1 problem is that this will inflate their size a bit (esp. if they have figures).

giovp commented 3 years ago

Unfortunately, yes. But, if you think it's important (since notebooks are downloadable), I could add a step in the CI to run the notebooks. 1 problem is that this will inflate their size a bit (esp. if they have figures).

I think this is fine for me to leave it out.

question: do we need the markdown flag # %% [markdown] on top of the each text section? I noticed this is not present in the examples

michalk8 commented 3 years ago

# %% marks a new notebooks cell, if you need subsections - definitely useful for tutorial, for examples, not so much, you'd do:

# %%
# This is another section header
# ------------------------------
#
# In the built documentation, it will be rendered as rST after the code above!
# This is also another code block.

(example taken from sphinx-gallery website).

hspitzer commented 3 years ago

Unfortunately, yes. But, if you think it's important (since notebooks are downloadable), I could add a step in the CI to run the notebooks. 1 problem is that this will inflate their size a bit (esp. if they have figures).

I think this is fine for me to leave it out.

I think having the results of the cells right there in the tutorial would be very important. Otherwise the reader cannot really understand what is happening. For example, after calculating features, I describe the clustering and compare different clusters - but if you cannot see the umap, you don't really understand what's going on. We can exclude it from CI if too costly (running tutorials might take longer), but having it in the final readthedocs would be important. And maybe a command to generate tutorials locally to check the output.

Re the size: it won't inflate the size much more than a couple of examples, right? Each example I have written so far has a figure output as well.

question: do we need the markdown flag # %% [markdown] on top of the each text section? I noticed this is not present in the examples

I think the [markdown] was created by the automatic conversion, I don't think you need it.

giovp commented 3 years ago

I think having the results of the cells right there in the tutorial would be very important. Otherwise the reader cannot really understand what is happening. For example, after calculating features, I describe the clustering and compare different clusters - but if you cannot see the umap, you don't really understand what's going on. We can exclude it from CI if too costly (running tutorials might take longer), but having it in the final readthedocs would be important. And maybe a command to generate tutorials locally to check the output.

but they will of course be generated by CI, but are not present in the local generated docs right? This is what I understood.

I think the [markdown] was created by the automatic conversion, I don't think you need it.

ok, will remove it from the next ones, thanks!

michalk8 commented 3 years ago

I think having the results of the cells right there in the tutorial would be very important. Otherwise the reader cannot really understand what is happening. For example, after calculating features, I describe the clustering and compare different clusters - but if you cannot see the umap, you don't really understand what's going on. We can exclude it from CI if too costly (running tutorials might take longer), but having it in the final readthedocs would be important. And maybe a command to generate tutorials locally to check the output.

Not sure if I follow correctly. The results of the examples/tutorials are always visible in the docs (such as figures, print statements, warnings), but they are only in RST format, not in the .ipynb files, which are downloadable. The downloadable .ipynb files have their output cells empty, but this is not what users see when they go the docs - they see the RST files. If you really want also to have the .ipynb files to match almost exacrly (especially in terms of output, such as figures) to the RST, it's doable (easy way: after the .ipynb files have been generated from the initial .py/RST files, just run them, hard way: do not run the .ipynb file, just inject the output code cells into them (we have all the info, but it's a bit scattered - files are on disk print/repr/warnings are in RST).

hspitzer commented 3 years ago

I think having the results of the cells right there in the tutorial would be very important. Otherwise the reader cannot really understand what is happening. For example, after calculating features, I describe the clustering and compare different clusters - but if you cannot see the umap, you don't really understand what's going on. We can exclude it from CI if too costly (running tutorials might take longer), but having it in the final readthedocs would be important. And maybe a command to generate tutorials locally to check the output.

Not sure if I follow correctly. The results of the examples/tutorials are always visible in the docs (such as figures, print statements, warnings), but they are only in RST format, not in the .ipynb files, which are downloadable. The downloadable .ipynb files have their output cells empty, but this is not what users see when they go the docs - they see the RST files. If you really want also to have the .ipynb files to match almost exacrly (especially in terms of output, such as figures) to the RST, it's doable (easy way: after the .ipynb files have been generated from the initial .py/RST files, just run them, hard way: do not run the .ipynb file, just inject the output code cells into them (we have all the info, but it's a bit scattered - files are on disk print/repr/warnings are in RST).

Ok, great! Sorry for the confusion. I understood your initial comment to mean that there are no outputs in the online docs. Good to know that there always will be outputs! For the downloadable .ippyb files I think its not necessary to have the outputs. The use can always look in the docs, or execute the notebook themselves.

However, this brings up my original question: when I ran tox -e docs on this branch and checked the resulting HTMLs I could not see any output in the docs. All I saw was markdown and unexecuted code cells. There were no images. I was wondering if that was intended or if there was an error in the code somewhere that I needed to fix (easily possible as there might have been syntax errors).

image

e.g. above, the call to sc.pl.spatial should result in an image.

michalk8 commented 3 years ago

However, this brings up my original question: when I ran tox -e docs on this branch and checked the resulting HTMLs I could not see any output in the docs.

This is really an unexpected behavior. Did you try cleaning the docs with tox -e clean-docs? I will try building the tutorials in this branch to see for myself.

I was wondering if that was intended or if there was an error in the code somewhere that I needed to fix (easily possible as there might have been syntax errors).

If there's an error during the .py file execution, the whole documentation build will stop, so it shouldn't be it.

hspitzer commented 3 years ago

However, this brings up my original question: when I ran tox -e docs on this branch and checked the resulting HTMLs I could not see any output in the docs.

This is really an unexpected behavior. Did you try cleaning the docs with tox -e clean-docs? I will try building the tutorials in this branch to see for myself.

I was wondering if that was intended or if there was an error in the code somewhere that I needed to fix (easily possible as there might have been syntax errors).

If there's an error during the .py file execution, the whole documentation build will stop, so it shouldn't be it.

@michalk8 any ideas why the output of the tutorials is not shown?

michalk8 commented 3 years ago

@hspitzer

I've just run it on image_tutorials branch and can see the output (screenshot below). I didn't even have to merge the master. I'd consider removing the .tox directory and running the tox -e docs.

2021-01-26T11:23:50+01:00

giovp commented 3 years ago

ok this was failing because of scanpy, changed requirements to use 1.7 rc

giovp commented 3 years ago

it's a different error than what's in master, but also related to the docs. @michalk8 whenever you have time, can you have a look? no rush, thank you!

michalk8 commented 3 years ago

@giovp on master, linting is failing during regen (I've enabled spell-checker there, should be really in the linting step).

Here, the execution of tutorial_visium_hne.py seems to fail - 1 worker seems to have crashed, will try some tricks from the main repo.

p.s. for consistency's sake, please let's use the percent syntax to mark new section # %% instead of ###.... It's much more readable (could/should be done in #27 ).

giovp commented 3 years ago

p.s. for consistency's sake, please let's use the percent syntax to mark new section # %% instead of ###.... It's much more readable (could/should be done in #27 ).

ok great, I'll start clean up today, thank you!

hspitzer commented 3 years ago

p.s. for consistency's sake, please let's use the percent syntax to mark new section # %% instead of ###.... It's much more readable (could/should be done in #27 ).

sorry for jumping in here @michalk8 : how do I automatically generate correct .py files from a notebook with the #%% syntax? I am currently using jupytext -to py:sphinx which uses the ##### syntax. Everything else I tried does not allow me to losslessly convert back and forth or does not create correct rst files after creating the docs. Couldn't we use ##### instead for everything? It does look quite readable in my editor - but maybe long lines are not too great.

giovp commented 3 years ago

tbh, I noticed the jupytext -to py:sphinx also fails sometimes, as it leaves around the .py files unusued quotes "". I found this both in my notebooks and in some I reviewed from nacho and anna... I feel at the end of the day minor adjustements are alawys needed ?

hspitzer commented 3 years ago

tbh, I noticed the jupytext -to py:sphinx also fails sometimes, as it leaves around the .py files unusued quotes "". I found this both in my notebooks and in some I reviewed from nacho and anna... I feel at the end of the day minor adjustements are alawys needed ?

Yes, this happens if you have two consecutive code blocks without a text block in between. This can be easily avoided by putting all code in one cell.

giovp commented 3 years ago

Yes, this happens if you have two consecutive code blocks without a text block in between. This can be easily avoided by putting all code in one cell.

damn, good to know! ok let's stick to ##### then ?

michalk8 commented 3 years ago

sorry for jumping in here @michalk8 : how do I automatically generate correct .py files from a notebook with the #%% syntax? I am currently using jupytext -to py:sphinx which uses the ##### syntax. Everything else I tried does not allow me to losslessly convert back and forth or does not create correct rst files after creating the docs. Couldn't we use ##### instead for everything? It does look quite readable in my editor - but maybe long lines are not too great.

What I do is run:

jupytext --to py:percent --opt notebook_metadata_filter=-kernelspec,-jupytext <your notebook>

this uses the percent style and removes the metadata from top.

giovp commented 3 years ago

What I do is run:

jupytext --to py:percent --opt notebook_metadata_filter=-kernelspec,-jupytext

ok so what shall we be using? no preferences for me @hspitzer @michalk8

giovp commented 3 years ago

CI fails again because a worker failed. Apparently this is a known issues in pytest-xidst and there seem to be no fixtures. https://github.com/pytest-dev/pytest-xdist/issues/466

will investigatemore

michalk8 commented 3 years ago

@giovp one option would be to remove -n auto from tox.ini, which sould disable running the tests in parallel on the CI (this is the default). Alt. would be to specify backend='threading' in the tutorial for function which use parallelization and not spawn processes. Should not matter though, since n_jobs=None=1.

I remember adding pytest-faulthandler to the main repo (was exp. the same issue I think, not sure), but reading https://pypi.org/project/pytest-faulthandler/, it should be already included.

giovp commented 3 years ago

@giovp one option would be to remove -n auto from tox.ini, which sould disable running the tests in parallel on the CI (this is the default).

I was about to try this indeed but

Alt. would be to specify backend='threading'

this could also be an option. In fact we do use multiprocessign for morans in that tutorial. I'll try this first

hspitzer commented 3 years ago

What I do is run:

jupytext --to py:percent --opt notebook_metadata_filter=-kernelspec,-jupytext

ok so what shall we be using? no preferences for me @hspitzer @michalk8

This does not work for me.

  1. I need to put a python docstring in the first cell (no markdown)
  2. comments at the beginning of code cells are rendered as rst cells. E.g. starting from image the command converts to
    
    # %% [markdown]
    # Image analysis and feature extraction

%%

cell segmentation

sq.im.segment_img(img, img_id="image", model_group='watershed', thresh=70, geq=False)

feature extraction

sq.im.calculate_image_features(adata, img, features=['summary', 'segmentation'], features_kwargs= {'segmentation': {'label_img_id': 'segmented_watershed'}})



which in turn is rendered as
<img width="876" alt="image" src="https://user-images.githubusercontent.com/13350159/106444182-b1b43900-647d-11eb-8ea5-a101206cb5a4.png">

Using `jupytext --to py:shpinx` works as expected and produces the desired result
<img width="879" alt="image" src="https://user-images.githubusercontent.com/13350159/106444386-f8099800-647d-11eb-91bc-e6c8b4411572.png">
michalk8 commented 3 years ago

This is because sphinx-gallery cannot distinguish between the text cell and code cell, in an ideal world, it would be:

# %%
# here, everything is an RST

# cell segmentation - this is a comment
sq.im.segment_img(img, img_id="image", model_group='watershed', thresh=70, geq=False)
# feature extraction
sq.im.calculate_image_features(adata, img, features=['summary', 'segmentation'],
                                features_kwargs= {'segmentation': {'label_img_id': 'segmented_watershed'}})

see https://sphinx-gallery.github.io/stable/syntax.html#embed-rst-in-your-example-python-files But this is a discrepency between jupytext/sphinx-gallery, so please use whatever's more convenient.

giovp commented 3 years ago

Alt. would be to specify backend='threading'

this didn't work

I,ll try disabling -n auto

giovp commented 3 years ago

see sphinx-gallery.github.io/stable/syntax.html#embed-rst-in-your-example-python-files But this is a discrepency between jupytext/sphinx-gallery, so please use whatever's more convenient.

ok let's settle to #####

giovp commented 3 years ago

it took frecking forever, but it finally worked! I'm gonna merge this, and open another PR to do the cleanup addressing some of the points in #27 . @hspitzer I think it's unrealistic that we finish all the listed examples in #14 , I'd suggest we merge what we have now also in #9 and start with deployment. We can make a new release after the paper is out where we finish all the examples.