scverse / scvi-tools

Deep probabilistic analysis of single-cell and spatial omics data
http://scvi-tools.org/
BSD 3-Clause "New" or "Revised" License

I made a notebook for vanilla DE with interactive plots #526

Closed Munfred closed 4 years ago

Munfred commented 4 years ago

Hello, following the basic scVI tutorial, I made a notebook that performs vanilla DE and makes interactive plots. It uses Plotly instead of scanpy.

It makes the following plots for a dataset of 89k C. elegans cells:

I tried to make it as clear as possible and include a discussion of the vanilla and the change DE modes. I thought that maybe this could be helpful for an upcoming tutorial on the change mode ;)

Here's the notebook on Colab (for some reason nbviewer formatting was broken). Colab has the advantage that you can run the notebook right away, perhaps something to keep in mind when doing new tutorials. For this notebook, with 1000 genes and 89k cells, I only do 5 epochs (7 min total) and it trains very quickly. The slowest part is the t-SNE with openTSNE!

https://colab.research.google.com/drive/1hF7KSujhhHcyxzWkzAHy9lazXLexainr

If you have any comments (or find something I did wrong/could be improved) I'd be happy to hear. Hopefully this can be useful!

PierreBoyeau commented 4 years ago

Hi, thank you so much for sharing your notebook. We had indeed planned to do a differential expression tutorial notebook, and I feel that your work would be a perfect start. We considered making the notebooks runnable on Colab, but it can quickly get tricky (for example, you need to restart notebooks after installs or disconnections). Let me get back to you soon on this!

romain-lopez commented 4 years ago

This is really good ! What did you have in mind @Munfred ? Can we provide feedback and aim to add it to the readme when ready ?

I agree having one example on colab would be nice, then the notebooks in the codebase can be for people who want more control over what they are doing.

Munfred commented 4 years ago

@romain-lopez Yes you can absolutely add it to the readme when ready, I'm happy to have it released as public domain/MIT/whatever terms apply.

In terms of improvements to the notebook, other than adding the change mode once it's released, I was mainly thinking of stylistic improvements. For example, I loaded the worm data I used from an anndata file but I'm not sure I did it the most "elegant" way.

The reason why I made the notebook with Plotly with this worm dataset is because I started working with the people that run WormBase from Caltech (https://wormbase.org/). WormBase is a biological knowledge base for C. elegans data, and it features not only experimental data like RNA-seq and microarray, but also carefully curated annotations of results as described in articles.

WormBase does not yet feature single cell RNA-seq data (in fact there are only 3 single cell papers on C. elegans that I know of), and so I am working on a prototype visualization tool for what could be offered. My idea is to use scVI to enable integration and comparison across all published C. elegans single cell datasets (since there are so few, it's actually pretty feasible!). Performing differential expression analysis is perhaps the main thing people care about in C. elegans experiments, which is why I'm very excited by @PierreBoyeau's recent work.

For now, I have turned the visualizations in the notebook into a Plotly Dash app that you can see here: http://dash.wormcells.com/

This only integrates experiments from one paper for now, but if people like it then WormBase will incorporate it in some way over the next year.

What is more, WormBase is part of a bigger consortium called the Alliance of Genome Resources (https://www.alliancegenome.org/), which coordinates efforts across other biological knowledge bases for fly, mouse, yeast, rat and zebrafish. So, if the WormBase prototype using scVI to integrate and visualize worm data across papers catches on, it could end up being replicated by the other knowledge bases. Pretty cool huh?

romain-lopez commented 4 years ago

This looks good to me Eduardo! All of this is pretty exciting! This could also be a great empirical way of getting ground truth for DE somehow (or a proxy). Let us know if you have any questions or criticism about the new DE (especially @PierreBoyeau !). Ping me when you want me to give stylistic / usage feedback and I will annotate the notebook.

Munfred commented 4 years ago

Awesome! I'm at a meeting this week (and I think you guys are at NeurIPS?) and travelling next week, so no hurries. I'll probably ping you closer to Christmas

Munfred commented 4 years ago

@romain-lopez et al: merry christmas!

If you happen to get a chance to review the notebook before new year, that would be great. Thanks!

romain-lopez commented 4 years ago

Hi, would you please add a mock PR with the ipynb file ? It's easier for us to review with notebook reviewing tools than on colab.

Munfred commented 4 years ago

Ok I just made one. I put the notebook under https://github.com/Munfred/scVI/tree/master/tests/notebooks

Thanks!

romain-lopez commented 4 years ago

Hi @Munfred, I gave some feedback that I hope will be useful to increase the readability of the notebook.

Also, there are some code style "errors" with your notebook that might be nice to address if you would like. Something like this would do https://stackoverflow.com/questions/26126853/verifying-pep8-in-ipython-notebook-code, especially the option %flake8_on supposedly
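As an alternative to an in-notebook magic, here is a minimal standard-library sketch that pulls the code cells out of an .ipynb file so that a linter such as flake8 can check them as a plain script. The helper name `notebook_to_source` is hypothetical, and the sketch assumes the nbformat 4 JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`).

```python
import json


def notebook_to_source(nb_json: str) -> str:
    """Concatenate the code cells of an nbformat-4 notebook into one script.

    Hypothetical helper for illustration; not part of scVI or flake8.
    """
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        lines = []
        for line in source.splitlines():
            # IPython magics and shell escapes are not valid Python,
            # so comment them out before linting
            if line.lstrip().startswith(("%", "!")):
                line = "# " + line
            lines.append(line)
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


# Tiny in-memory notebook used only to demonstrate the helper
example = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["!pip install scvi\n", "import scvi"]},
        {"cell_type": "code", "source": "x = 1"},
    ],
})
script = notebook_to_source(example)
print(script)
```

The resulting string could be written to a temporary .py file and passed to flake8 on the command line.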

romain-lopez commented 4 years ago

Overall, this is a long notebook so we need to make sure we get the user engaged!

romain-lopez commented 4 years ago

I looked at the plots and they look pretty cool!

romain-lopez commented 4 years ago

This is really cool work! When we decide how to put it into the repo, please do not forget to push changes to the history.rst file with your contribution (we are trying to keep track of all that).

romain-lopez commented 4 years ago

Hi @Munfred, did you have time to think about my comments? Happy to include this in the coming scVI version!

Munfred commented 4 years ago

Hey, sorry, I got distracted with other things and didn't rework the notebook. I can get this done in the next day or two. Some questions:

romain-lopez commented 4 years ago

Hi,

We are not going to maintain this code as part of the codebase but pin it to a particular version of scVI so that your notebook demonstrates supplementary options for users.

Therefore, I would advise using Plotly, in case people don't know of its existence.

I would advise using only one DE method for clarity. We can stick to the current DE code and update that part of the notebook in a bit.

I liked the fact that this analyzes a cool and new dataset, in contrast to the codebase, which just fetches old and well-known datasets. Is there any reason not to use the worm one ?

romain-lopez commented 4 years ago

I was just suggesting that we could add it to the codebase, but even if we don't, it might help some users with fetching their data into scVI ?

romain-lopez commented 4 years ago

Hi @Munfred, we are going to publish a new version soon and it would be great to include your notebook! Would you still like to have it on the repo ?

Munfred commented 4 years ago

Hey! Sorry for the delay - I finished the revised version of the notebooks.

Following your suggestions I trimmed down on the embeddings and simplified some of the output. I also re-worked the section discussing Bayes factors a bit, after explaining it to a few people in the lab. Typically people define BF = p(H_0 | data)/p(H_1 | data), but scVI actually returns the natural log of that, ln(BF), so I made sure to highlight that. I also included a table with typical interpretations for different values of BF, which I think many people will find helpful.
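To make the ln(BF) point concrete, here is a small sketch of the arithmetic, following the definition as written in this comment (BF as a ratio of the two posterior probabilities, which sum to 1). The function names are hypothetical, and the evidence labels follow one commonly cited scale (Kass and Raftery, 1995), which is an assumption and not necessarily the table used in the notebook.

```python
import math


def bayes_factor_from_log(ln_bf: float) -> float:
    """Undo the natural log: BF = exp(ln BF)."""
    return math.exp(ln_bf)


def posterior_prob_from_log(ln_bf: float) -> float:
    """Posterior probability of the numerator hypothesis.

    With BF = p0 / p1 and p0 + p1 = 1, p0 = BF / (1 + BF),
    which is the logistic function of ln(BF).
    """
    if ln_bf >= 0:
        return 1.0 / (1.0 + math.exp(-ln_bf))
    e = math.exp(ln_bf)  # numerically stable branch for negative values
    return e / (1.0 + e)


def evidence_label(ln_bf: float) -> str:
    """Strength of evidence from |ln BF| (assumed Kass and Raftery scale).

    The sign of ln BF says which hypothesis is favoured;
    the magnitude says how strongly.
    """
    magnitude = abs(ln_bf)
    if magnitude < math.log(3):
        return "not worth more than a bare mention"
    if magnitude < math.log(20):
        return "positive"
    if magnitude < math.log(150):
        return "strong"
    return "very strong"


for ln_bf in (0.5, 2.3, 3.5, -6.0):
    print(ln_bf, round(bayes_factor_from_log(ln_bf), 3),
          round(posterior_prob_from_log(ln_bf), 3), evidence_label(ln_bf))
```

Working on the log scale avoids overflow for very large |ln BF| values, which is one practical reason a tool might report ln(BF) instead of BF itself.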

I put the v3 of the Jupyter notebook here: https://github.com/Munfred/worm-notebooks/blob/master/scVI_DE_worm_v3.ipynb

An html version with the interactive plots working is here (you need to download it and render on the browser, GitHub won't render it): https://github.com/Munfred/worm-notebooks/blob/master/scVI_DE_worm_v3.html

If you think the changes are good, I can make a pull request with the notebook.

On a separate note - I also finished the first version of an online tool for performing differential expression with scVI on all of the publicly available C. elegans data (which are only 3 studies with 21 experiments). It does the same thing the notebook does, but is easy for anyone to query, and jobs run in a few minutes: http://de.wormcells.com/

Thanks!

Eduardo

romain-lopez commented 4 years ago

Hi @Munfred,

Thanks for this notebook and congrats on the interactive tool, I think this is all very interesting. There are several ways of integrating this into the codebase.

We could get this notebook unit tested, but it would mean adding more dependencies to the package, which is something we try to avoid (it can quickly complicate maintenance).

One option I think is more viable is to (1) proofread the notebook and make sure it runs on scVI 0.6.0 (2) make it Colab runnable and pinned down to scVI 0.6.0 (3) push this notebook on the codebase and (4) make sure it appears on the documentation for visibility purposes

What do you think @Munfred ? Let's plan to release this as part of 0.6.1 and advertise it accordingly :)

Munfred commented 4 years ago

Sounds good, I'll review the notebook for v0.6, and the change mode, and have at least one other person proofread and try to run it. Classes are over this week so I should be able to have it done by the weekend at the latest

Munfred commented 4 years ago

Ok I finished it and had a friend review it. Here's the GitHub link: https://github.com/Munfred/worm-notebooks/blob/master/scVI_DE_worm_v5.ipynb

And colab link: https://colab.research.google.com/github/Munfred/worm-notebooks/blob/master/scVI_DE_worm_v5.ipynb

romain-lopez commented 4 years ago

This is great thanks! I have a question for @adamgayoso and @PierreBoyeau. Do you know why there are so many warnings about posterior indices in this notebook ?

[2020-03-12 21:56:41,485] WARNING - scvi.inference.posterior | Posterior indices were modified at some point. Please ensure that provided indices correspond to the current posterior indices.

while the posterior was created using this code, which seems standard ?

full = trainer.create_posterior(trainer.model, gene_dataset, indices=np.arange(len(gene_dataset)))

Once this is solved, @PierreBoyeau would you have time to integrate this into the codebase (no unit testing, but adding it to the docs would be great)?

adamgayoso commented 4 years ago

Thanks for this @Munfred. I'll figure out this issue with @PierreBoyeau. I think in a week or two we will release 0.6.2 -- it would be great if we could rerun this notebook with respect to that exact version and then we can place it in a "Contributed tutorials" section of read the docs. By rerun I mean so that the outputs are viewable on read the docs (e.g., https://scvi.readthedocs.io/en/stable/tutorials/basic_tutorial.html)

Edit: though I'm not sure how the interactive stuff would appear here.

Munfred commented 4 years ago

I reran it by installing from master instead of stable as below but it still says version 0.6.1 and not 0.6.2. Should I have used another branch?

!pip install --quiet git+https://github.com/yoseflab/scvi@master#egg=scvi[notebooks]

It should now display all the outputs correctly: https://github.com/Munfred/worm-notebooks/blob/master/scVI_DE_worm_v5.ipynb https://colab.research.google.com/github/Munfred/worm-notebooks/blob/master/scVI_DE_worm_v5.ipynb
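The version question above can be checked with a small guard cell at the top of the notebook. This is a minimal sketch: the distribution name `scvi` and the pin `0.6.2` are taken from this thread, while the helper names `installed_version` and `pin_message` are hypothetical.

```python
from importlib import metadata
from typing import Optional

PINNED_VERSION = "0.6.2"  # the release discussed in this thread


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pin_message(found: Optional[str], expected: str,
                dist_name: str = "scvi") -> str:
    """Describe whether the installed version matches the pinned one."""
    if found is None:
        return f"{dist_name} is not installed; expected {expected}"
    if found != expected:
        return (f"{dist_name} {found} is installed; "
                f"this notebook was run with {expected}")
    return f"{dist_name} {found} matches the pinned version"


print(pin_message(installed_version("scvi"), PINNED_VERSION))
```

A check like this only reports the version string the installed package declares, so an install from a development branch can still show the previous release number until the version is bumped in the repository.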

adamgayoso commented 4 years ago

So I think what we can do is release 0.6.2 (in a few days), then you can change the pip install to install that exact version, and then I can add the notebook to the master (and stable, where readthedocs defaults) branches. By the way, I saw some duplicate code in the notebook that you may want to remove.

I'm also not sure we want to say "# We select the 1000 most variable genes, which is the default selection criteria of scvi". Perhaps we can say recommended selection criteria? @romain-lopez ?

We also have a new feature of saving and loading the posterior object (in the basic tutorial now), which also saves the model, and dataset. I'm not sure if you'd like to refactor the notebook to include this new functionality, but it should be fine either way.

Thanks so much for doing this!

romain-lopez commented 4 years ago

> "We select the 1000 most variable genes, which is the default selection criteria of scvi". Perhaps we can say recommended selection criteria? @romain-lopez ?

"Recommended" seems good; we updated the default recently, so we might as well be accurate. We can even say that it's a choice of the notebook but that there are other possibilities in scVI.

adamgayoso commented 4 years ago

I released 0.6.2. I'm running this notebook now in Colab and will then make a PR to put it in the docs, which you can then review @Munfred. I made a few changes to the notebook based on our latest bug fixes.