scverse / scirpy

A scanpy extension to analyse single-cell TCR and BCR data.
https://scirpy.scverse.org/en/latest/
BSD 3-Clause "New" or "Revised" License

BCR tutorial #542

Closed MKanetscheider closed 2 weeks ago

MKanetscheider commented 2 months ago

Added a beta version (v2) of the BCR tutorial and adapted the corresponding file so that I can (hopefully) view it on Read the Docs. I have drastically shortened the tutorial because I was very unsatisfied with the previous version. I will soon add further literature to the .bib file and adapt the glossary to make the tutorial more precise and less overwhelming, while still providing interested users with additional information.

I would be grateful for any feedback (@FFinotello @grst) to make the tutorial as good as it can possibly be!

Closes #199

MKanetscheider commented 2 months ago

Hi, could you help me out, please? Why is the readthedocs build failing here? I don't really understand the issue, as there are only warnings but no further details :/

grst commented 2 months ago

Warnings are treated as errors.


/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:40002: WARNING: could not find bibtex key "null.2022"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:40005: WARNING: could not find bibtex key "Suo.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:60003: WARNING: could not find bibtex key "Lefranc.2003"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:60005: WARNING: could not find bibtex key "Suo.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:120003: WARNING: could not find bibtex key "Zhu.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:120014: WARNING: could not find bibtex key "Shi.2019"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170022: WARNING: term not in glossary: 'SHM'
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170024: WARNING: could not find bibtex key "Yaari.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170026: WARNING: could not find bibtex key "Gupta.2017"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170026: WARNING: could not find bibtex key "Kepler.2014"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170028: WARNING: could not find bibtex key "Gupta.2017"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:170028: WARNING: could not find bibtex key "Yaari.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:190002: WARNING: could not find bibtex key "Yaari.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:190002: WARNING: could not find bibtex key "DeKosky.2013"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:190008: WARNING: could not find bibtex key "Clauset.2004"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:260004: WARNING: could not find bibtex key "Adams.2020"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:280002: WARNING: could not find bibtex key "Nutt.2015"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:320004: WARNING: could not find bibtex key "Finotello.2016"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:320004: WARNING: could not find bibtex key "Pelissier.2023"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:360002: WARNING: py:func reference target not found: scirpy.tl.hill_diversity_profile
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:380002: WARNING: could not find bibtex key "Chao.2014"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:400004: WARNING: py:func reference target not found: scirpy.tl.convert_hill_table
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:400004: WARNING: py:func reference target not found: scirpy.tl.hill_diversity_profile
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:420002: WARNING: could not find bibtex key "Jost.2010"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:530003: WARNING: could not find bibtex key "Kenneth.2017"
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:600003: WARNING: py:func reference target not found: scirpy.pl.logoplot_cdr3_motif
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:600003: WARNING: py:func reference target not found: scirpy.pl.logoplot_cdr3_motif
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:600006: WARNING: py:func reference target not found: scirpy.pl.logoplot_cdr3_motif
/home/docs/checkouts/readthedocs.org/user_builds/scirpy/checkouts/542/docs/tutorials/tutorial_5k_bcr.ipynb:640005: WARNING: py:func reference target not found: scirpy.tl.mutational_load

This means you are referring to citation keys, glossary terms, and functions that don't exist.
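
To see at a glance which targets are missing, the log can be condensed into one list per warning type. The following is only a minimal sketch: the helper name `summarize_warnings` is hypothetical, the regular expression covers just the three message types that appear in the log above, and the sample lines use shortened paths for readability.

```python
import re
from collections import defaultdict

# Matches the three warning types seen in the log above and captures
# the missing target (citation key, glossary term, or function name).
WARNING_RE = re.compile(
    r"WARNING: (?P<kind>could not find bibtex key|term not in glossary:|"
    r"py:\w+ reference target not found:)\s*[\"']?(?P<target>[^\"'\s]+)"
)


def summarize_warnings(log_text: str) -> dict[str, list[str]]:
    """Group the unique missing targets by warning type (hypothetical helper)."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            kind = match.group("kind").rstrip(":")
            found[kind].add(match.group("target"))
    return {kind: sorted(targets) for kind, targets in found.items()}


# Sample lines taken from the log above (paths shortened for illustration).
sample_log = """\
tutorial_5k_bcr.ipynb:40005: WARNING: could not find bibtex key "Suo.2023"
tutorial_5k_bcr.ipynb:60005: WARNING: could not find bibtex key "Suo.2023"
tutorial_5k_bcr.ipynb:170022: WARNING: term not in glossary: 'SHM'
tutorial_5k_bcr.ipynb:640005: WARNING: py:func reference target not found: scirpy.tl.mutational_load
"""

summary = summarize_warnings(sample_log)
for kind, targets in summary.items():
    print(f"{kind}: {', '.join(targets)}")
```

Each bibtex key in the summary needs an entry in the .bib file, each glossary term needs a glossary entry, and each function reference has to point to a function that exists in the documented API.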

MKanetscheider commented 2 months ago

Thanks a lot, that makes sense... I will add the missing citations and, for now, remove the references to the new functions, as they are still in their own PRs but are used in the notebook... 🥹

MKanetscheider commented 2 months ago

If the Read the Docs build is successful, we can inspect the tutorial on the website, right?

grst commented 2 months ago

If the Read the Docs build is successful, we can inspect the tutorial on the website, right?

yes

MKanetscheider commented 2 months ago

Hi, I also adapted the glossary a bit to include more information about B cells and B-cell clustering, which in my opinion is important to know/clarify but would be confusing if included in the markdown text of the tutorial. I have some questions that might need discussion:

My idea here would be to adapt the clonotype cluster function so that it automatically ignores multiple v_calls/j_calls (i.e. only considers the first one) and also ignores the allele information for clustering, without manipulating the call itself. Immcantation has its own parameter for handling multiple calls for a gene (see the parameter "first = FALSE": https://scoper.readthedocs.io/en/stable/topics/hierarchicalClones/). I actually encountered this problem some time ago and discussed it with @felixpetschko, but we both eventually forgot about it until now. Either way, I think it would be good if @grst could also take a look at this problem and help with a solution, because if I remember correctly it's not trivial to "fix". Maybe there is an elegant workaround?
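
To illustrate the idea, here is a minimal sketch of such a normalization. The helper name `first_gene_call` is hypothetical and not part of scirpy, and it assumes that ambiguous calls are comma-separated and that alleles are appended after a `*`, as in IMGT-style names. The normalized value is kept separately, so the original call is never modified.

```python
def first_gene_call(call):
    """Return the first gene of a possibly ambiguous V/J call, without allele.

    Hypothetical helper: assumes comma-separated calls and a '*' allele suffix.
    """
    if not call:
        return None
    first = call.split(",")[0].strip()  # keep only the first of several calls
    return first.split("*")[0]          # drop the allele suffix


# Original calls stay untouched; the normalized genes are stored separately
# and would only be used as the key for clonotype clustering.
v_calls = ["IGHV1-69*01,IGHV1-69D*01", "IGHV3-23*04", "IGKV1-5", None]
v_genes = [first_gene_call(c) for c in v_calls]
print(v_genes)
```

With this approach, two cells whose calls differ only in allele or in the order-independent first gene would end up with the same clustering key, while the reported `v_call` remains exactly as annotated.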

grst commented 1 month ago

Is it possible to include the .h5mu file that I used to load the 5k B cells for the tutorial somewhere on GitHub? It is a rather large file (~2,600,000 KB), so uploading it directly to GitHub shouldn't work as far as I'm aware. Is there an alternative solution? I think it's important that any user can experiment a bit with this toy dataset. Is there a way to provide the test data similar to the one you used for the TCR tutorial, i.e. load it with its own function call? If this is desired I would be happy to give it a try, but you may need to give me some guidance, as I'm not sure how "easy" this is for me :/

If you can get the size below 2GB (e.g. by changing the compression to gzip when saving the h5mu file), we can attach it to a scirpy release on GitHub. Otherwise it's possible to upload it to figshare.com or maybe huggingface.co. Such a dataset should definitely be available from scirpy.datasets. It should be easy to add, just take a look at the other functions that are already there.

grst commented 1 month ago

Just dropping comments here as I go through the notebook...

MKanetscheider commented 1 month ago

Just dropping comments here as I go through the notebook...

Section "Define clonotype clusters": I don't really see the bimodality in the plots. Is this just an issue with this dataset, or might there be a problem with our implementation? If the former, could you please come up with a 1-2 sentence discussion of why this pattern is not visible in all cases? And maybe link to an example where it works well...

Actually, I think our implementation is fine, as this "bimodality" often only somewhat resembles a bimodal distribution, as you can see in the shazam tutorial (https://shazam.readthedocs.io/en/latest/vignettes/DistToNearest-Vignette/). I think that's also why they came up with a computational model to select an appropriate threshold, as it's usually not very clear from the plot alone. I just wrote a short discussion noting that this can occur and that in such cases a fixed threshold might reduce human bias... I know this is not ideal, but as we don't have a way to automatically detect bimodality, this should be sufficient for now.
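
For context, the quantity behind such a plot can be sketched in a few lines. This is a simplified illustration, not the scirpy or shazam implementation: sequences are grouped only by length (an assumption made here for brevity), the distance is a length-normalized Hamming distance, and the function names and toy sequences are made up.

```python
from itertools import combinations


def normalized_hamming(a, b):
    """Fraction of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_to_nearest(seqs):
    """Map each unique sequence to the distance of its nearest neighbour.

    Only sequences of the same length are compared; a sequence without
    any partner of equal length gets None.
    """
    unique = sorted(set(seqs))
    nearest = {s: None for s in unique}
    for a, b in combinations(unique, 2):
        if len(a) != len(b):
            continue
        d = normalized_hamming(a, b)
        for s in (a, b):
            if nearest[s] is None or d < nearest[s]:
                nearest[s] = d
    return nearest


# Toy sequences for illustration only.
toy = ["CARDYW", "CARDFW", "CARGGW", "CAKW"]
nearest = distance_to_nearest(toy)
for seq, dist in nearest.items():
    print(seq, dist)
```

A histogram of these nearest-neighbour distances is what is inspected for two modes; when the valley between them is not clearly visible, a fixed threshold is a pragmatic fallback.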

MKanetscheider commented 1 month ago

Regarding preprocessing, did you also check whether nf-core/airrflow is an option for re-annotation? Running a nextflow pipeline first (it also does some standard analyses) and then following up with scirpy for more custom analyses could also be a pretty smooth workflow.

Yes, I did. It should be usable as a re-annotation tool, as it works with single-cell data derived from Cell Ranger and it outputs a .tsv file that follows the AIRR community standards. Do you want to integrate this somehow into the tutorial?

grst commented 1 month ago

For now, I removed a few sections that depend on other open PRs (#536, #534, #535) and copied the content over to those PRs. I believe that way we can wrap up this PR faster and discuss the other sections in a more focused manner.

Yes, I did. It should be usable as a re-annotation tool, as it works with single-cell data derived from Cell Ranger and it outputs a .tsv file that follows the AIRR community standards. Do you want to integrate this somehow into the tutorial?

I think it might be even easier to use than dandelion for preprocessing. If you think it gives equally good results I think we should mention it as another option to do preprocessing in the corresponding section.

MKanetscheider commented 1 month ago

For now, I removed a few sections that depend on other open PRs (#536, #534, #535) and copied the content over to those PRs. I believe that way we can wrap up this PR faster and discuss the other sections in a more focused manner.

Yes, you are definitely right. In a way this tutorial is almost finished, but of course it depends on whether and how much we change in the remaining PRs. So it makes sense to wrap this one up and add sections as part of the other PRs.

I think it might be even easier to use than dandelion for preprocessing. If you think it gives equally good results I think we should mention it as another option to do preprocessing in the corresponding section.

If you wish, I will add a reference in an appropriate place so that the user is aware of this possibility 👍 The interesting thing is that dandelion also relies a lot on Immcantation, so the re-annotation pipeline is essentially the same. The only difference I can see is that with dandelion one can convert a dandelion object to AnnData/MuData quite easily, while with the nf-core workflow one has to write and read an appropriate file first. Either way, I don't think that should be a big obstacle. 😄

grst commented 2 weeks ago

I went through the remaining bits and also added a reference to AIRRflow. Thanks for your patience and persistence while working on this!

We'll follow up on the missing pieces in #536, #535 and #534

I'll merge this as soon as the tests have run through.