jspaezp / sctree

Tree based marker finding and gating visualization for single cell rna seq data
Apache License 2.0
4 stars 1 forks source link

[joss review] Documentation: Example usage #5

Closed jenzopr closed 4 years ago

jenzopr commented 4 years ago

This issue is part of the JOSS review.

Summary

The example usage sections (entitled with Usage and Seurat Interface in the README.md file) contain detailed code examples for getting started with the package and are fine-grained enough to easily follow along. I was able carve out the following basic analysis workflow:

  1. Find marker genes
  2. (optional) Visualize outcome of expected flow cytometry experiment
  3. Get and validate a suggested gating strategy
  4. Visualize expected flow cytometry experiment and gating
  5. Find antibodies

However, the presented order of functions complicate understanding of the intended analysis workflow and needs improvement (see major and minor issues below).

Major issues

Minor issues encountered during testing

jspaezp commented 4 years ago

Hello @jenzopr

Looking at the major issues:

Three functions are presented to find marker genes, namely ranger_importances.Seurat, 
findAllMarkers_ranger and FindAllMarkers (together with its complementary function FindMarkers). 
What are the differences between those and when should I use which? Should or can those 
functions be used complimentary?

I could specify in the documentation that FindAllMarkers and FindMarkers are the high-level functions that most users would be interested in and ranger_importances.Seurat and findAllMarkers_ranger are internals that are usually called by these high-level functions. I feel like keeping these functions separate is essential to maintain consistency with the equivalent functions in Seurat but the addition of details in the documentation and vignettes would be useful.

Having obtained a suggested gating strategy with fit_ctree, how can quality of the fitted tree be 
assessed? Which measurement(s) can be used to evaluate the goodness of the tree?

Since the object returned has a predict method, It could be used to generate a prediction based on a testing/training split on the data. A vignette explaining how to carry that out would be really useful for the end-user.

Looking at the minor issues:

  1. I noticed that calling sctree::FindMarkers fails only when called before loading the pacakge, which might have to do with how calls its internals. I feel like the most straightforward way to fix it would be to test if WhichCells is in the search path and throw a more informative error saying that it needs to be loaded.
sctree::FindMarkers(
  object = Seurat::pbmc_small,
  ident.1 = 0,
  ident.2 = 1,
  test.use = "RangerDE",
  verbose = FALSE)
# Error in WhichCells(object = object, idents = ident.1) :
#   could not find function "WhichCells"
# Error in inherits(ok, "try-error") : object 'ok' not found

require(sctree)

sctree::FindMarkers(
  object = Seurat::pbmc_small,
  ident.1 = 0,
  ident.2 = 1,
  test.use = "RangerDE", 
  verbose = FALSE)
#                  importance      p_val          gene  avg_logFC pct.1
# GNLY           0.0530090754 0.00000000          GNLY  5.1338001 0.444
# CD7            0.0961302279 0.00000000           CD7  4.5355550 0.528
# PRF1           0.0436675892 0.00000000          PRF1  4.0809632 0.417
#................
#................
#................
# IL17RA         0.20         1
# SNHG7          0.12         1
# RBP7           0.28         1
  1. Regarding the function throwing an error when MAST is not installed, that is due to how the function handles the missing package in Seurat, nonetheless, it would be easy enough to just add specific error handling for it. Nonetheless, the error propagates well enough to give an informative error message (will add the exception).

  2. Regarding the output of sctree::plot_flowstyle, I agree with it being a good idea, will definitely implement that. I guess that handling it in a similar way that plot_gates does it would be good.

TODO

mschubert commented 4 years ago

ranger_importances.Seurat and findAllMarkers_ranger are internals that are usually called by these high-level functions

If they are internals, are you sure you want to @export them?

You can also mark them with @keywords internal so they don't show up in the API documentation index and are less prone to confuse users.

jspaezp commented 4 years ago

Update:

  1. Cluster highlighting implemented in 4ad31e0474f46ce34b2525fa2e1b76758c652a51
  2. Fixed issue where MAST (and deseq2) would not be flagged correctly as missing installation 0cbc8f4dec192947ec04da968eada1c350de41c0
  3. Added better handling of WhichCells, having it be correctly imported from Seurat 90d0d517cdb039657cce87c766a726abc7cc827a

@natallah

jenzopr commented 4 years ago

Nice work! 🎉

jspaezp commented 4 years ago

Hello @jenzopr and @mschubert

Commit b402fbe9eaaa17c352053f65c953b173c4f0e5f5 further includes changes on this regard with the clarification in the documentation on what functions are intended for end-user and which are internals, indeed it was a good idea to have @keywords internal added.

@natallah

jspaezp commented 4 years ago

@jenzopr @natallah @mschubert

Aadded train/test vignette in commit ffa30082ed60cc7dc0d9bdd26197ebaedbaa4bfe

This completes the checklist for this issue, let me know if you believe we can close this issue. Kindest wishes, Sebastian

jenzopr commented 4 years ago

Very nice! I really like the Evaluating Tree Quality vignette!

jenzopr commented 4 years ago

A last small request from my side: Can you update the introduction of the README.md file and include a bit more from the papers Summary to fulfill the statement of need from the JOSS checklist: Do the authors clearly state what problems the software is designed to solve and who the target audience is? You basically have this nicely in the paper, but not yet in the Readme and webpage. Thanks!

natallah commented 4 years ago

@jspaezp @jenzopr

I updated the README.md accordingly! Great suggestion - thanks!