[REVIEW]: gesel: a JavaScript package for client-side gene set enrichment

editorialbot commented 1 year ago

Submitting author: @LTLA Editor: @arfon Reviewers: @majensen, @bede Archive: 10.5281/zenodo.10032294

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/b7f3cc8c95c37daad28490c0e7ca7400"><img src="https://joss.theoj.org/papers/b7f3cc8c95c37daad28490c0e7ca7400/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/b7f3cc8c95c37daad28490c0e7ca7400/status.svg)](https://joss.theoj.org/papers/b7f3cc8c95c37daad28490c0e7ca7400)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@majensen & @bede, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @majensen

📝 Checklist for @bede

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.02 s (1933.7 files/s, 134711.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JavaScript                      31            279            372           1349
Markdown                         2             75              0            323
TeX                              1             13              0            129
YAML                             3             20              1             93
JSON                             2              0              0             63
-------------------------------------------------------------------------------
SUM:                            39            387            373           1957
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1186/1471-2105-14-128 is OK
- 10.1093/bioinformatics/btr260 is OK
- 10.1038/75556 is OK
- 10.1101/2022.03.02.482701 is OK
- 10.1093/bioinformatics/btaa591 is OK
- 10.1073/pnas.0506580102 is OK
- 10.1093/nar/gks461 is OK
- 10.1186/gb-2010-11-2-r14 is OK
- 10.1101/060012 is OK
- 10.1093/nar/gkm323 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

Wordcount for paper.md is 2541

arfon commented 1 year ago

@majensen, @bede – This is the review thread for the paper. All of our communications will happen here from now on.

Please read the "Reviewer instructions & questions" in the first comment above. Please create your checklist typing:

@editorialbot generate my checklist

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5777 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

LTLA commented 1 year ago

Thanks to all for getting involved in this. We've added some more functionality to gesel in the meantime (including a fun little visualization of all gene sets on a t-SNE), but we'll give everyone a chance to read the current manuscript first.

arfon commented 1 year ago

Friendly reminder @majensen & @bede to get started on your reviews soon please.

majensen commented 1 year ago

@editorialbot generate pdf

majensen commented 1 year ago

@editorialbot create checklist

majensen commented 1 year ago

@editorialbot commands

editorialbot commented 1 year ago

Hello @majensen, here are the things you can ask me to do:


# List all available commands
@editorialbot commands

# Add to this issue's reviewers list
@editorialbot add @username as reviewer

# Remove from this issue's reviewers list
@editorialbot remove @username from reviewers

# Get a list of all editors' GitHub handles
@editorialbot list editors

# Assign a user as the editor of this submission
@editorialbot assign @username as editor

# Remove the editor assigned to this submission
@editorialbot remove editor

# Remind an author, a reviewer or the editor to return to a review after a 
# certain period of time (supported units days and weeks)
@editorialbot remind @reviewer in 2 weeks

# Check the references of the paper for missing DOIs
@editorialbot check references

# Perform checks on the repository
@editorialbot check repository

# Adds a checklist for the reviewer using this command
@editorialbot generate my checklist

# Set a value for version
@editorialbot set v1.0.0 as version

# Set a value for branch
@editorialbot set joss-paper as branch

# Set a value for repository
@editorialbot set https://github.com/organization/repo as repository

# Set a value for the archive DOI
@editorialbot set 10.5281/zenodo.6861996 as archive

# Mention the EiCs for the correct track
@editorialbot ping track-eic

# Generates the pdf paper
@editorialbot generate pdf

# Recommends the submission for acceptance
@editorialbot recommend-accept

# Generates a LaTeX preprint file
@editorialbot generate preprint

# Flag submission with questionable scope
@editorialbot query scope

# Get a link to the complete list of reviewers
@editorialbot list reviewers

# Creates a post-review checklist with editor and authors tasks
@editorialbot create post-review checklist

# Open the review issue
@editorialbot start review

majensen commented 1 year ago

Review checklist for @majensen

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at https://github.com/LTLA/gesel.js?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@LTLA) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data from research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

majensen commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1186/1471-2105-14-128 is OK
- 10.1093/bioinformatics/btr260 is OK
- 10.1038/75556 is OK
- 10.1101/2022.03.02.482701 is OK
- 10.1093/bioinformatics/btaa591 is OK
- 10.1073/pnas.0506580102 is OK
- 10.1093/nar/gks461 is OK
- 10.1186/gb-2010-11-2-r14 is OK
- 10.1101/060012 is OK
- 10.1093/nar/gkm323 is OK

MISSING DOIs

- None

INVALID DOIs

- None

bede commented 1 year ago

Review checklist for @bede

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at https://github.com/LTLA/gesel.js?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@LTLA) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data from research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

bede commented 1 year ago

Hi @LTLA, I didn't find any community guidelines. Would you either point me to them or add them? Thank you.

LTLA commented 1 year ago

Hi @bede, done: https://github.com/LTLA/gesel.js/blob/master/CONTRIBUTING.md

bede commented 1 year ago

Excellent, thank you!

majensen commented 1 year ago

@LTLA - looks like the rendered paper is missing the Subramanian 2005 ref and the Lun & Kancherla 2023 ref (or is it 2022?). Can you have a look?

majensen commented 1 year ago

(Is that ref this guy: https://pubmed.ncbi.nlm.nih.gov/15980550/ ?)

LTLA commented 1 year ago

@majensen not sure what you mean. The latest rendering has both references on the last page:

[Screenshot from 2023-09-11 02-42-32: last page of the rendered paper, showing the reference list]

I took the liberty of updating the Lun and Kancherla reference, given that #5603 has been accepted (yay).

LTLA commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

majensen commented 1 year ago

Thanks @LTLA - that's bizarre that I somehow missed the last page. A senior moment I guess.

arfon commented 1 year ago

:wave: folks. How are we getting on here? @majensen, @bede – it looks like you're part way through both of your reviews? Are you waiting on @LTLA or myself for anything right now?

bede commented 1 year ago

Apologies for delay, will finish this weekend

bede commented 1 year ago

Review complete, thanks all!

majensen commented 1 year ago

@arfon Still working on it, but should be complete this week. @LTLA - one request - could you write a few sentences into the paper that describe the companion repo (feedstock) in some more detail? I think it would be helpful to users.

LTLA commented 1 year ago

@majensen Done, added a few more sentences to the final paragraph:

These are simple tab-separated text files containing information about the genes, sets, collections, and the mappings between them. We store the byte ranges for each relationship in the mapping files to enable on-demand range requests. To reduce data transfer, we apply some standard practices like delta-encoding the sorted gene identifiers and Gzip-compressing the byte range files.
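As a rough illustration of the two tricks mentioned there, here is a minimal sketch of delta decoding and of turning record sizes into byte ranges for HTTP range requests. The function names, the assumption that per-record byte lengths are stored, and the single-newline separator are all hypothetical and chosen for illustration; this is not gesel's actual code or file layout.

```javascript
// Hypothetical helpers for illustration only; not part of gesel's API.

// Undo delta encoding: the first value is stored as-is, and every later
// value is stored as the difference from its predecessor in the sorted list.
function decodeDeltas(deltas) {
    const out = [];
    let current = 0;
    for (const d of deltas) {
        current += d;
        out.push(current);
    }
    return out;
}

// Convert per-record byte lengths into [start, end) offsets, assuming
// (for this sketch) that each record is followed by one newline byte.
function lengthsToRanges(lengths) {
    const ranges = [];
    let start = 0;
    for (const len of lengths) {
        ranges.push({ start, end: start + len });
        start += len + 1; // skip the newline separator
    }
    return ranges;
}

// Build the value of an HTTP Range header for one record.
// HTTP byte ranges are inclusive at both ends, hence the "end - 1".
// Returns null for an empty record, which needs no request at all.
function rangeHeader(range) {
    if (range.end <= range.start) {
        return null;
    }
    return `bytes=${range.start}-${range.end - 1}`;
}
```

With offsets like these, a client only needs to fetch the bytes for the one record it cares about, for example by passing the header value in `fetch(url, { headers: { Range: ... } })`, rather than downloading the whole mapping file.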

LTLA commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

majensen commented 1 year ago

@LTLA so sorry about my slowness. Maybe you can help speed me up - I've run the code itself (rather than just hitting the site), and it seems to work as advertised. What I want to do is compare its lists with output from e.g. MSigDB. My problem is my lack of experience in the area and the many options for GSEA out there. For example, is the order of genes returned important? (My guess is that has some randomness associated with it, so maybe not.) If some genes are missing in one and present in the other list, is this a problem? Anyway, this is the question I'm trying to formulate. Thoughts?

LTLA commented 1 year ago

No worries.

For example, is the order of genes returned important? (My guess is that has some randomness associated with it, so maybe not.)

For the simple hypergeometric test that we use: no, the order doesn't matter. This test only cares about the existence of genes in the set.
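To make that concrete, here is a minimal sketch of an upper-tail hypergeometric p-value computed from an overlap count. The function names are hypothetical and the upper-tail (enrichment) form is an assumption of this sketch; it is not copied from gesel's implementation. Note that the overlap is counted through sets, so the order of genes in either list cannot affect the result.

```javascript
// Hypothetical sketch; names and structure are illustrative, not gesel's API.

// log(n!) by direct summation. Adequate for a sketch, slow for very large n.
function logFactorial(n) {
    let total = 0;
    for (let i = 2; i <= n; i++) {
        total += Math.log(i);
    }
    return total;
}

// log of the binomial coefficient "n choose k".
function logChoose(n, k) {
    return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
}

// Probability of seeing at least `overlap` set members when drawing
// `listSize` genes without replacement from `universeSize` genes,
// of which `setSize` belong to the gene set.
function hypergeometricPValue(overlap, listSize, setSize, universeSize) {
    const maxOverlap = Math.min(listSize, setSize);
    const minOverlap = Math.max(0, listSize - (universeSize - setSize));
    const denom = logChoose(universeSize, listSize);
    let p = 0;
    for (let k = Math.max(overlap, minOverlap); k <= maxOverlap; k++) {
        p += Math.exp(
            logChoose(setSize, k) +
            logChoose(universeSize - setSize, listSize - k) -
            denom
        );
    }
    return Math.min(1, p); // guard against floating-point overshoot
}

// Order-independent overlap between the user's genes and one gene set.
function countOverlap(userGenes, geneSet) {
    const members = new Set(geneSet);
    let n = 0;
    for (const g of new Set(userGenes)) {
        if (members.has(g)) {
            n++;
        }
    }
    return n;
}
```

Shuffling `userGenes` or `geneSet` leaves `countOverlap` unchanged, and therefore the p-value too, which is the sense in which the order doesn't matter.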

There are certainly other tests that care about the ordering of genes, e.g., the original "GSEA" test (as in, the actual test by that name, not the general concept), limma::geneSetTest, and so on. However, the relevant order is not that of genes in the pre-defined set, but instead, of the genes that are supplied by the user - typically in order of significance from a differential expression analysis. This allows the test to use more information about the relative importance of genes when considering their enrichment in the set, as opposed to our binary yes/no approach.

Having said that, it is difficult to ask an average user to provide an ordering of genes (at least, if they didn't already do a DE analysis). Hence we use a relatively simple test.

If some genes are missing in one and present in the other list, is this a problem?

Depends on what lists you're talking about.

I'll consider the most obvious use case where one of the lists is the user-supplied list of genes and the other is a specific gene set. If a gene is missing from one or the other, that's fine and to be expected. The hypergeometric p-value quantifies the significance of the overlap between lists, so if there are a lot of genes present in one and not the other, we can expect a low overlap and a large p-value. This will cause the gene set to be lowly ranked in the table of search results.

The less obvious interpretation of your question relates to the "universe" of all possible genes. The previous paragraph assumed that, if we saw any genes in the user's list that were missing from our gene sets, those genes were still valid identifiers that existed somewhere in our annotation for the species of interest. However, if the user's list contains a gene that we don't know about (e.g., transgenes, novel genes), we just ignore that gene as there's not much we can do.

An additional wart to consider is that gesel's universe may not be the same as the user's universe, e.g., if the user used a different reference genome. This can interfere with the calculation of the gesel p-values, though they should still be acceptable for ranking gene sets in the search table. (GSEA-related p-values should be taken with a grain of salt anyway, given that they make an often-unreasonable assumption of independence between genes under the null.)
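For what it's worth, the "ignore unknown genes" behaviour can be sketched as a simple filter against the universe. The function name and return shape below are hypothetical, made up for this illustration rather than taken from gesel.

```javascript
// Hypothetical sketch; not gesel's actual code.
// Split the user's genes into those present in the annotation universe
// and those that are unknown (e.g., transgenes, novel genes).
// Duplicates in the user's list are collapsed.
function restrictToUniverse(userGenes, universe) {
    const known = new Set(universe);
    const seen = new Set();
    const kept = [];
    const ignored = [];
    for (const g of userGenes) {
        if (seen.has(g)) {
            continue;
        }
        seen.add(g);
        if (known.has(g)) {
            kept.push(g);
        } else {
            ignored.push(g);
        }
    }
    return { kept, ignored };
}
```

Only `kept` would count towards the list size used in the test, so a mismatch between the user's universe and the annotation universe shifts the p-values but applies the same way to every gene set, which is why the ranking should remain usable.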

majensen commented 11 months ago

Thanks @LTLA - this all makes sense to me. The prosecution rests - @arfon I'm good to go. I really like this application and I hope people use it!

arfon commented 11 months ago

@LTLA – looks like we're very close to being done here. I will circle back here next week, but in the meantime, please give your own paper a final read to check for any potential typos etc.

After that, could you make a new release of this software that includes the changes that have resulted from this review? Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:

- The title of the archive is the same as the JOSS paper title
- The authors of the archive are the same as the JOSS paper authors

I can then move forward with accepting the submission.

LTLA commented 11 months ago

@editorialbot generate pdf

editorialbot commented 11 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

LTLA commented 11 months ago

Thanks @arfon, see 10.5281/zenodo.10032294.

arfon commented 11 months ago

@editorialbot set 10.5281/zenodo.10032294 as archive

editorialbot commented 11 months ago

Done! archive is now 10.5281/zenodo.10032294

arfon commented 11 months ago

@editorialbot recommend-accept

editorialbot commented 11 months ago

Attempting dry run of processing paper acceptance...

editorialbot commented 11 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1186/1471-2105-14-128 is OK
- 10.1093/bioinformatics/btr260 is OK
- 10.1038/75556 is OK
- 10.21105/joss.05603 is OK
- 10.1093/bioinformatics/btaa591 is OK
- 10.1073/pnas.0506580102 is OK
- 10.1093/nar/gks461 is OK
- 10.1186/gb-2010-11-2-r14 is OK
- 10.1101/060012 is OK
- 10.1093/nar/gkm323 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 11 months ago

:wave: @openjournals/bcm-eics, this paper is ready to be accepted and published.

Check final proof :point_right::page_facing_up: Download article

If the paper PDF and the deposit XML files look good in https://github.com/openjournals/joss-papers/pull/4714, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

Kevin-Mattheus-Moerman commented 11 months ago

@editorialbot set 0.3.3 as version

editorialbot commented 11 months ago

Done! version is now 0.3.3

Kevin-Mattheus-Moerman commented 11 months ago

@LTLA I am the EiC in this track and here to help with final steps. I have checked this review, your repository, the paper, and the archive link. Most seems in order; I only have the points below, which require your attention:

[x] In your affiliations, please spell out USA as United States of America
[x] The reference for Visualizing data using t-SNE does not have a DOI. I also was not able to find one manually so leaving out a DOI is fine. However I did find this link: https://jmlr.org/papers/v9/vandermaaten08a.html, which shows that the number field in the bib file entry is wrong (currently 11, should be 86), and that the page numbers are missing. Furthermore, perhaps the URL link can be added. In short, could you consider updating your bib file for this reference to have the following:

@article{van2008visualizing,
  author  = {Laurens van der Maaten and Geoffrey Hinton},
  title   = {Visualizing Data using t-SNE},
  journal = {Journal of Machine Learning Research},
  year    = {2008},
  volume  = {9},
  number  = {86},
  pages   = {2579--2605},
  url     = {http://jmlr.org/papers/v9/vandermaaten08a.html}
}

LTLA commented 11 months ago

@editorialbot generate pdf
