OpenOmics: Library for integration of multi-omics, annotation, and interaction data

Submitting Author: Jonny Tran (@JonnyTran)
All current maintainers: @JonnyTran Package Name: openomics One-Line Description of Package: Library for integration of multi-omics, annotation, and interaction data Repository Link: https://github.com/JonnyTran/OpenOmics Version submitted: 0.8.4 Editor: @NickleDave
Reviewer 1: @gawbul Reviewer 2: @ksielemann Archive: JOSS DOI: Version accepted: v 0.8.8 Date accepted (month/day/year): 04/17/2021

Description

OpenOmics is a Python library to assist integration of heterogeneous multi-omics bioinformatics data. By providing an API of data manipulation tools as well as a web interface (WIP), OpenOmics facilitates the common coding tasks when preparing data for bioinformatics analysis. It features support for:

Genomics, Transcriptomics, Proteomics, and Clinical data.
Harmonization with 20+ popular annotation, interaction, and disease-association databases (e.g. GENCODE, Ensembl, RNA Central, BioGRID, DisGeNet etc.)

OpenOmics also has an efficient data pipeline that bridges the popular data manipulation Pandas library and Dask distributed processing to address the following use cases:

Provides a standard pipeline for dataset indexing, table joining and querying, which are transparent and customizable for end-users.
Efficient disk storage for large multi-omics dataset with Parquet data structures.
Multiple data types that supports both interactions and sequence data, and allows users to export to NetworkX graphs or down-stream machine learning.
An easy-to-use API that works seamlessly with external Galaxy tool interface or the built-in Dash web interface (WIP).

Scope

Please indicate which category or categories this package falls under:
- [x] Data retrieval
- [x] Data extraction
- [x] Data munging
- [ ] Data deposition
- [x] Reproducibility
- [ ] Geospatial
- [ ] Education
- [ ] Data visualization*

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

Explain how the and why the package falls under these categories (briefly, 1-2 sentences):

OpenOmics' core functionalities are to provide a suite of tools for data preprocessing, data integration, and public database retrieval. Its main goal is to maximize the transparency and reproducibility in the process of multi-omics data integration.

Who is the target audience and what are scientific applications of this package?

OpenOmics' primary target audience are computational bioinformaticians, and the scientific application of this package is to provide scalable ad-hoc data-frame manipulation for multi-omics data integration in a reproducible manner. Also, we are currently developing an interactive web dashboard and interfaces to the Galaxy Tool Shed, disseminating the tool to biologists without a programming background.

Are there other Python packages that accomplish the same thing? If so, how does yours differ?

Existing PyPI Python packages within the scope of multi-omics data analysis are "pythomics" and "omics". Their functions appear to be lacking support for manipulation of integrated multi-omics dataset, retrieval of public databases, and extensible OOP design. OpenOmics aims to follow modern software best-practices and package publishing standards.

Aside from multi-omics integration tools, several specialized Python packages exists for single omics data, such as ScanPy's "AnnData" and "Loom" files. They provide an intuitive data structure for expression arrays and side annotations, and Loom file even allows for out-of-core data-frame processing. However, they don't yet provide mechanisms for multi-omics data integration, where each omics data may have overlapping samples or varying row/column sizes.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

https://github.com/pyOpenSci/software-review/issues/30

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has an OSI approved license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

[x] Do you wish to automatically submit to the Journal of Open Source Software? If so:

JOSS Checks

- [x] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [x] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [x] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [x] The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.4441167 *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

[x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

[x] I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

P.S. *Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

welcome to pyopensci @JonnyTran someone will followup with you in the next week or so! Our progress is slow right now as we're working o n funding for this effort!! @NickleDave was this one you wanted to work on? you are doing a lot so please let me know if you have time (or if you don't that is understandable too!)

@JonnyTran this looks perfect, thank you. Sorry for not replying when you opened this issue.

Hey @lwasser yes I have this on my to-do list and will start looking for reviewers Wednesday

wonderful! @NickleDave thank you!

Hi all, just adding editor checks.

Thank you @JonnyTran for your detailed submission

Editor checks:

[x] Fit: The package meets criteria for fit and overlap.
- yes, tool for working with different types of -omics data
[x] Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
- yes, Travis CI
[x] License: The package has an OSI accepted license
- yes, MIT
[x] Repository: The repository link resolves correctly
- yup

submitter did not check "yes" for submitting to JOSS -- edit 2021-01-22: reviewer will submit to JOSS. Has added DOI to repository in response to my initial comments

[ ] Archive (JOSS only, may be post-review): The repository DOI resolves correctly
[ ] Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Overall:

perfect fit for PyOpenSci, definitely looks like it has the potential to be an incredibly useful data retrieval / extraction / munging tool that faciliates reproducible pipelines
- many use cases e.g. buildng datasets for machine learning from multiple omics sources

Minor comments: things I notice on first glance

no DOI for releases, e.g. using Zenodo integration. Good idea to have DOI to make releases citable, e.g. in papers to make explicit which version was used
- more info on Zenodo integration with GitHub here: https://guides.github.com/activities/citable-code/
no explicit link to docs page
- this could be in README and / or in the "about" section -- GitHub lets you add a link there in addition to the text description
- I did finally find the docs by clicking on the badge ... but if it were me I would have a big giant link point to the docs as obvious as possible
some info could be added to README

This is definitely an "elements of style" question / matter of personal preference, and not a thing that is directly addressed by our developer guide, but:
I really like the readthedocs suggestions for what to include in your README:
https://www.writethedocs.org/guide/writing/beginners-guide-to-docs/#readme I don't see a lot of those things included in yours now
Don't get me wrong, It is definitely great that you have many concrete examples built around code snippets
but there's a trade-off involved in having many of them at the expense of having all the other information, e.g. more detailed install instructions, info on where/how to raise issues, license type, etc.
Maybe have one good code snippet + then link to the other examples via a link to the docs?
You don't want to miss getting all that other info to potential contributors / other developers who are kind of most likely to look at your GitHub README, IMHO
but for sure many successful projects that don't follow this pattern (see for example the tqdm README)

Reviewers: @gawbul @ksielemann Due date: February 15, 2021

started reaching out to reviewers, will update as soon as we hear back

Minor comments:

no DOI for releases, e.g. using Zenodo integration. Good idea to have DOI to make releases citable, e.g. in papers to make explicit which version was used

no explicit link to docs page

Thanks so much for the great suggestions, @NickleDave. I've addressed the "releases DOI" and "docs link" issues above, and I'm planning on updating the README file to be more attractive to potential contributors.

Hi @NickleDave and @lwasser, will it be possible for OpenOmics to make submission to JOSS at this stage of the review? I just changed my mind about this, but I already have a complete manuscript and I'm ready to provide the paper.md within a few days.

Hey @JonnyTran I think that could be okay. But first let me make sure I understand what you're asking.

Hi @NickleDave and @lwasser, will it be possible for OpenOmics to make submission to JOSS at this stage of the review?

Do you mean that you want to change "Publication Options" in your submission above, and check the box for "automatically submit to" JOSS? If so, yes I think that's fine. @lwasser please confirm

I just changed my mind about this, but I already have a complete manuscript and I'm ready to provide the paper.md within a few days.

Do you mean that you have a complete, separate manuscript written about OpenOmics, in addition to the paper.md you would submit to JOSS? E.g., like Physcraper https://github.com/pyOpenSci/software-review/issues/26 which has a paper on biorxiv https://www.biorxiv.org/content/10.1101/2020.09.15.299156v1

If that's the case, we should discuss more whether you want to submit to JOSS. We can perhaps tag some editors and ask if they can give us input. My impression is that JOSS is usually meant to provide a mechanism for getting publication credit for software in cases where the developer/maintainer can't easily publish a paper about it. Although I think they might have changed some of the language in their submission guidelines about this. See for example: https://joss.readthedocs.io/en/latest/submitting.html#co-publication-of-science-methods-and-software

so:
if you just want to go through PyOpenSci review and then submit to JOSS at the end, yes, totally fine assuming @lwasser agrees with me. Anything else, we should probably discuss a little more first

Hi @JonnyTran just want to follow up on this -- please let me know what you're thinking

I think I do have one potential reviewer and can move ahead with finding another whenever you're ready

Hi @NickleDave, sorry it took awhile to consult with my advisor. Yes, I meant that I'd like to check the box on automatic submission, and no, that I have not submitted a separate manuscript elsewhere. My intention for the JOSS submission is to publish contributions on the technical software aspects that I have thus far. But in later months (after finishing the web-app features in openomics), I do plan on making another contribution to a bioinformatics journal on the scientific bioinformatics use-cases. So I think this would fall under the "co-publication".

Thank you for the clarifications!

great, glad to hear it @JonnyTran
and I totally understand needing to consult with your advisor--didn't mean to rush you, just don't want this to fall of my to-do list

please do go ahead and edit your initial comment to check that automatic submission box, and make sure you address the to-do list in that section

I will continue with my part of the review process

Hi again @JonnyTran excited to let you know that @gawbul and @ksielemann have both kindly accepted our invitation to review

@gawbul actually started PyOpenSci back in 2013 (see this ROpenSci blogpost) and develops related tools such as pyEnsemblREST

@ksielemann has significant experience with omics datasets and was recommended to us by @bpucker as developer of the tool QUOD (from their publication https://www.biorxiv.org/content/10.1101/2020.04.28.065714v1.abstract)

@gawbul and @ksielemann here are related links again for your convenience: Our reviewers guide details what we look for in a package review, and includes links to sample reviews. Our standards are detailed in our packaging guide, and we provide a reviewer template for you to use. Please make sure you do not have a conflict of interest preventing you from reviewing this package. If you have questions or feedback, feel free to ask me here or by email, or post to the pyOpenSci forum.

I will update my editor checks above to add you both as reviewers, and set an initial due date of three weeks: February 15, 2021

this is so awesome!!

@gawbul hello again!! so great to see you here. We are moving forward with PyOS through the Sloan foundation (fingers crossed) as we briefly discussed forever ago. I'd love to see you continue to participate when you have time in whatever capacity you have time for!! :)

thank you all for this review!

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience in README

It is not completely clear what OpenOmics can be used for. An overview of the available functions and methods would be great: What exactly does OpenOmics do? What are specific usage examples (also e.g. after using OpenOmics: What's next?)?:

'OpenOmics facilitates the common coding tasks when preparing data for bioinformatics analysis.': For which bioinformatic analyses exactly?

[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[ ] Vignette(s) demonstrating major functionality that runs successfully locally

'# Load each expression dataframe': additional ')' in lines should be removed! as current form results in an error

mRNA = MessengerRNA(data=folder_path+"LUAD__geneExp.txt", transpose=True, usecols="GeneSymbol|TCGA", gene_index="GeneSymbol", gene_level="gene_name") Results in warnings (only in first time use): _'/homes/.local/lib/python3.7/site-packages/openomics/transcriptomics.py:95: FutureWarning: read_table is deprecated, use read_csv instead. /homes/.local/lib/python3.7/site-packages/openomics/transcriptomics.py:95: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delimwhitespace=False; you can avoid this warning by specifying engine='python'.'

som = SomaticMutation(data=folder_path+"LUAD__somaticMutation_geneLevel.txt", transpose=True, usecols="GeneSymbol|TCGA", gene_index="gene_name") Results in: _'KeyError: 'genename''. This should probably be _'geneindex="GeneSymbol"'.

luad_data.add_clinical_data(clinical_data=folder_path+"nationwidechildrens.org_clinical_patient_luad.txt") Results in warning: _'/homes/.local/lib/python3.7/site-packages/openomics/clinical.py:51: FutureWarning: read_table is deprecated, use readcsv instead, passing sep='\t'.'

gencode = GENCODE(path="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/",file_resources={"long_noncoding_RNAs.gtf": "gencode.v32.long_noncoding_RNAs.gtf.gz","basic.annotation.gtf": "gencode.v32.basic.annotation.gtf.gz","lncRNA_transcripts.fa": "gencode.v32.lncRNA_transcripts.fa.gz","transcripts.fa": "gencode.v32.transcripts.fa.gz"},remove_version_num=True,npartitions=5) Results in: _'AttributeError: 'io.TextIOWrapper' object has no attribute 'startswith''.

[ ] Function Documentation: for all user-facing functions

Please see above (in 'A statement of need').

[x] Examples for all user-facing functions
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements The package meets the readme requirements below:

[x] Package has a README.md file in the root directory.

The README should include, from top to bottom:

[x] The package name
[x] Badges for continuous integration and test coverage, a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the badge for pyOpenSci peer-review will be provided upon acceptance.)
[ ] Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.

Please see above (in 'A statement of need'). The goals could be communicated more specifically.

[x] Installation instructions
[x] Any additional setup required (authentication tokens, etc)
[x] Brief demonstration usage
[x] Direction to more detailed documentation (e.g. your documentation files or website).
[ ] If applicable, how the package compares to other similar packages and/or how it relates to other packages
[ ] Citation information

Citation information is missing at the end of the README.

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

[x] The documentation is easy to find and understand
[ ] The need for the package is clear

Please see above (in 'A statement of need'). Specific examples for bioinformatic analyses after the use of OpenOmics could be added.

[ ] All functions have documentation and associated examples for use

Please see above (in 'A statement of need'). An overview of all methods and functions of the package would be helpful.

Functionality

[ ] Installation: Installation succeeds as documented.

Installation with pip install openomics worked fine on the Linux system I am using. However, using my windows computer, I got the following error: error: Microsoft Visual C++ 14.0 is required

A list of dependencies/requirements in the README would be great.

from openomics import MultiOmics results in the "UserWarning: Tensorflow not installed; ParametricUMAP will be unavailable"

[ ] Functionality: Any functional claims of the software been confirmed.

Please see above.

[ ] Performance: Any performance claims of the software been confirmed.
[ ] Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.

Please see above.

[ ] Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
[x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

For packages co-submitting to JOSS

[ ] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software
[ ] Authors: A list of authors with their affiliations
[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
[ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5

Review Comments

The concept of OpenOmics seems interesting and useful for the integration of various omics datasets. Overall, the package has a clear documentation. However, there are still a few issues that should be addressed. Please see the points above and below.

Download of test data: Add code to download the data within a Python script so that the user does not have to download the whole repository or describe exactly how to download the test data. (Import TCGA LUAD data included in tests dataset (preprocessed from TCGA-Assembler) folder_path = "tests/data/TCGA_LUAD/" # Located at https://github.com/BioMeCIS-Lab/OpenOmics/tree/master/tests)is not sufficient.) Here is an example on how I did this:

#fetch data
import os
import tarfile
import urllib.request

def fetch_data(file_url, own_path, file_name):
    if not os.path.isdir(own_path):
        os.makedirs(own_path)
    own_file_path = os.path.join(own_path, file_name)
    urllib.request.urlretrieve(file_url, own_file_path)

FILE_NAMES = ["LUAD__geneExp.txt",
              "LUAD__miRNAExp__RPM.txt",
              "LUAD__protein_RPPA.txt",
              "LUAD__somaticMutation_geneLevel.txt",
              "TCGA-rnaexpr.tsv",
              "genome.wustl.edu_biospecimen_sample_luad.txt",
              "nationwidechildrens.org_clinical_drug_luad.txt",
              "nationwidechildrens.org_clinical_patient_luad.txt",
              "protein_RPPA.txt"]

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/BioMeCIS-Lab/OpenOmics/master/tests/data/TCGA_LUAD/"
OWN_PATH = os.path.join("data", "omics")

for file_name in FILE_NAMES:
  FILE_URL = DOWNLOAD_ROOT + file_name
  fetch_data(FILE_URL, OWN_PATH, file_name)

In the ReadTheDocs documentation:

gtex = GTEx(path="https://storage.googleapis.com/gtex_analysis_v8/rna_seq_data/") Results in: OSError: Not enough free space in /homes/.astropy/cache/download/url to download a 3.6G file, only 2.7G left.

Is there a possibility to choose the directory in which the files should be downloaded? This would be great.

Thanks so much for the fantastic reviews @ksielemann! I can work on the revisions, which should be available in 2 weeks.

Download of test data. Add code to download the data within a Python script so that the user does not have to download the whole repository or describe exactly how to download the test data.

Thanks for pointing this out and providing the automated scripts. I initially placed the test data at tests/data/ for automated tests when you run pytest ./ at the root directory. You can actually copy the codes at tests/test_multiomics.py to load the -omics data.

gtex = GTEx(path="https://storage.googleapis.com/gtex_analysis_v8/rna_seq_data/") Results in: OSError: Not enough free space in /homes/.astropy/cache/download/url to download a 3.6G file, only 2.7G left.

I used the package astropy to automatically cache downloaded files. It defaults to saving the files at /homes/.astropy/cache/, and ideally it should be in one location for each user-session. But I can see how useful it is for the user to be able to make a setting to choose a directory of their choice - I will see about making an openomics configuration file at the user directory, located at ~/.openomics/conf.json. I've made an issue at https://github.com/BioMeCIS-Lab/OpenOmics/issues/112

Just echoing @JonnyTran -- yes thank you for getting this detailed review back so quickly @ksielemann

Looks great to me. I will read in detail this weekend just to make sure I'm staying up to date with the review process

Just working through my review at present. Should have it done by the end of the day. Apologies for the delay.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience in README

So the package states it is solving the problem of targeting various multi-omics datasets, though this is fairly broad, perhaps because the intention is to make the scope of the project broader in the future. It isn't clear what datasets and data formats are supported, however. Perhaps this isn't relevant for the README, but I feel it could be included in the linked documentation on Read the Docs. The target audience is implicitly defined and my assumption is that this would primarily be used by bioinformaticians, though perhaps this could be more explicit?

[ ] Installation instructions: for the development version of package and any non-standard dependencies in README

Installation instructions seemed clear and I attempted to install via pip, however initially received the following error:

Installing collected packages: MarkupSafe, Werkzeug, numpy, Jinja2, itsdangerous, zope.interface, zope.event, urllib3, threadpoolctl, scipy, retrying, PyYAML, pytz, python-dateutil, pyparsing, ptyprocess, llvmlite, joblib, idna, heapdict, greenlet, Flask, chardet, certifi, brotli, zict, typing-extensions, tornado, toolz, tblib, soupsieve, sortedcontainers, scikit-learn, requests, psutil, plotly, pillow, pexpect, patsy, pandas, packaging, numba, msgpack, locket, gevent, future, flask-compress, decorator, dask, dash-table, dash-renderer, dash-html-components, dash-core-components, colorlog, cloudpickle, xmltodict, xlsxwriter, xlrd, wrapt, wget, suds-jurko, statsmodels, requests-cache, pynndescent, pyerfa, pydot, partd, networkx, lxml, kiwisolver, grequests, fsspec, easydev, docopt, distributed, dash, cython, cycler, cachetools, bokeh, beautifulsoup4, appdirs, validators, umap-learn, typing, sqlalchemy, scikit-allel, rarfile, obonet, matplotlib, large-image, h5py, gunicorn, gtfparse, goatools, filetype, dash-daq, dash-bootstrap-components, bioservices, biopython, astropy, openomics
    Running setup.py install for retrying ... done
    Running setup.py install for llvmlite ... error
    ERROR: Command errored out with exit status 1:
     command: /Users/stephenmoss/.pyenv/versions/3.9.0/bin/python3.9 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/setup.py'"'"'; __file__='"'"'/private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-record-rgduko9u/install-record.txt --single-version-externally-managed --compile --install-headers /Users/stephenmoss/.pyenv/versions/3.9.0/include/python3.9/llvmlite
         cwd: /private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/
    Complete output (29 lines):
    running install
    running build
    got version from file /private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/llvmlite/_version.py {'version': '0.34.0', 'full': 'c5889c9e98c6b19d5d85ebdd982d64a03931f8e2'}
    running build_ext
    /Users/stephenmoss/.pyenv/versions/3.9.0/bin/python3.9 /private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/ffi/build.py
    LLVM version... Traceback (most recent call last):
      File "/private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/ffi/build.py", line 105, in main_posix
        out = subprocess.check_output([llvm_config, '--version'])
      File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/subprocess.py", line 420, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/subprocess.py", line 501, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/subprocess.py", line 947, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/subprocess.py", line 1819, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'llvm-config'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/ffi/build.py", line 191, in <module>
        main()
      File "/private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/ffi/build.py", line 185, in main
        main_posix('osx', '.dylib')
      File "/private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/ffi/build.py", line 107, in main_posix
        raise RuntimeError("%s failed executing, please point LLVM_CONFIG "
    RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
    error: command '/Users/stephenmoss/.pyenv/versions/3.9.0/bin/python3.9' failed with exit code 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /Users/stephenmoss/.pyenv/versions/3.9.0/bin/python3.9 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/setup.py'"'"'; __file__='"'"'/private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-install-sm6pl876/llvmlite_0bff606e31a6496399a22ccfcec04d59/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/bf/cl_g6mhx7zd9_1mhhvppbzdh0000gn/T/pip-record-rgduko9u/install-record.txt --single-version-externally-managed --compile --install-headers /Users/stephenmoss/.pyenv/versions/3.9.0/include/python3.9/llvmlite Check the logs for full command output.

I needed to run the following to fix the issue:

brew install llvm@9
LLVM_CONFIG=/usr/local/opt/llvm@9/bin/llvm-config pip install openomics

_I tried with brew install llvm (version 11.0.1 and it failed with RuntimeError: Building llvmlite requires LLVM 10.0.x or 9.0.x, got '11.0.1'. Be sure to set LLVM_CONFIG to the right executable path._

Perhaps an external dependency on this can be specified (required by llvmlite)? The assumption is that the end-user has a working python installation and relevant compilers etc. installed, though this doesn't seem to be specified anywhere?

[ ] Vignette(s) demonstrating major functionality that runs successfully locally

When trying to run an openomics_test.py file with the from openomics import MultiOmics statement I received the following:

Creating directory /Users/stephenmoss/Library/Application Support/bioservices
Matplotlib is building the font cache; this may take a moment.
Traceback (most recent call last):
  File "/Users/stephenmoss/Dropbox/Code/openomics_test.py", line 1, in <module>
    from openomics import MultiOmics
  File "/Users/stephenmoss/Dropbox/Code/OpenOmics/openomics/__init__.py", line 40, in <module>
    from .visualization import (
  File "/Users/stephenmoss/Dropbox/Code/OpenOmics/openomics/visualization/umap.py", line 3, in <module>
    import umap
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/umap/__init__.py", line 2, in <module>
    from .umap_ import UMAP
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/umap/umap_.py", line 47, in <module>
    from pynndescent import NNDescent
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/__init__.py", line 3, in <module>
    from .pynndescent_ import NNDescent, PyNNDescentTransformer
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/pynndescent_.py", line 21, in <module>
    import pynndescent.sparse as sparse
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/sparse.py", line 330, in <module>
    def sparse_alternative_jaccard(ind1, data1, ind2, data2):
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/decorators.py", line 218, in wrapper
    disp.compile(sig)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/dispatcher.py", line 819, in compile
    cres = self._compiler.compile(args, return_type)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/dispatcher.py", line 82, in compile
    raise retval
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/dispatcher.py", line 92, in _compile_cached
    retval = self._compile_core(args, return_type)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/dispatcher.py", line 105, in _compile_core
    cres = compiler.compile_extra(self.targetdescr.typing_context,
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler.py", line 627, in compile_extra
    return pipeline.compile_extra(func)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler.py", line 363, in compile_extra
    return self._compile_bytecode()
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler.py", line 425, in _compile_bytecode
    return self._compile_core()
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler.py", line 405, in _compile_core
    raise e
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler.py", line 396, in _compile_core
    pm.run(self.state)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler_machinery.py", line 341, in run
    raise patched_exception
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler_machinery.py", line 332, in run
    self._runPass(idx, pass_inst, state)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler_machinery.py", line 291, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/compiler_machinery.py", line 264, in check
    mangled = func(compiler_state)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/typed_passes.py", line 92, in run_pass
    typemap, return_type, calltypes = type_inference_stage(
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/typed_passes.py", line 70, in type_inference_stage
    infer.propagate(raise_errors=raise_errors)
  File "/Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/typeinfer.py", line 1071, in propagate
    raise errors[0]
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython mode backend)
Failed in nopython mode pipeline (step: nopython mode backend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function make_quicksort_impl.<locals>.run_quicksort at 0x13e3a6940>) found for signature:

 >>> run_quicksort(array(int32, 1d, C))

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'register_jitable.<locals>.wrap.<locals>.ov_wrap': File: numba/core/extending.py: Line 150.
    With argument(s): '(array(int32, 1d, C))':
   Rejected as the implementation raised a specific error:
     UnsupportedError: Failed in nopython mode pipeline (step: analyzing bytecode)
   Use of unsupported opcode (LOAD_ASSERTION_ERROR) found

   File "../../../.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/misc/quicksort.py", line 180:
       def run_quicksort(A):
           <source elided>
               while high - low >= SMALL_QUICKSORT:
                   assert n < MAX_STACK
                   ^

  raised from /Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/core/byteflow.py:269

During: resolving callee type: Function(<function make_quicksort_impl.<locals>.run_quicksort at 0x13e3a6940>)
During: typing of call at /Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/np/arrayobj.py (5007)

File "../../../.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/np/arrayobj.py", line 5007:
    def array_sort_impl(arr):
        <source elided>
        # Note we clobber the return value
        sort_func(arr)
        ^

During: lowering "$14call_method.5 = call $12load_method.4(func=$12load_method.4, args=[], kws=(), vararg=None)" at /Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/numba/np/arrayobj.py (5017)
During: lowering "$8call_method.3 = call $4load_method.1(arr, func=$4load_method.1, args=[Var(arr, sparse.py:28)], kws=(), vararg=None)" at /Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/sparse.py (28)
During: resolving callee type: type(CPUDispatcher(<function arr_unique at 0x13e1d4280>))
During: typing of call at /Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/sparse.py (41)

During: resolving callee type: type(CPUDispatcher(<function arr_unique at 0x13e1d4280>))
During: typing of call at /Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/sparse.py (41)

File "../../../.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/sparse.py", line 41:
def arr_union(ar1, ar2):
    <source elided>
    else:
        return arr_unique(np.concatenate((ar1, ar2)))
        ^

During: resolving callee type: type(CPUDispatcher(<function arr_union at 0x13e1d4820>))
During: typing of call at /Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/sparse.py (331)

During: resolving callee type: type(CPUDispatcher(<function arr_union at 0x13e1d4820>))
During: typing of call at /Users/stephenmoss/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/sparse.py (331)

File "../../../.pyenv/versions/3.9.0/lib/python3.9/site-packages/pynndescent/sparse.py", line 331:
def sparse_alternative_jaccard(ind1, data1, ind2, data2):
    num_non_zero = arr_union(ind1, ind2).shape[0]
    ^

This turned out to be an issue with python 3.9, which the package is supposed to support. I tried with python 3.8 instead, but it failed initially with the scipy install as it needed the BLAS and LAPACK libraries. I needed to install using:

brew install openblas lapack
LDFLAGS="-L/usr/local/opt/openblas/lib -L/usr/local/opt/lapack/lib" CPPFLAGS="-I/usr/local/opt/openblas/include -I/usr/local/opt/lapack/include" LLVM_CONFIG=/usr/local/opt/llvm@9/bin/llvm-config pip install openomics

Now running python openomics_test.py gives me only the following warning:

/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/umap/__init__.py:9: UserWarning: Tensorflow not installed; ParametricUMAP will be unavailable
  warn("Tensorflow not installed; ParametricUMAP will be unavailable")

Running the following rectifies this:

brew install libtensorflow
pip install tensorflow

I feel, therefore, a dependency on python 3.8 should be specified in the documentation and setup.py, as it looks like there are issues with python 3.9 at present. It would also be useful to include tensorflow in the list of package dependencies (i.e. requirements.txt) to avoid this warning. Using something like pipenv might be an ideal solution here? Though explicitly stating the external library dependencies for scipy would still be necessary.

When extending openomics_test.py to include the example for Load the multiomics: Gene Expression, MicroRNA expression lncRNA expression, Copy Number Variation, Somatic Mutation, DNA Methylation, and Protein Expression data, I get the following error running the first sample:

File "openomics_test.py", line 11
  usecols="GeneSymbol|TCGA", gene_index="GeneSymbol", gene_level="transcript_name")
  ^
IndentationError: unexpected indent

This is because there are close parentheses where there shouldn't be.

The example in the README should be the following (I have submitted a PR for this):

# Load each expression dataframe
mRNA = MessengerRNA(data=folder_path+"LUAD__geneExp.txt",
        transpose=True, usecols="GeneSymbol|TCGA", gene_index="GeneSymbol", gene_level="gene_name")
miRNA = MicroRNA(data=folder_path+"LUAD__miRNAExp__RPM.txt",
        transpose=True, usecols="GeneSymbol|TCGA", gene_index="GeneSymbol", gene_level="transcript_name")
lncRNA = LncRNA(data=folder_path+"TCGA-rnaexpr.tsv",
        transpose=True, usecols="Gene_ID|TCGA", gene_index="Gene_ID", gene_level="gene_id")
som = SomaticMutation(data=folder_path+"LUAD__somaticMutation_geneLevel.txt",
        transpose=True, usecols="GeneSymbol|TCGA", gene_index="gene_name")
pro = Protein(data=folder_path+"protein_RPPA.txt",
        transpose=True, usecols="GeneSymbol|TCGA", gene_index="GeneSymbol", gene_level="protein_name")

Running openomics_test.py now gives me the following warning for the MessengerRNA function:

/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/transcriptomics.py:95: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'.

I believe this can be rectified by updating transcriptomics.py here with:

df = pd.read_table(data, sep=None, engine='python')

The SomaticMutation function gives me the following error:

Traceback (most recent call last):
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'gene_name'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "openomics_test.py", line 14, in <module>
    som = SomaticMutation(data=folder_path+"LUAD__somaticMutation_geneLevel.txt",
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/genomics.py", line 22, in __init__
    super(SomaticMutation, self).__init__(data=data, transpose=transpose, gene_index=gene_index, usecols=usecols,
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/transcriptomics.py", line 50, in __init__
    self.expressions = self.preprocess_table(df, usecols=usecols, gene_index=gene_index, transposed=transpose,
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/transcriptomics.py", line 148, in preprocess_table
    df = df[df[gene_index] != '?']
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'gene_name'

Using GeneSymbol instead of gene_name for the gene_index parameter in the vignette fixes this, e.g.:

som = SomaticMutation(data=folder_path+"LUAD__somaticMutation_geneLevel.txt",
        transpose=True, usecols="GeneSymbol|TCGA", gene_index="GeneSymbol")

Running openomics_test.py now gives me the following:

/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/transcriptomics.py:95: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'.
  df = pd.read_table(data, sep=None)
MessengerRNA (576, 20472) , indexed by: gene_name
MicroRNA (494, 1870) , indexed by: transcript_name
LncRNA (546, 12727) , indexed by: gene_id
SomaticMutation (889, 21070) , indexed by: GeneSymbol
Protein (364, 200) , indexed by: protein_name

This differs from the output in the README, however, which is:

PATIENTS (522, 5)
SAMPLES (1160, 6)
DRUGS (461, 4)
MessengerRNA (576, 20472)
SomaticMutation (587, 21070)
MicroRNA (494, 1870)
LncRNA (546, 12727)
Protein (364, 154)

Running the example under Annotate LncRNAs with GENCODE genomic annotations returns the following:

Downloading ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.long_noncoding_RNAs.gtf.gz
|========================================================================================================================================================================================================| 4.4M/4.4M (100.00%)         0s
Downloading ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.basic.annotation.gtf.gz
|========================================================================================================================================================================================================|  26M/ 26M (100.00%)         7s
Downloading ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.lncRNA_transcripts.fa.gz
|========================================================================================================================================================================================================|  14M/ 14M (100.00%)         3s
Downloading ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.transcripts.fa.gz
|========================================================================================================================================================================================================|  72M/ 72M (100.00%)        15s
INFO:root:<_io.TextIOWrapper name='/Users/stephenmoss/.astropy/cache/download/url/141581d04d4001254d07601dfa7d983b/contents' encoding='UTF-8'>
Traceback (most recent call last):
  File "openomics_test.py", line 34, in <module>
    gencode = GENCODE(path="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/",
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/database/sequence.py", line 67, in __init__
    super(GENCODE, self).__init__(path=path, file_resources=file_resources, col_rename=col_rename,
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/database/sequence.py", line 17, in __init__
    super(SequenceDataset, self).__init__(**kwargs)
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/database/base.py", line 39, in __init__
    self.data = self.load_dataframe(file_resources, npartitions=npartitions)
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/database/sequence.py", line 74, in load_dataframe
    df = read_gtf(file_resources[gtf_file], npartitions=npartitions)
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/utils/read_gtf.py", line 349, in read_gtf
    result_df = parse_gtf_and_expand_attributes(
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/utils/read_gtf.py", line 290, in parse_gtf_and_expand_attributes
    result = parse_gtf(
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/openomics/utils/read_gtf.py", line 195, in parse_gtf
    chunk_iterator = dd.read_table(
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/dask/dataframe/io/csv.py", line 659, in read
    return read_pandas(
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/dask/dataframe/io/csv.py", line 464, in read_pandas
    paths = get_fs_token_paths(urlpath, mode="rb", storage_options=storage_options)[
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/fsspec/core.py", line 619, in get_fs_token_paths
    path = cls._strip_protocol(urlpath)
  File "/Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/fsspec/implementations/local.py", line 147, in _strip_protocol
    if path.startswith("file://"):
AttributeError: '_io.TextIOWrapper' object has no attribute 'startswith'

I initially submitted an issue with fsspec here, but it turned out to be due to the gzip library wrapping the return object in an io.TextIOWrapper object? This is due to the fsspec package being unable to determine a path for the object and therefore returning the original object, which then fails downstream. See the issue for more details, but we somehow need to return a legitimate file object, instead of an io.TextIOWrapper object that has a resolvable path and therefore doesn't cause the failure in the fsspec package.

I was unable to continue with the rest of the vignettes as a result despite attempting some fixes locally.

[ ] Function Documentation: for all user-facing functions

Docstrings seem to be available throughout the codebase for all relevant functions, however, I found the documentation on Read the Docs to be lacking in comparison, particularly in providing detail of function parameters that are otherwise available in the docstrings.

[ ] Examples for all user-facing functions

Looking at the various classes and respective functions throughout the codebase, there doesn't seem to be a full coverage of of all functionality in the examples provided in the documentation. This would be too much for the README, I feel, but could certainly be represented on Read the Docs.

[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.

There is no link to the contribution guidelines from the main README, which I feel would be beneficial, however contributing guidelines are available in https://github.com/BioMeCIS-Lab/OpenOmics/blob/master/CONTRIBUTING.rst and on Read the Docs.

Just being finicky, I personally would prefer a common format for the README and CONTRIBUTING documents etc. Either both in reStructuredText or both in Markdown. There seems to be an outdated README.rst that could probably be replaced with the current README.md?

[x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements The package meets the readme requirements below:

[x] Package has a README.md file in the root directory.

The README should include, from top to bottom:

[x] The package name
[x] Badges for continuous integration and test coverage, a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the badge for pyOpenSci peer-review will be provided upon acceptance.)
[x] Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
[x] Installation instructions
[ ] Any additional setup required (authentication tokens, etc)

I feel some things are missing regarding setup information for certain platforms (i.e. I had issues on macOS). Use of pipenv may be beneficial for certain dependencies, i.e. those that generated warnings. Reproducibiltiy across platforms is always an issue, though perhaps a Docker image with all dependencies required could be made available for people to run their scripts?

[x] Brief demonstration usage
[x] Direction to more detailed documentation (e.g. your documentation files or website).
[ ] If applicable, how the package compares to other similar packages and/or how it relates to other packages

No comparisons are made to other packages. I feel it is similar in some ways to PyCogent, though this is no longer actively maintained, and perhaps even comparable in some ways to QIIME2?

[ ] Citation information

No citation information is provided in the README, though perhaps this can be made available if submitted to JOSS. It would also be great to see a DOI made available via GitHub's integration with Zenodo as mentioned by @NickleDave.

Usability

One minor PR submitted as described above. I'll try see if I can spare some time to look into the gzip issue at some point too.

[x] The documentation is easy to find and understand

Though could be extended as discussed above.

[x] The need for the package is clear

This is relatively clear, but could be expanded on as above.

[x] All functions have documentation and associated examples for use

As discussed above, there appear to be docstrings throughout the code, but this doesn't seem to be reflected in the README or online documentation, particularly for function parameters.

Functionality

[ ] Installation: Installation succeeds as documented.

I had several issues with installation as described above, though I only tested on a macOS system running Big Sur version 11.2. It appears there are issues with Python 3.9.x in particular, though I can't confirm whether this is platform specific. There were some failures in building scipy on Python 3.8.7 due to dependent libraries missing on my machine.

[ ] Functionality: Any functional claims of the software been confirmed.

The package goes a good way towards meeting the claims it reports to be developed for, though the difficulties in getting the examples to run means I was limited in my ability to fully assess them.

[ ] Performance: Any performance claims of the software been confirmed.

No performance claims were provided. On my 16" MacBook Pro with 2.3 GHz Octa-core Intel Core i9 and 32GB RAM, however, I felt the package was a little slow in loading it's dependencies on first run. Tests also took some time to complete.

[ ] Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.

Some tests are available and run as part of the Travis CI pipeline, though coverage isn't amazing and would benefit from additional work. Focusing on test driven development is a good way to ensure greater coverage. Running the tests locally took a long time, and returned various warnings and an error.

☁ OpenOmics [master] python -m pytest --cov=./
============================================================================================================ test session starts =============================================================================================================
platform darwin -- Python 3.8.7, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /Users/stephenmoss/Dropbox/Code/OpenOmics, configfile: setup.cfg
plugins: cov-2.11.1, dash-1.19.0
collected 35 items

tests/test_annotations.py .........                                                                                                                                                                                                    [ 25%]
tests/test_disease.py ..........                                                                               [ 54%]
tests/test_interaction.py .E.....                                                                              [ 74%]
tests/test_multiomics.py ...                                                                                   [ 82%]
tests/test_sequences.py ......                                                                                 [100%]

======================================================= ERRORS =======================================================
______________________________________ ERROR at setup of test_import_MiRTarBase ______________________________________

    @pytest.fixture
    def generate_MiRTarBase():
>       return MiRTarBase(path="/data/datasets/Bioinformatics_ExternalData/miRTarBase/", strip_mirna_name=True,
                          filters={"Species (Target Gene)": "Homo sapiens"})

tests/test_interaction.py:19:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
openomics/database/interaction.py:611: in __init__
    super(MiRTarBase, self).__init__(path=path, file_resources=file_resources,
openomics/database/interaction.py:40: in __init__
    self.validate_file_resources(path, file_resources, verbose=verbose)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <openomics.database.interaction.MiRTarBase object at 0x1aa5036d0>
path = '/data/datasets/Bioinformatics_ExternalData/miRTarBase/'
file_resources = {'miRTarBase_MTI.xlsx': '/data/datasets/Bioinformatics_ExternalData/miRTarBase/miRTarBase_MTI.xlsx'}
npartitions = None, verbose = False

    def validate_file_resources(self, path, file_resources, npartitions=None, verbose=False) -> None:
        """For each file in file_resources, fetch the file if path+file is a URL
        or load from disk if a local path. Additionally unzip or unrar if the
        file is compressed.

        Args:
            path (str): The folder or url path containing the data file
                resources. If url path, the files will be downloaded and cached
                to the user's home folder (at ~/.astropy/).
            file_resources (dict): default None, Used to list required files for
                preprocessing of the database. A dictionary where keys are
                required filenames and value are file paths. If None, then the
                class constructor should automatically build the required file
                resources dict.
            npartitions:
            verbose:
        """
        if validators.url(path):
            for filename, filepath in copy.copy(file_resources).items():
                data_file = get_pkg_data_filename(path, filepath,
                                                  verbose=verbose)  # Download file and replace the file_resource path
                filetype_ext = filetype.guess(data_file)

                # This null if-clause is needed incase when filetype_ext is None, causing the next clause to fail
                if filetype_ext is None:
                    file_resources[filename] = data_file

                elif filetype_ext.extension == 'gz':
                    file_resources[filename] = gzip.open(data_file, 'rt')

                elif filetype_ext.extension == 'zip':
                    zf = zipfile.ZipFile(data_file, 'r')

                    for subfile in zf.infolist():
                        if os.path.splitext(subfile.filename)[-1] == os.path.splitext(filename)[-1]: # If the file extension matches
                            file_resources[filename] = zf.open(subfile.filename, mode='r')

                elif filetype_ext.extension == 'rar':
                    rf = rarfile.RarFile(data_file, 'r')

                    for subfile in rf.infolist():
                        if os.path.splitext(subfile.filename)[-1] == os.path.splitext(filename)[-1]: # If the file extension matches
                            file_resources[filename] = rf.open(subfile.filename, mode='r')
                else:
                    file_resources[filename] = data_file

        elif os.path.isdir(path) and os.path.exists(path):
            for _, filepath in file_resources.items():
                if not os.path.exists(filepath):
                    raise IOError(filepath)
        else:
>           raise IOError(path)
E           OSError: /data/datasets/Bioinformatics_ExternalData/miRTarBase/

openomics/database/base.py:113: OSError
================================================== warnings summary ==================================================
../../../.pyenv/versions/3.8.7/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
  /Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: collect_ignore

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

tests/test_annotations.py: 6 warnings
tests/test_disease.py: 6 warnings
tests/test_interaction.py: 4 warnings
tests/test_multiomics.py: 3 warnings
tests/test_sequences.py: 4 warnings
  /Users/stephenmoss/Dropbox/Code/OpenOmics/openomics/transcriptomics.py:108: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'.
    df = pd.read_table(data, sep=None)

tests/test_annotations.py::test_import_GTEx
tests/test_annotations.py::test_GTEx_annotate
  /Users/stephenmoss/Dropbox/Code/OpenOmics/openomics/database/annotation.py:239: FutureWarning: The default value of regex will change from True to False in a future version.
    gene_exp_medians["Name"] = gene_exp_medians["Name"].str.replace("[.].*", "")

tests/test_disease.py::test_import_HMDD
tests/test_disease.py::test_annotate_HMDD
  /Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/encodings/unicode_escape.py:26: DeprecationWarning: invalid escape sequence '\ '
    return codecs.unicode_escape_decode(input, self.errors)[0]

tests/test_disease.py::test_import_HMDD
tests/test_disease.py::test_annotate_HMDD
  /Users/stephenmoss/.pyenv/versions/3.8.7/lib/python3.8/encodings/unicode_escape.py:26: DeprecationWarning: invalid escape sequence '\s'
    return codecs.unicode_escape_decode(input, self.errors)[0]

tests/test_interaction.py::test_import_LncRNA2Target
  /Users/stephenmoss/Dropbox/Code/OpenOmics/openomics/database/interaction.py:476: FutureWarning: Your version of xlrd is 1.2.0. In xlrd >= 2.0, only the xls format is supported. As a result, the openpyxl engine will be used if it is installed and the engine argument is not specified. Install openpyxl instead.
    table = pd.read_excel(file_resources["lncRNA_target_from_low_throughput_experiments.xlsx"])

tests/test_interaction.py::test_import_LncRNA2Target
  /Users/stephenmoss/Dropbox/Code/OpenOmics/openomics/database/interaction.py:480: FutureWarning: The default value of regex will change from True to False in a future version.
    table["Target_official_symbol"] = table["Target_official_symbol"].str.replace("(?i)(mir)", "hsa-mir-")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform darwin, python 3.8.7-final-0 -----------
Name                                       Stmts   Miss  Cover
--------------------------------------------------------------
openomics/__init__.py                         25      7    72%
openomics/clinical.py                         57     19    67%
openomics/database/__init__.py                 7      0   100%
openomics/database/annotation.py             197    110    44%
openomics/database/base.py                   140     37    74%
openomics/database/disease.py                 61      1    98%
openomics/database/interaction.py            330    196    41%
openomics/database/ontology.py               152     93    39%
openomics/database/sequence.py               128     66    48%
openomics/genomics.py                         26      8    69%
openomics/imageomics.py                       63     47    25%
openomics/multicohorts.py                      0      0   100%
openomics/multiomics.py                      111     66    41%
openomics/proteomics.py                       13      2    85%
openomics/transcriptomics.py                 111     32    71%
openomics/utils/GTF.py                        53     53     0%
openomics/utils/__init__.py                    0      0   100%
openomics/utils/df.py                         23     11    52%
openomics/utils/io.py                         40     19    52%
openomics/utils/read_gtf.py                  107     24    78%
openomics/visualization/__init__.py            1      0   100%
openomics/visualization/heatmat.py            11      8    27%
openomics/visualization/umap.py               29     24    17%
openomics_web/__init__.py                      0      0   100%
openomics_web/app.py                          69     69     0%
openomics_web/callbacks.py                     0      0   100%
openomics_web/layouts/__init__.py              0      0   100%
openomics_web/layouts/annotation_view.py       0      0   100%
openomics_web/layouts/app_layout.py            7      7     0%
openomics_web/layouts/clinical_view.py        10     10     0%
openomics_web/layouts/control_tabs.py          5      5     0%
openomics_web/layouts/datatable_view.py       28     28     0%
openomics_web/server.py                        2      2     0%
openomics_web/utils/__init__.py                0      0   100%
openomics_web/utils/io.py                     62     62     0%
openomics_web/utils/str_utils.py              25     25     0%
setup.py                                      44     44     0%
tests/__init__.py                              0      0   100%
tests/data/__init__.py                         0      0   100%
tests/data/test_dask_dataframes.py             0      0   100%
tests/test_annotations.py                     39      0   100%
tests/test_disease.py                         34      0   100%
tests/test_interaction.py                     20      1    95%
tests/test_multiomics.py                      46      1    98%
tests/test_sequences.py                       18      1    94%
--------------------------------------------------------------
TOTAL                                       2094   1078    49%

============================================== short test summary info ===============================================
ERROR tests/test_interaction.py::test_import_MiRTarBase - OSError: /data/datasets/Bioinformatics_ExternalData/miRTa...
================================ 34 passed, 32 warnings, 1 error in 662.47s (0:11:02) ================================

The main error seemed to be a missing dataset. On further inspection of the codebase it seems that the package is supposed to download the miRTarBase data (although it appears to have the version 7 release URL hardcoded when version 8 is now available). I wondered whether this was a permissions issue with not being able to create the /data/datasets/Bioinformatics_ExternalData/miRTarBase/ path on my system. I tried with sudo python -m pytest --cov=./ tests/test_interaction.py and got the same? I tried sudo mkdir -p /data/datasets/Bioinformatics_ExternalData/miRTarBase beforehand, which returned:

mkdir: /data/datasets/Bioinformatics_ExternalData/miRTarBase: Read-only file system

This is likely due to macOS system integrity protection.

However, it seems I am also unable to resolve http://mirtarbase.mbc.nctu.edu.tw/cache/download/7.0/. I believe the URL should actually be http://mirtarbase.cuhk.edu.cn/cache/download/7.0/ (or even http://mirtarbase.cuhk.edu.cn/cache/download/8.0/)? I manually updated to the working version 7.0 release and updated the path in test_interaction.py before running the following:

mkdir -p tests/data/datasets/Bioinformatics_ExternalData/miRTarBase
sudo python -m pytest --cov=./ tests/test_interaction.py

I still received the error, so something needs looking at in more detail here.

[x] Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

When raising a PR the build in Travis CI failed for all python versions. There is some work needed to get these issues resolved though I didn't inspect the output in detail.

[x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

~~#### For packages co-submitting to JOSS~~

~~- [ ] The package has an obvious research application according to JOSS's definition in their submission requirements.~~

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

~~The package contains a paper.md matching JOSS's requirements with:~~

~~- [ ] A short summary describing the high-level functionality of the software~~ ~~- [ ] Authors: A list of authors with their affiliations~~ ~~- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.~~ ~~- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).~~

Final approval (post-review)

[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

A number of changes and bug fixes are required before I would recommend approving this package, but in general I feel it would be a great addition to pyOpenSci.

Estimated hours spent reviewing: 7

Review Comments

I would recommend looking through the Author's Guide in more detail, particularly the Tools for developers section. Using git pre-commit hooks for local development would be beneficial for both the author and any contributors and would enable the production of greater quality code. These can also be integrated with Travis CI to ensure any pull requests also meet the same requirements via automated testing against a variety of different Python versions.

I echo the comments of @ksielemann, in that the package seems very interesting and I can see it would have a broad range of applications.

Hopefully, we can work together to get the issues resolved and get this approved. Would be interested in seeing this in JOSS at some point too.

@JonnyTran

Just in case it gets lost in the review, I submitted PRs for a couple of the issues I hit here https://github.com/BioMeCIS-Lab/OpenOmics/pull/103 and here https://github.com/BioMeCIS-Lab/OpenOmics/pull/105.

Also opened an issue for the problem with the gzip.open returning an io.TextIOWrapper object here https://github.com/BioMeCIS-Lab/OpenOmics/issues/104.

Thank you @gawbul for the very thorough review, really appreciate that you could get that back to us a week before the deadline, especially with other things you've been dealing with.

@JonnyTran just want to check where you are at with this. I know it might feel like a lot, and you could have other things going on.

Our guidelines suggest aiming for a two-week turnaround time after reviews are in. We definitely don't have to hold strictly to that especially if you have other obligations to deal with.

But please when you can just give me some idea of how you'll move forward. I would suggest converting reviewer comments into issues on OpenOmics and linking to them where you can. See for example issues on physcraper from their review: https://github.com/McTavishLab/physcraper/issues

Thanks so much for the thorough review @gawbul! 🥰

@NickleDave I’ve been trying to go over the issues. Actually it has been difficult because of the constant rolling electricity blackouts from the snowstorm currently in my state, Texas 🥶.

I will work on making the issues on GH and fix them soon!

!!! I'm sorry, didn't realize you were in Texas! The situation with the power grid is insane. Hope you can stay warm and safe!

Thank you for letting us know!

Hi @gawbul, I've just gone over your reviews. Thanks for the care and attention on testing this software. There were many issues that I've missed and I've created Issues for most of your comments.

I had several issues with installation as described above, though I only tested on a macOS system running Big Sur version 11.2. It appears there are issues with Python 3.9.x in particular, though I can't confirm whether this is platform specific. There were some failures in building scipy on Python 3.8.7 due to dependent libraries missing on my machine.

I'm sorry you've had troubles running OpenOmics because of issues with installing the package dependencies from requirement.txt https://github.com/BioMeCIS-Lab/OpenOmics/issues/113, and importing umap when running the SomaticMutation vignettes https://github.com/BioMeCIS-Lab/OpenOmics/issues/114.

I have not seen these errors before as I've never ran tests on Mac OS X with Python 3.9. I've primarily been developing with the Anaconda Python 3.7 environment which already comes with a llvm installation, so the pip install openomics did not show any errors. Since this is still an issue on Mac OS X, I try to use more TravisCI automated tests against a variety of different Python versions and debug these problems. I've made an issue at https://github.com/BioMeCIS-Lab/OpenOmics/issues/117.

Docstrings seem to be available throughout the codebase for all relevant functions, however, I found the documentation on Read the Docs to be lacking in comparison, particularly in providing detail of function parameters that are otherwise available in the docstrings.

Currently documentations on Read the Docs has mostly been auto-generated by Sphinx. As pointed out by both @gawbul and @lwasser , I agree that Read the Docs documentations could be better, particularly Usage guides and Vignettes. Also, it is a great suggestion to use only markdown or reStructredText (rather than both). I've opened the issue https://github.com/BioMeCIS-Lab/OpenOmics/issues/119.

Running the example under Annotate LncRNAs with GENCODE genomic annotations returns the following: AttributeError: '_io.TextIOWrapper' object has no attribute 'startswith'

This issue has been fixed at https://github.com/BioMeCIS-Lab/OpenOmics/issues/104

No performance claims were provided. On my 16" MacBook Pro with 2.3 GHz Octa-core Intel Core i9 and 32GB RAM, however, I felt the package was a little slow in loading it's dependencies on first run. Tests also took some time to complete.

Tests currently are running with a large set of genome-wide RNA's, but probably only needs to test on a subset of the data. Will work on reducing the workload to make tests faster.

There were many other issues listed at https://github.com/BioMeCIS-Lab/OpenOmics/issues. Is there a deadline to address all these? I think the issues with Read the Docs and automated tests targeting MacOS + Python3.9 can take a week to finish.

Hey @JonnyTran there's no strict deadline.

Our guide says "aim for one week" but with everyone already stressed out by the pandemic, and you facing even more problems because of the situation in Texas right now, I would not ask you to meet that.

Does three weeks sound do-able to you? If not we can definitely figure something else out. I just want to have something in my calendar to make sure I can keep track of reviews. Please let me know.

Your comment above seems like it covers most of the feedback from the reviews. But when you get a chance and things are a little more back to normal for you, please do also make sure you address any specific comments from @ksielemann too.

Hi @NickleDave. Yes, 3 weeks is plenty of time for me to address the issues - how does Friday, March 12 sounds?

Hi again @JonnyTran -- sorry for not replying sooner. I saw this and then got distracted

Yes March 12 is perfect if that still works for you. Will put it in my calendar.

Hi guys @NickleDave @ksielemann @gawbul. It's been awhile, but I finally managed to address most of the comments. Yay!

For @ksielemann comments:

Download of test data

I implemented a way to load the Expression data files directly from URL, see https://openomics.readthedocs.io/en/latest/usage/getting-started.html#creating-a-multi-omics-dataset

choose the directory in which the files should be downloaded?

There is now a function to do so, see https://openomics.readthedocs.io/en/latest/usage/annotate-external-databases.html#setting-the-cache-download-directory

For @gawbul comments:

Some tests are available and run as part of the Travis CI pipeline, though coverage isn't amazing and would benefit from additional work.

I've set up GitHub Actions at https://github.com/BioMeCIS-Lab/OpenOmics/actions/workflows/python-package.yml to replace Travis CI for better pricing plans. The automated test suite currently targets Mac OS and Linux for Python 3.6-3.9. A few tests are failing (due to unavailability of some FTP servers), but I believe you shouldn't have the same problems on your Mac OS + Python 3.9 anymore.

I found the documentation on Read the Docs to be lacking

I did a complete revamp of ReadTheDocs documentation site, especially vignettes and usage guide at https://openomics.readthedocs.io/en/latest/usage/getting-started.html. The structure of API references is in place, although more in-depth usage guides should be written.

The readme.md file is also edited to reflect guidelines from https://www.writethedocs.org/guide/writing/beginners-guide-to-docs/#readme

For the JOSS submission, my manuscript can be compiled from https://github.com/BioMeCIS-Lab/OpenOmics/tree/master/inst

Is there anything else I might be missing?

Thank you @JonnyTran I can see you've put a ton of work in to addressing the reviewer comments.
And thank you for getting back to us by March 15th.

@ksielemann and @gawbul can you please let @JonnyTran know whether you feel the changes made, outlined in the comment above, are sufficient to address revisions you suggested in your reviews?

@JonnyTran I don't think there's anything else you're missing.
I will double-check and get back to you by Wednesday at the latest

First of all, thank you @JonnyTran for addressing the comments above! Some of my review comments are embedded in the specific points of the Package Review form. I believe that these points were not yet addressed (or did I miss something?).

Thank you for your quick reply @ksielemann Yes, I see that some of your comments were embedded in the form.

I don't mean to make more work for you, but could I ask you to raise separate issues on the OpenOmics repository for any comments that you feel have not yet been addressed?

This is the usual approach that JOSS reviews use (to raise issues with details on the repo, and then link to them / summarize on the "review issue"), I think to avoid situations like this. We should probably have very clear instructions suggesting the same approach in our guidelines--our fault.

@gawbul I think that you did raise issues for some of your comments. Can you please also check whether there are any that remain to be addressed?

I am sorry that I embedded the comments so that they can be easily overlooked! I now opened a few issues with my comments.

No need to apologize @ksielemann , definitely my fault for not being clearer about process. Thank you for taking time to open issues. That will help.

Sorry @JonnyTran for adding more to your plate. I just want to make sure it's very clear what review criteria have been met, according to reviewers.

If we need to, we can discuss further here, and you can link to specific issues.

I will check back by Friday at the latest. Again, as far as JOSS goes, if the manuscript compiles and you have a DOI for the version that we approve, then I think you are good to go. I will make sure of that when we reach that point.

I just closed my last issue with this comment: 'I think it is really important for future users that the usage guide works without errors. Otherwise, the user might get frustrated and refrains from using the library. So the usage guide should be updated according to the functionalities and current version of the package. But I think this can also happen while the package is further developed.'

So from my side, all my comments are sufficiently addressed now!

Great thank you @ksielemann glad to hear it.
Looks like the additional issues were all easily addressed or fixed already. Sorry, I didn't mean to ask you to do extra work, just wanted to make sure we were all on the same page about requested revisions. Thank you again 🙏

@gawbul just want to check back -- can you please confirm whether your comments have been addressed?

Looks like the corresponding issues were: https://github.com/BioMeCIS-Lab/OpenOmics/issues/119 https://github.com/BioMeCIS-Lab/OpenOmics/issues/115 https://github.com/BioMeCIS-Lab/OpenOmics/issues/114 https://github.com/BioMeCIS-Lab/OpenOmics/issues/113

@NickleDave @JonnyTran I'll look at this asap. I've not been well this last while and am trying to recover. I won't forget 👍

thank you @gawbul for letting us know! sorry, somehow missed that you replied here

No problem 😄 I'll get around to checking this one night this week 👍

All looks good 👍

I made a quick comment here regarding an issue with the docs https://github.com/BioMeCIS-Lab/OpenOmics/issues/119#issuecomment-825137257, but otherwise, I'm happy everything has been addressed ☺️

Excellent, thank you so much @gawbul and @ksielemann for your very thorough reviews

🎉 openomics has been approved by pyOpenSci! Thank you @JonnyTran for submitting

There are a few things left to do to wrap up this submission:

[x] Add the badge for pyOpenSci peer-review to the README.md of openomics. The badge should be [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/31)
[x] Add openomics to the pyOpenSci website. @JonnyTran , please open a PR to update this file: to add your package and name to the list of contributors
[ ] @JonnyTran @gawbul @ksielemann if you have time and are open to being listed on our website, please add yourselves to this file via a PR so we can list you on our website as contributors!

Since this package is going to move on to JOSS for review, you'll also want to do the following:

[x] Tag and create a new release through GitHub, to produce a new version and DOI listed on Zenodo
[ ] Submit to JOSS using the Zenodo DOI. We will tag it for expedited review. @JonnyTran feel free to tag me (@NickleDave) and reference this issue with a link when you submit to JOSS just so I can stay up to date with the process there.

All -- if you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and our documentation in the contributing-guide. We have also been updating our documentation to improve the process so all feedback is appreciated!

@NickleDave will you kindly fill out the very top of this submission with the reviewers, review version accepted etc - the very first comment? we want to ensure that we keep track of that for every review. once all is filled out and the review is complete (boxes checked above!) we can close the issue! thank you all!!

Thank you for taking care of those final to-dos @JonnyTran

Just checking, are you about to submit to JOSS? Based on your last couple of commits I'd guess yes. Please do let us know and/or reference this issue on the JOSS review when you do.

@lwasser I have edited the first comment to reflect those changes -- not sure if there's more I need to resolve what I was assigned through GitHub. I will close this once the JOSS review is initiated

@ksielemann @gawbul would you be okay with me adding you as contributors to the pyOpenSci site? I can request your review on the PR when I do so

Hi @NickleDave,

Yes I've submitted to JOSS about 4 days ago, mentioning this was approved by pyOpenSci. I will reference this issue once the review process starts on JOSS's github.

Thanks for the updates!

Ah great -- I didn't realize how the process worked, I looked for an issue on their repo but didn't see it.

Thank you for letting me know! Please just let me know if you need anything from us for review at JOSS.

Going to go ahead and close. Yay!!! congrats on completing review and officially becoming part of pyOpenSci! Will tweet about openomics later! Thank you again @ksielemann and @gawbul for your great reviews

Thanks so much for everyone's patience, help and support, @NickleDave, @lwasser, @gawbul, and @ksielemann !

Most of all thanks for making this a better software! I will work on making it more usable for all.

oh all - So the JOSS review process is simple in that they accept our review as theirs! @arfon please note that this package was submitted to the JOSS review process. Can we please fast track it given we have reviewed here on the pyopensci side of things. Please let us know what you need. @JonnyTran is there an open issue in JOSS right now? can you kindly reference it here if you haven't already.

I normally keep these reviews open here until the JOSS part is finished. When it is they will ask you to add the JOSS badge on your readme as well. thank you all for this!

Hi @lwasser , there isn't a JOSS issue yet, but I will tag this pyOpenSci issue once it is opened.

ahh ok perfect. Normally they just accept through our issue! let's wait for arfon to get back to us here to ensure that is still the best process! congratulations on being a part of the pyopensci ecosystem and thank you for your submission here!

@ksielemann @gawbul would you be okay with me adding you as contributors to the pyOpenSci site? I can request your review on the PR when I do so

Sure, I am okay with this! Should I add myself to the file you mentioned above or do you prefer to add me to the contributor site yourself?

Sorry for the delay folks. Things are now moving in https://github.com/openjournals/joss-reviews/issues/3249

pyOpenSci / software-submission