greenelab / iscb-diversity

Analyzing diversity of ISCB keynote speakers & fellows compared to the field of bioinformatics
https://greenelab.github.io/iscb-diversity

Diversity of ISCB Honorees

This is a data analysis repository for the study at https://greenelab.github.io/iscb-diversity-manuscript/.

Datasets

Datasets are stored in the data directory. This repository uses Git LFS to store large or binary datasets. If you want to download the datasets, make sure Git LFS is installed locally before cloning the repository. You can also download individual datasets directly from the GitHub website by clicking "Raw".
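If Git LFS was not installed when you cloned, each LFS-tracked file in the data directory is checked out as a small text pointer rather than the real dataset. The sketch below assumes bash and uses hypothetical helper names (is_lfs_pointer, report_pointers) that are not part of this repository. It flags pointer files by their first line, which the Git LFS pointer format fixes as the spec URL; running git lfs install followed by git lfs pull then downloads the actual content.

```shell
#!/usr/bin/env bash
# Sketch: find files that are still Git LFS pointers (for example, after
# cloning without Git LFS installed). is_lfs_pointer and report_pointers are
# hypothetical helpers, not scripts shipped with this repository.
set -euo pipefail

# An LFS pointer is a small text file whose first line names the LFS spec.
is_lfs_pointer() {
  local first_line=""
  IFS= read -r first_line < "$1" || true
  [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}

# Print "pointer: PATH" for every pointer file found under a directory.
report_pointers() {
  local path
  find "$1" -type f | while IFS= read -r path; do
    if is_lfs_pointer "$path"; then
      echo "pointer: $path"
    fi
  done
}

# Self-contained demo: one fake pointer file and one ordinary data file.
demo="$(mktemp -d)"
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%064d\nsize 1234\n' 0 \
  > "$demo/pointer.tsv.xz"
printf 'name\tcount\nalpha\t1\n' > "$demo/real.tsv"

report="$(report_pointers "$demo")"
echo "$report"
rm -rf "$demo"

# In a real clone, run: report_pointers data
# If pointers are reported, fetch the actual files with:
#   git lfs install && git lfs pull
```

Only the fake pointer file is reported; the ordinary TSV is skipped because its first line does not match the spec line.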

The source code saves large files using XZ compression (denoted by an .xz extension). Since not all users are familiar with XZ compression, we have also created gzip exports of all XZ-compressed files using the convert-xz-to-gzip.bash script. These exports are placed alongside their XZ sources in the data directory. The pipelines themselves use XZ rather than gzip because gzip writes a timestamp into each file's header, which makes its output non-deterministic: rerunning a pipeline on unchanged data can still produce a changed file.
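To see the timestamp problem directly, the sketch below compresses the same bytes twice, changing only the file's modification time in between. It assumes GNU gzip and xz-utils are on your PATH and uses a throwaway file; it is an illustration, not part of the repository's pipeline or of convert-xz-to-gzip.bash. gzip's --no-name flag, which omits the stored file name and timestamp, is included because it is the usual workaround when gzip output must be reproducible.

```shell
#!/usr/bin/env bash
# Sketch: compare gzip and xz reproducibility on identical input bytes.
# Assumes GNU gzip and xz-utils are installed; example.tsv is a throwaway file.
set -euo pipefail

workdir="$(mktemp -d)"
input="$workdir/example.tsv"
printf 'name\tcount\nalpha\t1\nbeta\t2\n' > "$input"

# Report whether two files are byte-for-byte identical.
same_bytes() {
  if cmp -s "$1" "$2"; then echo identical; else echo different; fi
}

# Compress the same bytes twice with the given command, changing only the
# input file's modification time between runs.
compress_twice() {
  local ext="$1"
  shift
  touch -m -t 200101010000 "$input"
  "$@" "$input" > "$workdir/first.$ext"
  touch -m -t 200202020000 "$input"
  "$@" "$input" > "$workdir/second.$ext"
  same_bytes "$workdir/first.$ext" "$workdir/second.$ext"
}

gzip_default="$(compress_twice gz gzip --stdout)"
gzip_no_name="$(compress_twice gz gzip --stdout --no-name)"
xz_default="$(compress_twice xz xz --stdout)"

echo "gzip (default): $gzip_default"
echo "gzip --no-name: $gzip_no_name"
echo "xz (default):   $xz_default"

rm -rf "$workdir"
```

With default settings the two gzip outputs differ because the header records the input's modification time, while the gzip --no-name and xz outputs are identical; the .xz container format has no timestamp field at all.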

Development

This repository has a corresponding Docker image with the required dependencies. See environment for the Docker image specification.

Note that the following Docker commands use a --mount argument to give the container access to the files in this repository. Therefore, any changes made to repository content from inside the container persist in this directory after the container stops.

The Docker image is automatically built and published by a GitHub Action. Even though this repository is public, GitHub requires authentication to download from its package registry. Therefore, you will need a GitHub account to pull the image.

To authenticate your local Docker client with GitHub, go to https://github.com/settings/tokens and create a new personal access token, selecting only the read:packages scope. You can name the token anything, for example "docker login read-only token". Save the token to a file (TOKEN.txt below), then run the following command, substituting your GitHub username. Reading the token from standard input keeps it out of your shell history and process list:

cat TOKEN.txt | docker login --username USERNAME --password-stdin docker.pkg.github.com

Interactive

For interactive development in Python notebooks, run the following command:

# This command must be run with the repository root as your working directory.
# Requires docker version >= 17.06.
docker run \
  --name iscb-diversity \
  --detach --rm \
  --env JUPYTER_TOKEN=ksbegpqzrurktbkikyo \
  --publish 8899:8888 \
  --mount type=bind,source="$(pwd)",target=/user/jupyter \
  docker.pkg.github.com/greenelab/iscb-diversity/iscb-diversity

Then navigate to the following URL in your browser: http://localhost:8899?token=ksbegpqzrurktbkikyo

You should see a Jupyter Notebook landing page where you can open, edit, and run any of the notebooks.
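The values in the command above are linked: the host side of --publish (8899) and the JUPYTER_TOKEN value together form the notebook URL. As a hedged variant, not the repository's documented command, the sketch below generates a fresh token instead of reusing the one published in this README, and binds the published port to 127.0.0.1 (by default, Docker publishes on all host interfaces). It only prints the command; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: build the Jupyter docker run command with a fresh token and a
# loopback-only port binding. Variable names are illustrative assumptions.
set -euo pipefail

# 16 random bytes rendered as 32 lowercase hex characters.
jupyter_token="$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')"
host_port=8899

docker_args=(
  run
  --name iscb-diversity
  --detach --rm
  --env "JUPYTER_TOKEN=${jupyter_token}"
  --publish "127.0.0.1:${host_port}:8888"
  --mount "type=bind,source=$(pwd),target=/user/jupyter"
  docker.pkg.github.com/greenelab/iscb-diversity/iscb-diversity
)
notebook_url="http://localhost:${host_port}?token=${jupyter_token}"

# Print the command instead of running it. To run it: docker "${docker_args[@]}"
printf 'docker'
printf ' %q' "${docker_args[@]}"
printf '\n'
echo "$notebook_url"
```

Keeping the token in one variable means the docker run arguments and the browser URL cannot drift apart.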

When you are done, shut down the Jupyter Notebook server and remove the Docker container by running docker stop iscb-diversity in any terminal. Because the container was started with --rm, stopping it also removes it.

Similarly, for the R notebooks:

# This command must be run with the repository root as your working directory.
# Requires docker version >= 17.06.
docker run \
  --name iscb-diversity-r \
  --detach --rm \
  --publish 8787:8787 \
  --env DISABLE_AUTH=true \
  --mount type=bind,source="$(pwd)",target=/home/rstudio/repo \
  docker.pkg.github.com/greenelab/iscb-diversity/iscb-diversity-r

Navigate to http://localhost:8787, where you should be logged into RStudio as the rstudio user. When you are done, shut down the RStudio server and remove the Docker container by running docker stop iscb-diversity-r.

GitHub Pages

The docs directory is used as the GitHub Pages source for this repository (https://github.com/greenelab/iscb-diversity), which is served at https://greenelab.github.io/iscb-diversity. To regenerate the outputs in the docs directory, run the following command:

python utils/prepare_docs.py --nbviewer --readme

Edit utils/prepare_docs.py to change the template for docs/readme.md.

License

The entire repository is released under the CC BY 4.0 License available in license.md. All code files and snippets are additionally released under the BSD 3-Clause License available in license-code.md.