microbiomedata / nmdc_notebooks

Jupyter Notebooks demonstrating R and Python-based access to NMDC metadata and data
Creative Commons Zero v1.0 Universal

NMDC Data and Metadata R and Python Sample Jupyter Notebooks

Quick Start

Notebooks that are ready for use and exploration.

Overview

This repository includes Jupyter notebooks that explore and analyze microbiome data from the National Microbiome Data Collaborative's (NMDC) data portal. These notebooks aim to:

Each folder explores a scientific question using the NMDC's (meta)data. A folder includes a README.md that outlines the question or analysis posed, as well as two sub-folders, R and python, which contain the sample notebooks written in the R and Python programming languages, respectively.
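As a rough illustration of that layout, the sketch below lists the top-level folders of a local clone that contain a README.md together with R and python sub-folders. The helper name `find_analysis_folders` is hypothetical and not part of this repository; it only assumes the folder convention described above.

```python
from pathlib import Path


def find_analysis_folders(repo_root):
    """Return names of top-level folders holding a README.md plus R and python sub-folders.

    Hypothetical helper: it assumes the folder convention described in this README.
    """
    root = Path(repo_root)
    matches = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        has_readme = (folder / "README.md").is_file()
        has_both_languages = (folder / "R").is_dir() and (folder / "python").is_dir()
        if has_readme and has_both_languages:
            matches.append(folder.name)
    return matches
```

Calling `find_analysis_folders(".")` from the root of a clone should return the folders that follow this convention; folders missing any of the three pieces are skipped.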

R and Python were chosen because they are popular languages among scientists for exploring and visualizing data. Jupyter Notebook paired with Google Colab is used because of its interactive code and data exploration features, effectiveness in teaching, language independence, and ease of sharing code.

A challenging aspect highlighted by this process is accessing the (meta)data in a user-friendly way via the NMDC API. Because the NMDC metadata schema is highly modular, retrieving metadata is not straightforward without extensive knowledge of the schema's infrastructure, modeling language (LinkML), and naming conventions. A proposed solution is an R or Python package that would give users easier, more straightforward access to NMDC's data.
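To make the kind of wrapper described above concrete, here is a minimal Python sketch of paged retrieval from one schema collection. It is not an existing package. The base URL, the `/nmdcschema/{collection}` path, the `filter`, `max_page_size`, and `page_token` query parameters, and the `resources` and `next_page_token` response fields are all assumptions about the NMDC API that should be checked against its live documentation. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the NMDC API; verify against the official API docs.
BASE_URL = "https://api.microbiomedata.org"


def build_url(collection, mongo_filter=None, page_size=25, page_token=None):
    """Build a request URL for one page of a collection (assumed endpoint layout)."""
    params = {"max_page_size": page_size}
    if mongo_filter:
        # Assumption: the filter is passed as a JSON-encoded query document.
        params["filter"] = json.dumps(mongo_filter)
    if page_token:
        params["page_token"] = page_token
    return f"{BASE_URL}/nmdcschema/{collection}?{urlencode(params)}"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_all(collection, mongo_filter=None, page_size=25, fetch=fetch_json, max_pages=10):
    """Collect records across pages, stopping when no next-page token is returned."""
    records = []
    token = None
    for _ in range(max_pages):  # hard cap so a sketch never loops indefinitely
        page = fetch(build_url(collection, mongo_filter, page_size, token))
        records.extend(page.get("resources", []))
        token = page.get("next_page_token")
        if not token:
            break
    return records
```

A call such as `fetch_all("biosample_set", {"ecosystem_category": "Terrestrial"})` would then return a plain list of records; both the collection name and the filter field are illustrative and depend on the schema. Passing a custom `fetch` callable keeps the paging logic testable without network access.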

Contributing

We welcome contributions to this repository. Please see the Contributing document for more information on how to contribute.