Data discovery: JupyterLab extensions for data catalogs

lheagy commented 4 years ago

There are a number of common features that will be useful to provide to users through a JupyterLab extension. These may include:

a) Metadata for the dataset, including sizes, licenses, names, authors, and timestamps.
b) Drag-and-droppable code snippets for ingesting and analyzing data in popular programming languages.
c) Previews of the data, such as the first few rows of tabular datasets, or downsampled imagery for satellite data.
d) Collaboration tools for annotating and commenting on datasets within a multi-user environment.

THREDDS, STAC, and Intake catalogs are test cases from which we can build documentation and reusable components, so that this work can be replicated across other data catalogs and services in the earth sciences.
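
To make items a) through c) a bit more concrete, here is a rough sketch of what an extension backend might hand to the frontend. Everything in it is an assumption for illustration: the `DatasetEntry` record and its field names, the `ingest_snippet` and `preview_rows` helpers, and the example URL are hypothetical and not part of THREDDS, STAC, Intake, or JupyterLab. It also assumes the dataset is a CSV file, which is only one of the cases an extension would need to handle.

```python
import csv
import io
from dataclasses import dataclass
from itertools import islice


@dataclass
class DatasetEntry:
    """Hypothetical catalog record covering the metadata in item a)."""
    name: str
    authors: list
    license: str
    size_bytes: int
    updated: str  # ISO 8601 timestamp
    url: str


def ingest_snippet(entry: DatasetEntry) -> str:
    """Build Python source that could be dropped into a notebook cell (item b).

    The generated code assumes the dataset is a CSV readable by pandas.
    """
    return (
        "import pandas as pd\n"
        f"df = pd.read_csv({entry.url!r})  # {entry.name}\n"
        "df.head()\n"
    )


def preview_rows(csv_text: str, n: int = 5) -> list:
    """Return the header row plus the first n data rows of CSV text (item c)."""
    reader = csv.reader(io.StringIO(csv_text))
    return list(islice(reader, n + 1))


# Example with made-up values:
entry = DatasetEntry(
    name="demo",
    authors=["A. Author"],
    license="CC-BY-4.0",
    size_bytes=1024,
    updated="2021-01-01T00:00:00Z",
    url="https://example.org/demo.csv",  # placeholder URL
)
print(ingest_snippet(entry))
```

A real extension would likely generate different snippets per catalog type and per language, and would produce previews server-side rather than from text already in memory.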

lheagy commented 3 years ago

@andersy005: with your work on intake, do you have any suggestions on an ideal place to start? Is this something you would be willing to help sketch out further?

andersy005 commented 3 years ago

@lheagy,

Here are some of the ideas I've been thinking about, and plan to work on in the near future:

1) intake-index: STAC has an index service at https://stacindex.org/ that catalogues the existing STAC tools and catalogs. I've been thinking of starting something similar to stacindex, but for intake. My initial plan is to start with a minimal web component. Once this web component is functional, I'd start working on a JupyterLab extension exposing the same content in a JupyterLab session. The website and the labextension would share the backend services (database, API, etc.). The frontends would support the drag-and-droppable code snippets you suggested. One key issue I need help with is scoping out a minimum viable product for this idea. Your input/feedback is welcome.

2) jupyterlab-zarr: a while ago I started working on a JupyterLab data viewer for Zarr. Well, I never made substantial progress on this 😅. Looking back, the procrastination may have been a good thing, because it appears that there has been substantial progress in zarr.js -- a JavaScript implementation of Zarr -- in the last 6 months. So, it seems this may be a good time to pick this project up again :)

As I understand it, these two projects would cover points a), b), and c) that you listed above.

pangeo-data / jupyter-earth

Data discovery: JupyterLab extensions for data catalogs #14