pangeo-data / jupyter-earth

Jupyter meets the Earth: combining research use cases in geosciences with technical developments within the Jupyter and Pangeo ecosystems.
https://jupytearth.org
Creative Commons Zero v1.0 Universal

"To database or not to database" #48

Open · whyjz opened this issue 3 years ago

whyjz commented 3 years ago

@fperez mentioned this at today's project meeting. The thought came out of the Cryosphere working group meeting several weeks ago, and I feel it might be worth sharpening the question itself, so I would like to get some ideas from you.

Suppose a research group is producing a data set. If they want to make their data open to other people, they first have to choose a data structure. People usually pick whatever structure they think is best for storing and sharing their data, but that is not necessarily the best choice for using or analyzing the data from a user's perspective. As a researcher, how do I know the best way to structure the data so that other people can explore it as efficiently as possible?

I know many research agencies (NSIDC, NASA, USGS, ...) run extensive end-user surveys to gather this kind of information, and their data are often available in several formats or through several access methods. However, doing this might be hard for a single research lab or a small working group. I also feel that many geoscientists, including me, are unfamiliar with the range of data structures and database management systems available. When we have to make such a decision, we choose whatever we know best. For example, I worked with Landsat 8 GeoTIFF data, and for most of the derived data sets I generated, I used GeoTIFF without considering whether it was the most convenient format for sharing them.

Any thoughts are welcome!