uwhackweek / schedule-2024

Combined Hackweek Schedule for 2024 Event
MIT License

Plenary: Cloud computing and cloud-optimized data formats #35

Open JessicaS11 opened 5 months ago

JessicaS11 commented 5 months ago

Lead: Aimee Barciauskas
Date: 19/08/2024
Start Time: 1300
Duration: 45
Description:

Details

### Learning Outcomes

* outcome 1
* outcome 2
* outcome 3

### People Developing the Tutorial (content creation, helpers, teachers)

### Summary Description

* Why should we care about cloud-optimized formats (now)?
* What does it mean to be cloud-optimized?
* Cloud formats and cloud computing
* Demo of ICESat-2 in Parquet format using lonboard

### Dependencies (things people should know in advance of the tutorial)

### Technical Needs (GPUs? Large file storage? Unique libraries?)

abarciauskas-bgse commented 2 months ago

Outline

Preamble

We shouldn't have to think about formats, so this tutorial will hopefully be obsolete within the next 5 years. But we have a long way to go, so we want to share with you what cloud-optimized means and why you should care, so you can help us get there.

  1. Why should you care: You should care if your science may require multiple files and may be memory intensive when those files are handled in memory. If accessing those files is slow, you should know what to look for to explain why, and perhaps advocate for things to be better!
  2. What does it mean to be cloud-optimized
  3. Explain cloud-optimized vs. cloud-native and go through the formats you may see ICESat-2 products in
    1. HDF5 and cloud-optimized HDF5
    2. zarr (cloud-native, multidimensional; applies to higher-level products)
    3. geoparquet
  4. Brief introduction to cloud computing: You are already using cloud computing on CryoCloud! CryoCloud provides in-region ("collocated") compute. Other frameworks for using cloud-optimized formats with parallel computing are dask/coiled and cubed, and there are many other serverless frameworks for parallel computing.
  5. Demo: Use sliderule to output parquet, then demonstrate ICESat-2 data in geoparquet with lonboard
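
To make item 2 concrete, here is a minimal sketch of the idea behind "cloud-optimized": keep an index of where each piece of data lives, so a reader can fetch only the bytes it needs with range requests instead of downloading the whole file. Everything in it is an assumption for illustration: the toy layout, `write_chunked`, `RangeReader`, and `read_chunk` are hypothetical names, and this is not the actual on-disk layout of HDF5, zarr, or parquet (although parquet does keep its metadata in a footer).

```python
import io
import json
import struct


def write_chunked(chunks):
    """Pack named byte chunks into one blob with a JSON index in the footer.

    Toy layout (hypothetical, for illustration only):
    [chunk bytes ...][JSON index][8-byte little-endian index length]
    """
    buf = io.BytesIO()
    index = {}
    for name, payload in chunks.items():
        # Record where each chunk starts and how long it is.
        index[name] = (buf.tell(), len(payload))
        buf.write(payload)
    footer = json.dumps(index).encode("utf-8")
    buf.write(footer)
    buf.write(struct.pack("<Q", len(footer)))
    return buf.getvalue()


class RangeReader:
    """Stand-in for an object store that serves byte-range requests."""

    def __init__(self, blob):
        self._blob = blob
        self.requests = 0
        self.bytes_read = 0

    def size(self):
        # A real object store would report this through object metadata.
        return len(self._blob)

    def read_range(self, offset, length):
        self.requests += 1
        self.bytes_read += length
        return self._blob[offset:offset + length]


def read_chunk(reader, name):
    """Fetch one named chunk using three small range reads."""
    size = reader.size()
    # 1) Read the fixed-size trailer to learn how long the index is.
    (footer_len,) = struct.unpack("<Q", reader.read_range(size - 8, 8))
    # 2) Read the index itself.
    index = json.loads(reader.read_range(size - 8 - footer_len, footer_len))
    # 3) Read only the chunk that was asked for.
    offset, length = index[name]
    return reader.read_range(offset, length)


# Three 1 MB "variables"; the names are illustrative, not real product fields.
blob = write_chunked({
    "height": b"\x00" * 1_000_000,
    "latitude": b"\x01" * 1_000_000,
    "longitude": b"\x02" * 1_000_000,
})
reader = RangeReader(blob)
latitude = read_chunk(reader, "latitude")
print(f"{reader.requests} requests, {reader.bytes_read} of {len(blob)} bytes read")
```

The reader touches roughly a third of the file to get one variable. A file without such an index, or with metadata scattered throughout, forces many more small requests or a full download, which is one common reason access feels slow.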

3 learning outcomes

scottyhq commented 2 months ago

use sliderule to output in parquet demonstrate icesat-2 in geoparquet with lonboard

I'll help out to make sure this notebook from last year works with lonboard on the CryoCloud JupyterHub: https://icesat-2-2023.hackweek.io/tutorials/sliderule/parquet-s3.html

abarciauskas-bgse commented 2 months ago

As a point of comparison, my colleague Sean went through the process of creating parquet without sliderule and it is much more complicated: https://github.com/developmentseed/icesat-parquet/blob/main/atl08_earthaccess.ipynb. It may be worth making that point so participants are motivated to use sliderule.