pangeo-data / ml-workflow-examples

Simple examples of data pipelines from xarray to ML training
Apache License 2.0

Create AWS-S3-Dask-XGBoost.ipynb #11

Closed: zhonghua-zheng closed this 4 years ago

zhonghua-zheng commented 4 years ago

Dear all,

Inspired by the availability of the CESM Large Ensemble on AWS S3, I developed this workflow to show how to use dask-xgboost with data stored on S3.

The example shows how to predict "TREFHTMX" (maximum reference height temperature over the output period) using the following features:
"PRECT": Total (convective and large-scale) precipitation rate (liq + ice)
"WSPDSRFAV": Horizontal total wind speed average at the surface
"TS": Surface temperature (radiative)
"TREFHT": Reference height temperature
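To make the task concrete, here is a minimal sketch of the reshaping step that sits between the gridded fields and model training: each (time, lat, lon) cell becomes one row with four feature columns and one target value. It is an illustration under assumptions, not the notebook's code. It uses plain Python lists filled with random numbers so that it runs without S3 access or third-party libraries, and the helper names `make_synthetic_fields` and `to_table` are hypothetical.

```python
import random

# Variable names taken from the description above.
FEATURES = ["PRECT", "WSPDSRFAV", "TS", "TREFHT"]  # predictors
TARGET = "TREFHTMX"                                # variable to predict


def make_synthetic_fields(n_time, n_lat, n_lon, seed=0):
    """Return {variable: field[time][lat][lon]} of random numbers.

    Hypothetical stand-in for the CESM fields that would be read from S3.
    """
    rng = random.Random(seed)
    return {
        name: [
            [[rng.random() for _ in range(n_lon)] for _ in range(n_lat)]
            for _ in range(n_time)
        ]
        for name in FEATURES + [TARGET]
    }


def to_table(fields):
    """Flatten gridded fields into (X, y), one row per (time, lat, lon) cell."""
    X, y = [], []
    for t, plane in enumerate(fields[TARGET]):
        for i, row in enumerate(plane):
            for j, target_value in enumerate(row):
                # Collect the four predictor values at the same grid cell.
                X.append([fields[name][t][i][j] for name in FEATURES])
                y.append(target_value)
    return X, y


fields = make_synthetic_fields(n_time=3, n_lat=2, n_lon=4)
X, y = to_table(fields)
print(len(X), len(X[0]), len(y))  # 24 rows, 4 feature columns, 24 targets
```

A table of this shape is what a gradient-boosted regressor would be trained on. In a real workflow the same flattening would presumably be done lazily on chunked arrays rather than in Python loops.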
Please feel free to let me know if you have any questions or suggestions!

Best, Zhonghua

djgagne commented 4 years ago

It looks really promising as a workflow example. Would it be possible for you to add a bit more text description of each step and explain the variables and task more in the notebook?

zhonghua-zheng commented 4 years ago

@djgagne Hi David, thank you very much for your suggestions! I have just updated the notebook with a more detailed description of each step.

rabernat commented 4 years ago

Thanks so much for your contribution here! I'm happy to merge it.

Going forward, we should think about how to make these examples runnable via Binder.

zhonghua-zheng commented 4 years ago

@rabernat Hi Ryan, thank you very much for your help! Yes, we should make this example runnable via Binder.