sungdukyu / LEAP_REU_Dataset_Notebook


🚨Avoid high memory crashing kernels🚨 #13

Open jbusecke opened 1 year ago

jbusecke commented 1 year ago

I have heard from several folks that notebook kernels are crashing because they run out of memory when applying e.g. preprocessing steps.

I think I have found the (or at least one) reason this happens:

TLDR: If you read in a dataset that is larger than memory, add chunks={} to your call to xr.open_dataset. Doing so loads the data as dask arrays, which enables chunked (and optionally distributed) calculations on larger-than-memory arrays!

If you do not do this, the array is loaded with xarray's internal lazy loading, which will attempt to load ALL the data into memory as soon as you compute anything! I have raised an issue to get this behavior better documented.
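To make the difference concrete, here is a minimal sketch. It assumes dask and a netCDF backend are installed; the file name example.nc, the variable name temperature, and the array sizes are made up for illustration and are not taken from the LEAP dataset.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Write a small synthetic file so the sketch is self-contained.
# (Hypothetical file and variable names, not the real dataset.)
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.nc")
xr.Dataset(
    {"temperature": (("time", "x"), np.random.rand(100, 50))}
).to_netcdf(path)

# Without `chunks`: xarray's own lazy loading. The variable is not
# dask-backed, so a computation pulls the whole variable into memory.
ds_plain = xr.open_dataset(path)
plain_is_dask = ds_plain["temperature"].chunks is not None

# With chunks={}: variables are wrapped in dask arrays, using the
# backend's preferred chunking.
ds_dask = xr.open_dataset(path, chunks={})
dask_is_dask = ds_dask["temperature"].chunks is not None

# This only builds a task graph; nothing is read yet.
mean_lazy = ds_dask["temperature"].mean("time")

# Data is read chunk by chunk only when you explicitly compute.
result = mean_lazy.compute()

print(plain_is_dask, dask_is_dask, result.shape)

ds_plain.close()
ds_dask.close()
```

If the chunks chosen by chunks={} turn out to be too large or too small for a given workflow, passing explicit sizes (e.g. chunks={"time": 10}) is another option.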

cc @YuHuang3019 @sungdukyu

YuHuang3019 commented 1 year ago

Thank you @jbusecke! And the xarray APIs apply_ufunc and map_blocks can then be used to run functions in parallel over the chunks, right?

jbusecke commented 1 year ago

In general, absolutely! But choosing the right approach here depends a lot on the specific analysis. Could you open new issues with examples for each case? Then we can figure out how to optimize this.
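For illustration only, a minimal sketch of both APIs on a small, synthetic dask-backed array. The functions double and demean_x and the array itself are hypothetical placeholders, not part of any actual preprocessing in this repo, and the right choice will still depend on the analysis.

```python
import numpy as np
import xarray as xr

# Small synthetic array, chunked along "time" (requires dask).
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("time", "x"),
    name="temperature",
).chunk({"time": 2})


def double(arr):
    # Plain NumPy function: receives the raw array of one chunk.
    return arr * 2.0


# apply_ufunc wraps a NumPy-level function; dask="parallelized"
# applies it to each chunk lazily.
doubled = xr.apply_ufunc(
    double, da, dask="parallelized", output_dtypes=[da.dtype]
)


def demean_x(block):
    # xarray-level function: receives one chunk as a DataArray.
    # Safe per block here because chunks only split "time", not "x".
    return block - block.mean("x")


# map_blocks applies an xarray-aware function to each chunk;
# `template` tells xarray what the output looks like.
anomalies = xr.map_blocks(demean_x, da, template=da)

# Both results are still lazy until computed.
doubled_values = doubled.compute().values
anomaly_means = anomalies.compute().mean("x").values
```

Roughly, apply_ufunc fits functions that operate on bare arrays, while map_blocks fits functions that need labeled dimensions or coordinates inside each chunk.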

YuHuang3019 commented 1 year ago

Thank you Julius, I do not have a specific case to work with right now, but I will definitely follow up if I use these APIs later!

Best regards, Yu Huang, Ph.D. candidate, Columbia University, Earth & Environmental Engineering
