jbusecke opened this issue 1 year ago
Thank you @jbusecke! And the xarray APIs apply_ufunc and map_blocks can then be used to run functions in parallel over the chunks, right?
In general, absolutely! But choosing the right approach here depends a lot on the specific analysis. Could you open new issues with examples for each case? Then we can figure out how to optimize this.
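For reference, here is a minimal sketch of the apply_ufunc route, assuming a dask-backed dataset; the file path, variable name, and function are hypothetical, for illustration only:

```python
import xarray as xr

# Open with chunks={} so variables are backed by dask arrays
# (hypothetical file name).
ds = xr.open_dataset("example.nc", chunks={})

def scale(arr):
    # Any NumPy-compatible, elementwise function works here.
    return arr * 2.0

# dask="parallelized" applies the function chunk-by-chunk in parallel
# instead of loading the full array into memory.
result = xr.apply_ufunc(
    scale,
    ds["some_variable"],  # hypothetical variable name
    dask="parallelized",
    output_dtypes=[ds["some_variable"].dtype],
)

result.compute()  # triggers the parallel computation
```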
Thank you, Julius. I do not have a specific case to work with right now, but I will definitely follow up here if I use these APIs later!
I have heard from several folks that their notebook kernels are crashing with out-of-memory errors when they try to apply e.g. preprocessing steps.
I think I have found the (or at least one) reason this happens:
TLDR: If you read in a dataset larger than memory, you want to add chunks={} to your call to xr.open_dataset. Doing so loads the data into a dask array and enables distributed calculations on larger-than-memory arrays!
If you do not do this, the array is loaded with xarray's internal lazy loading, which will nevertheless attempt to load ALL the data into memory as soon as you compute anything! I have raised an issue to get this behavior documented more clearly.
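To illustrate the difference, a short sketch (file path and variable name are hypothetical):

```python
import xarray as xr

# Without chunks: variables use xarray's internal lazy loading, but any
# computation will pull the FULL array into memory.
ds_eager = xr.open_dataset("large_dataset.nc")

# With chunks={}: each variable is backed by a dask array (chunked
# according to the on-disk layout), so computations run chunk-by-chunk
# and never need to hold the whole array in memory at once.
ds_lazy = xr.open_dataset("large_dataset.nc", chunks={})

# e.g. a mean over time now streams through the data:
ds_lazy["some_variable"].mean("time").compute()  # hypothetical variable
```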
cc @YuHuang3019 @sungdukyu