jbusecke / xMIP

Analysis ready CMIP6 data in python the easy way with pangeo tools.
https://cmip6-preprocessing.readthedocs.io/en/latest/?badge=latest
Apache License 2.0

Memory Error Trying to Produce Dataframe #338

Closed. AbbySh closed this issue 6 months ago.

AbbySh commented 7 months ago

I got the following error:

MemoryError: Unable to allocate 476. PiB for an array with shape (180, 360, 492, 180, 360, 180, 360) and data type float32
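The size in the message follows directly from the reported shape. Multiplying the seven dimension lengths by 4 bytes per float32 value gives roughly 476 PiB, whereas a single variable on the intended (time, lat, lon) grid of shape (492, 180, 360) needs only about 122 MiB. A quick check of that arithmetic in plain Python (a sketch that does not touch the actual data):

```python
import math

# Shape and dtype reported in the MemoryError
shape = (180, 360, 492, 180, 360, 180, 360)
itemsize = 4  # bytes per float32 value

n_bytes = math.prod(shape) * itemsize
pib = n_bytes / 2**50  # 1 PiB = 2**50 bytes

# Size of one variable on the intended (time, lat, lon) grid
expected_bytes = math.prod((492, 180, 360)) * itemsize
expected_mib = expected_bytes / 2**20  # 1 MiB = 2**20 bytes

print(f"broadcast array: {pib:.0f} PiB")
print(f"single variable: {expected_mib:.0f} MiB")
```

The seven-dimensional shape contains three separate (180, 360) pairs, which suggests the same spatial grid appears under three different pairs of dimension names.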

Code that should reproduce the error on the LEAP JupyterHub:

import xarray as xr
import gcsfs
import numpy as np
fs = gcsfs.GCSFileSystem()

#paths
member_path = 'gs://leap-persistent/abbysh/pco2_testrun_canesm5/test_cmip6_testbed/CanESM5/member_r1i1p2f1/CanESM5.r1i1p2f1.Omon.gn.zarr'
xco2_path = 'gs://leap-persistent/abbysh/zarr_files_/xco2_cmip6_183501-224912.zarr'
chl_clim_path = 'gs://leap-persistent/abbysh/pco2_testrun_canesm5/test_cmip6_testbed/CanESM5/member_r1i1p2f1/chlclim_CanESM5.r1i1p2f1.Omon.gn.v20190429.zarr'
socat_path = 'gs://leap-persistent/abbysh/zarr_files_/socat_mask_02062024.zarr'

file_engine = 'zarr'

#open files
member_data = xr.open_mfdataset(member_path, engine=file_engine)
socat_mask_data = xr.open_mfdataset(socat_path, engine=file_engine)
tmp = xr.open_mfdataset(chl_clim_path, engine=file_engine).chl_clim

inputs = {}

#get variable data
time = member_data.time
inputs['socat_mask'] = socat_mask_data.mask
inputs['sss'] = member_data.sos
inputs['sst'] = member_data.tos
inputs['chl'] = member_data.chl
inputs['mld'] = member_data.mlotst
inputs['pCO2_DIC'] = member_data.pco2_nonT  # non-temperature component of pCO2 (what we will reconstruct)
inputs['pCO2'] = member_data.spco2  # actual pCO2 (we reconstruct the difference pCO2 - pCO2T)
inputs['xco2'] = xr.open_mfdataset(xco2_path, engine=file_engine).xco2

# Create Chl Clim 1982-1997 and then 1998-2017 time varying CHL:
tmp2 = member_data.chl

chl_sat = np.empty(shape=(492,180,360))

for yr in range(1982,1998):
    chl_sat[(yr-1982)*12:(yr-1981)*12,:,:]=tmp

chl_sat[192:492,:,:]=tmp2[192:492,:,:]

chl2 = xr.Dataset({'chl_sat':(["time","ylat","xlon"],chl_sat.data)},
                coords={'time': (['time'],tmp2.time.data),
                'ylat': (['ylat'],tmp2.lat.data[:,0]),
                'xlon':(['xlon'],tmp2.lon.data[0,:])})

inputs['chl_sat'] = chl2.chl_sat

time_len = len(time)
for i in inputs:
    if i != 'xco2':
        # assign_coords returns a new object, so the result must be stored
        inputs[i] = inputs[i].assign_coords(time=time[0:time_len])

DS = xr.merge([inputs['sss'], inputs['sst'], inputs['mld'], inputs['chl'], inputs['pCO2_DIC'], inputs['pCO2'], inputs['socat_mask'],
               inputs['chl_sat']], compat='override', join='override')

df = DS.to_dataframe()
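
Because the rows of to_dataframe() come from the Cartesian product of every dimension in the merged Dataset, one way to catch this before converting is to compare the dimension names of the inputs. The helper below is a hypothetical sketch (merged_dims is not part of xarray or xMIP); it works on plain tuples, and the dimension names in the example are assumptions chosen for illustration.

```python
import math


def merged_dims(dims_by_var, sizes):
    """Return the union of dimension names (in first-seen order) across all
    variables, and the number of DataFrame rows a full Cartesian product over
    those dimensions would produce."""
    union = []
    for dims in dims_by_var.values():
        for d in dims:
            if d not in union:
                union.append(d)
    n_rows = math.prod(sizes[d] for d in union)
    return tuple(union), n_rows


# Hypothetical dimension names: the same grid under two naming conventions
dims_by_var = {
    "sst": ("time", "y", "x"),
    "chl_sat": ("time", "ylat", "xlon"),
}
sizes = {"time": 492, "y": 180, "x": 360, "ylat": 180, "xlon": 360}

dims, n_rows = merged_dims(dims_by_var, sizes)
if len(dims) > 3:
    print(f"warning: {len(dims)} dimensions {dims} -> {n_rows:,} rows")
```

With the real inputs, dims_by_var could be built as {k: inputs[k].dims for k in names} and sizes as {d: n for k in names for d, n in inputs[k].sizes.items()}, where names is a hypothetical list of the variables passed to xr.merge. Dimensions that only appear on coordinates would also enter the product, so this check is a lower bound.
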
jbusecke commented 6 months ago

I do not see how this is directly related to xMIP. From the error message, I would guess that this is a broadcasting issue. Do I remember correctly that this is related to https://github.com/leap-stc/CMIP6-pCO2-testbed? If so, could you move the issue to that repo to keep concerns separated? Thanks. Closing this for now.
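A minimal, self-contained illustration of the suspected broadcasting, assuming (as the 2D lat/lon handling in the code suggests but does not confirm) that the model variables and chl_sat use different names for the same spatial dimensions. xr.merge with join='override' only overrides indexes along dimensions that share a name; it does not rename dimensions, so the merged Dataset keeps both pairs and to_dataframe() broadcasts every variable over all of them. Renaming to a common set of dimension names before merging removes the blow-up. The array sizes are small stand-ins, not the real grid.

```python
import numpy as np
import xarray as xr

# Same grid, two naming conventions (names are illustrative assumptions)
nt, ny, nx = 4, 3, 5
a = xr.DataArray(np.zeros((nt, ny, nx), dtype="float32"),
                 dims=("time", "y", "x"), name="sst")
b = xr.DataArray(np.ones((nt, ny, nx), dtype="float32"),
                 dims=("time", "ylat", "xlon"), name="chl_sat")

# Mismatched names: the merged Dataset has five dimensions, and
# to_dataframe() builds a row for every combination of them.
bad = xr.merge([a, b], compat="override", join="override")
bad_rows = len(bad.to_dataframe())  # 4 * 3 * 5 * 3 * 5 rows

# Fix: rename b's dimensions so both variables share ("time", "y", "x").
good = xr.merge([a, b.rename({"ylat": "y", "xlon": "x"})],
                compat="override", join="override")
good_rows = len(good.to_dataframe())  # 4 * 3 * 5 rows

print(bad_rows, good_rows)
```

In the original code, the equivalent step would be renaming the dimensions of chl2 (and of socat_mask_data, if they differ) to whatever names member_data actually uses before calling xr.merge; those names would have to be checked against the data.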