Open rabernat opened 1 month ago
@rabernat Thanks for sharing this issue.
The IndexError seems related to variables with inconsistent dimensions. Some variables (e.g., sweep_mode, sweep_number) are scalars, while others (e.g., DBZH, ZDR) are multi-dimensional, which could be causing the issue with Dask chunking.
To focus on the multi-dimensional variables, you can try:
import xradar
import xarray as xr
import pooch
# download and open a NEXRAD2 file from S3
url = "https://noaa-nexrad-level2.s3.amazonaws.com/2024/09/01/FOP1/FOP120240901_000347_V06"
local_file = pooch.retrieve(url, known_hash=None)
ds = xr.open_dataset(local_file, group="sweep_0", engine="nexradlevel2")
# create a chunked version
dsc = ds.chunk()
# load only the variables that have more than one dimension
for var in dsc.data_vars:
    if len(dsc[var].dims) > 1:
        print(var)
        display(dsc[var].load())
@syedhamidali - I'm not sure I understand your response.
Loading this dataset works fine without Dask. When Dask comes into the picture, we get an error. This seems like a bug in xradar. The workaround you proposed does not address the root cause.
Thanks for the detailed report @rabernat. I've reopened #180 as it wasn't fully resolved.
A deeper look will take some time. We will definitely look into this after ERAD 2024, where most of the xradar devs currently are.
Side note: @rabernat You might be interested in the short course we gave last Sunday, where we acknowledged the great work of Pangeo and Project Pythia.
Thanks also to @syedhamidali for taking care here.
@kmuehlbauer I wanted to mention that I ran the same code with other file types (CfRadial, Iris...), and they all hit the same issue with Dask chunking.
Description
I have found a puzzling bug that only comes up in certain situations with Dask.
What I Did
Possibly related to #180.
Experience tells me this has something to do with Dask task tokenization.
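For readers unfamiliar with tokenization, here is a minimal sketch of what it means, not a diagnosis of this bug. Dask derives a deterministic token from an object's contents and uses it to build task names, so two inputs that tokenize identically can end up treated as the same task. The tuples below are hypothetical stand-ins for lazily loaded variables; they are not xradar's actual backend objects, and nothing here confirms that tokenization is the root cause.

```python
from dask.base import tokenize

# Hypothetical descriptors for two lazily loaded variables.
# These are illustrative stand-ins, not xradar's real array wrappers.
dbzh = ("FOP120240901_000347_V06", "sweep_0", "DBZH")
zdr = ("FOP120240901_000347_V06", "sweep_0", "ZDR")

# Tokens are deterministic: equal content gives an equal token.
token_dbzh = tokenize(dbzh)
token_dbzh_again = tokenize(("FOP120240901_000347_V06", "sweep_0", "DBZH"))

# Different content gives a different token.
token_zdr = tokenize(zdr)

same_input_same_token = token_dbzh == token_dbzh_again
different_input_different_token = token_dbzh != token_zdr

print("same input, same token:", same_input_same_token)
print("different input, different token:", different_input_different_token)
```

If a backend's lazy array objects were to tokenize identically for different variables, Dask could conflate their tasks; whether that happens here is an assumption that would need checking against the xradar backend.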