pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

PermissionError: [Errno 13] Permission denied #6805

Closed lassiterdc closed 2 years ago

lassiterdc commented 2 years ago

What is your issue?

This was raised about a year ago (https://github.com/pydata/xarray/issues/5488) but still seems to be unresolved, so I'm hoping this will bring attention back to it.

Data: Dropbox sharing link

Description: This folder contains 2 files, each containing 1 day's worth of 1 km x 1 km gridded precipitation rate data from the National Severe Storms Laboratory. Each is about a gigabyte (sorry they're so big, but it's what I'm working with!).

Code:

import xarray as xr

f_in_ncs = "data/"
f_in_nc = "data/20190520.nc"

#%% works
ds = xr.open_dataset(f_in_nc, 
                    chunks={'outlat':3500, 'outlon':7000, 'time':50})
#%% doesn't work
mf_ds = xr.open_mfdataset(f_in_ncs,  concat_dim = "time",
            chunks={'outlat':3500, 'outlon':7000, 'time':50},
            combine = "nested", engine = 'netcdf4')

Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File c:\Users\Daniel\anaconda3\envs\mrms\lib\site-packages\xarray\backends\file_manager.py:199, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    198 try:
--> 199     file = self._cache[self._key]
    200 except KeyError:

File c:\Users\Daniel\anaconda3\envs\mrms\lib\site-packages\xarray\backends\lru_cache.py:53, in LRUCache.__getitem__(self, key)
     52 with self._lock:
---> 53     value = self._cache[key]
     54     self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('d:\\mrms_processing\\_reprex\\2022-7-18_open_mfdataset\\data',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

PermissionError                           Traceback (most recent call last)
Input In [4], in <cell line: 5>()
      1 import xarray as xr
      3 f_in_ncs = "data/"
----> 5 ds = xr.open_mfdataset(f_in_ncs,  concat_dim = "time",
      6             chunks={'outlat':3500, 'outlon':7000, 'time':50},
      7             combine = "nested", engine = 'netcdf4')

File c:\Users\Daniel\anaconda3\envs\mrms\lib\site-packages\xarray\backends\api.py:908, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
...
File src\netCDF4\_netCDF4.pyx:2307, in netCDF4._netCDF4.Dataset.__init__()

File src\netCDF4\_netCDF4.pyx:1925, in netCDF4._netCDF4._ensure_nc_success()

PermissionError: [Errno 13] Permission denied: b'd:\\mrms_processing\\_reprex\\2022-7-18_open_mfdataset\\data'

dcherian commented 2 years ago

Can you try opening it directly with netCDF4.Dataset, please? If that does not work, then please raise an issue with that package.
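
A direct check along those lines might look like the sketch below. The helper name list_variables is hypothetical, and the directory guard is an added assumption (it is not part of netCDF4) so that a folder path fails with a clearer message than the PermissionError shown above. netCDF4 is imported inside the function only so that the guard runs first.

```python
import os


def list_variables(path):
    """Open one netCDF file directly with netCDF4 and return its variable names.

    Hypothetical helper for illustration; not part of xarray or netCDF4.
    """
    # Guard added for this sketch: a directory is not a netCDF file.
    if os.path.isdir(path):
        raise IsADirectoryError(f"{path!r} is a directory, not a netCDF file")

    # Third-party dependency, imported here so the guard above runs first.
    import netCDF4

    # Dataset works as a context manager and closes the file on exit.
    with netCDF4.Dataset(path, mode="r") as nc:
        return list(nc.variables)


# Usage with the single file from the report above:
# print(list_variables("data/20190520.nc"))
```

If this succeeds for a single file but fails for the folder path, the paths handed to the opener are the more likely culprit than the netCDF4 package.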

lassiterdc commented 2 years ago

I just edited the original post to show that xr.open_dataset works. I hope that is adequate to show that the issue is with xarray and not netCDF4.Dataset.

andersy005 commented 2 years ago

@lassiterdc, in your example above, f_in_ncs is just a string naming the folder, and you are passing this string to xr.open_mfdataset(), which I don't think knows what to do with it.

f_in_ncs = "data/"

Have you tried retrieving the list of all files under "data/" via the glob module?

import glob

# Collect every .nc file in the folder, sorted so the time order is stable
f_in_ncs = sorted(glob.glob("data/*.nc"))

mf_ds = xr.open_mfdataset(f_in_ncs, concat_dim="time",
            chunks={'outlat': 3500, 'outlon': 7000, 'time': 50},
            combine="nested", engine='netcdf4')

dcherian commented 2 years ago

Ah, you should also be able to pass "data/*" directly to open_mfdataset.
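
To see why the wildcard matters, here is a small standard-library sketch of how a bare folder path and a wildcard pattern expand under glob. It assumes open_mfdataset expands a string argument in a similar glob-like way, which matches the traceback above (the folder itself was handed to netCDF4) but is not verified here against the xarray source. The helper name expand and the second file name are made up for the demo.

```python
import glob
import os
import tempfile


def expand(pattern):
    """Return the sorted list of paths that a glob pattern matches."""
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway "data" folder with two empty placeholder files.
    data_dir = os.path.join(tmp, "data")
    os.mkdir(data_dir)
    for name in ("20190520.nc", "20190521.nc"):  # second name is hypothetical
        open(os.path.join(data_dir, name), "wb").close()

    # No wildcard: the only match is the folder itself.
    bare = expand(data_dir)

    # With a wildcard: the match is the list of files inside the folder.
    matched = expand(os.path.join(data_dir, "*.nc"))

    print(bare == [data_dir])
    print([os.path.basename(p) for p in matched])
```

Under that assumption, the bare "data/" string resolves to a single path, the folder, and trying to open a folder as a file is what surfaces as the PermissionError on Windows.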

lassiterdc commented 2 years ago

I can't believe I forgot the asterisk!!! Thank you for catching that.