ocean-transport / argo-intern

Andrew's project

ISSUE: argopy data fetching issues #7

Closed andrewfagerheim closed 2 years ago

andrewfagerheim commented 2 years ago

I'm trying to load Argo profiles by longitude and latitude using argopy, but I'm running into errors such as ClientResponseError: 500 (internal server error) when running the following code:

from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher()
ds = argo_loader.region([30,40,-55,-50,0,2000]).to_xarray()

I've found I can work around this problem by starting with much smaller boundaries (say [30,35,-55,-50,0,200]) and incrementally increasing them until I have the dataset I'm looking for, but this is very inefficient and time-consuming. Is there a better way to load data using argopy?
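
A minimal sketch of automating that incremental approach: split the full box into smaller longitude slices, fetch each slice separately, and stitch the pieces back together along argopy's N_POINTS dimension. The helper name and chunk count here are illustrative, not part of argopy's API:

import numpy as np
import xarray as xr
from argopy import DataFetcher as ArgoDataFetcher

def fetch_region_in_chunks(box, n_chunks=4):
    # box = [lon_min, lon_max, lat_min, lat_max, pres_min, pres_max]
    lon_min, lon_max, lat_min, lat_max, p_min, p_max = box
    lon_edges = np.linspace(lon_min, lon_max, n_chunks + 1)
    pieces = []
    for lo, hi in zip(lon_edges[:-1], lon_edges[1:]):
        sub_box = [float(lo), float(hi), lat_min, lat_max, p_min, p_max]
        # Smaller requests are less likely to trip the server-side 500 errors
        pieces.append(ArgoDataFetcher().region(sub_box).to_xarray())
    ds = xr.concat(pieces, dim="N_POINTS")
    # Re-index N_POINTS to avoid duplicate values, as argopy does internally
    ds["N_POINTS"] = np.arange(len(ds["N_POINTS"]))
    return ds

ds = fetch_region_in_chunks([30, 40, -55, -50, 0, 2000])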

dhruvbalwada commented 2 years ago

It would be a good idea to open this issue on argopy: https://github.com/euroargodev/argopy/issues (before opening an issue, look through the issues that are already there, both open and closed, to see if anything resembles your problem). @andrewfagerheim

dhruvbalwada commented 2 years ago

Also look through this: https://github.com/euroargodev/argopy/issues/227

dhruvbalwada commented 2 years ago

Also, I think it might be a good idea to just download all the data. Look through how to set up your local FTP server for Argo data here: https://argopy.readthedocs.io/en/latest/data_sources.html. We can chat about how to do this on gyre or abyssal next week, if it seems too complicated to do on your own.
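
A rough sketch of what pointing argopy at a local copy of the Argo data can look like; the option names (src='gdac' with an ftp path) are an assumption based on recent argopy releases, and the path below is a placeholder:

from argopy import DataFetcher as ArgoDataFetcher

# Placeholder path: wherever the local mirror of the Argo GDAC lives
local_gdac = "/path/to/local/argo-gdac"

# Assumed option names for recent argopy versions: src="gdac" with a local ftp path
argo_loader = ArgoDataFetcher(src="gdac", ftp=local_gdac)
ds = argo_loader.region([30, 40, -55, -50, 0, 2000]).to_xarray()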

andrewfagerheim commented 2 years ago

As recommended in https://github.com/euroargodev/argopy/issues/227, I tried to run the following code, which now includes a parallel loader:

from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher(parallel=True)
ds = argo_loader.region([30,40,-55,-50,0,2000]).to_xarray()

This only gives a different error; it returns ValueError: Errors happened with all URLs, this could be due to an internal impossibility to read returned content.

dhruvbalwada commented 2 years ago

I downloaded all the Argo data and put it here: /swot/SUM05/dbalwada/202206-ArgoData. I think the local FTP way of accessing data should allow you to use argopy with this copy. Try it and let's see if that works out.
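
If the same option names apply, pointing argopy at that copy could look roughly like this (again assuming the gdac source and ftp option of recent argopy releases; set_options is one way to apply them globally):

import argopy
from argopy import DataFetcher as ArgoDataFetcher

# Assumed: argopy options for the gdac source, pointed at the copy on gyre
argopy.set_options(src="gdac", ftp="/swot/SUM05/dbalwada/202206-ArgoData")

ds = ArgoDataFetcher().region([30, 40, -55, -50, 0, 2000]).to_xarray()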

jbusecke commented 2 years ago

It might have been a temporary issue, since the 500 code indicates an internal server error, but that can be caused by a variety of things.

andrewfagerheim commented 2 years ago

I was able to successfully save the Southern Ocean box [30,40,-55,-50,0,2000] using the data @dhruvbalwada uploaded to gyre. However, when I tried two other regions ([-155,-145,35,40,0,2000] and [-35,-25,40,45,0,2000]), it returned the following error. Thoughts on how to work around this?

KeyError                                  Traceback (most recent call last)
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:1394, in Dataset._construct_dataarray(self, name)
   1393 try:
-> 1394     variable = self._variables[name]
   1395 except KeyError:

KeyError: 'PROFILE_PSAL_QC'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:514, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    513 try:
--> 514     vars = ensure_common_dims([ds[k].variable for ds in datasets])
    515 except KeyError:

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:514, in <listcomp>(.0)
    513 try:
--> 514     vars = ensure_common_dims([ds[k].variable for ds in datasets])
    515 except KeyError:

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:1498, in Dataset.__getitem__(self, key)
   1497 if hashable(key):
-> 1498     return self._construct_dataarray(key)
   1499 else:

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:1396, in Dataset._construct_dataarray(self, name)
   1395 except KeyError:
-> 1396     _, name, variable = _get_virtual_variable(
   1397         self._variables, name, self._level_coords, self.dims
   1398     )
   1400 needed_dims = set(variable.dims)

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:169, in _get_virtual_variable(variables, key, level_vars, dim_sizes)
    168 else:
--> 169     ref_var = variables[ref_name]
    171 if var_name is None:

KeyError: 'PROFILE_PSAL_QC'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 ds = argo_loader.region(box).to_xarray()

File ~/.conda/envs/argo/lib/python3.10/site-packages/argopy/fetchers.py:426, in ArgoDataFetcher.to_xarray(self, **kwargs)
    421 if not self.fetcher:
    422     raise InvalidFetcher(
    423         " Initialize an access point (%s) first."
    424         % ",".join(self.Fetchers.keys())
    425     )
--> 426 xds = self.fetcher.to_xarray(**kwargs)
    427 xds = self.postproccessor(xds)
    429 # data_path = self.fetcher.cname() + self._mode + ".zarr"
    430 # log.debug(data_path)
    431 # if self.cache and self.fs.exists(data_path):
   (...)
    435 #     xds = self.postproccessor(xds)
    436 #     xds = self._write(data_path, xds)._read(data_path)

File ~/.conda/envs/argo/lib/python3.10/site-packages/argopy/data_fetchers/gdacftp_data.py:338, in FTPArgoDataFetcher.to_xarray(self, errors)
    335     raise DataNotFound("No data found for: %s" % self.indexfs.cname)
    337 # Download data:
--> 338 ds = self.fs.open_mfdataset(
    339     self.uri,
    340     method=self.method,
    341     concat_dim="N_POINTS",
    342     concat=True,
    343     preprocess=self._preprocess_multiprof,
    344     progress=self.progress,
    345     errors=errors,
    346     decode_cf=1,
    347     use_cftime=0,
    348     mask_and_scale=1,
    349 )
    351 # Data post-processing:
    352 ds["N_POINTS"] = np.arange(
    353     0, len(ds["N_POINTS"])
    354 )  # Re-index to avoid duplicate values

File ~/.conda/envs/argo/lib/python3.10/site-packages/argopy/stores/filesystems.py:376, in filestore.open_mfdataset(self, urls, concat_dim, max_workers, method, progress, concat, preprocess, errors, *args, **kwargs)
    373 if len(results) > 0:
    374     if concat:
    375         # ds = xr.concat(results, dim=concat_dim, data_vars='all', coords='all', compat='override')
--> 376         ds = xr.concat(results, dim=concat_dim, data_vars='minimal', coords='minimal', compat='override')
    377         return ds
    378     else:

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:238, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    233 else:
    234     raise TypeError(
    235         "can only concatenate xarray Dataset and DataArray "
    236         f"objects, got {type(first_obj)}"
    237     )
--> 238 return f(
    239     objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs
    240 )

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:516, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    514     vars = ensure_common_dims([ds[k].variable for ds in datasets])
    515 except KeyError:
--> 516     raise ValueError(f"{k!r} is not present in all datasets.")
    517 combined = concat_vars(vars, dim, positions, combine_attrs=combine_attrs)
    518 assert isinstance(combined, Variable)

ValueError: 'PROFILE_PSAL_QC' is not present in all datasets.
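
One possible workaround, not tried in this thread: fetch the failing box in small longitude slices, keep only the variables common to the slices that load, and record which slices still trip the missing PROFILE_PSAL_QC variable. A sketch, assuming the fetcher is configured as in the earlier snippets:

import xarray as xr
from argopy import DataFetcher as ArgoDataFetcher

box = [-155, -145, 35, 40, 0, 2000]  # one of the failing regions
lon_min, lon_max, lat_min, lat_max, p_min, p_max = box

pieces, failed = [], []
for lo in range(lon_min, lon_max, 2):  # 2-degree longitude slices
    sub_box = [lo, lo + 2, lat_min, lat_max, p_min, p_max]
    try:
        pieces.append(ArgoDataFetcher().region(sub_box).to_xarray())
    except ValueError as err:  # e.g. 'PROFILE_PSAL_QC' is not present in all datasets
        failed.append((sub_box, str(err)))

# Keep only variables present in every slice, then stitch along N_POINTS
common = set.intersection(*[set(p.data_vars) for p in pieces])
ds = xr.concat([p[sorted(common)] for p in pieces], dim="N_POINTS")
print("Slices that still failed:", failed)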

andrewfagerheim commented 2 years ago

For what it's worth, I tried a box in the Indian Ocean, [90,100,-15,-10,0,2000]. This also worked fine and I was able to download the NetCDF file.

dhruvbalwada commented 2 years ago

That is great. I think we can close this issue in that case. Our solution was to download the Argo data and use argopy's local GDAC capability to access it. This took away the errors that crop up due to remote server limits, while preserving all the benefits of argopy.