oceanmodeling / searvey

Sea state observational data retrieval
https://searvey.readthedocs.io/en/stable/
GNU General Public License v3.0

COOPS station issue with `"ofs"` type #159

Open SorooshMani-NOAA opened 1 month ago

SorooshMani-NOAA commented 1 month ago

After a recent server downtime on COOPS, I noticed that getting the list of stations now fails due to an issue with the `ofs` station type:

File ~/workarea/sandbox/searvey/searvey/coops.py:963, in <listcomp>(.0)
    955 @lru_cache(maxsize=1)
    956 def _get_coops_stations() -> geopandas.GeoDataFrame:
    957     """
    958     Return COOPS station metadata from: COOPS main API
    959
    960     :return: ``geopandas.GeoDataFrame`` with the station metadata
    961     """
--> 963     results = [_get_single_coops_station(station_type=st_ty) for st_ty in StationTypes]
    964 #    results = multifutures.multiprocess(
    965 #        _get_single_coops_station, func_kwargs=[{"station_type": st_ty} for st_ty in StationTypes]
    966 #    )
    968     df_all = pandas.concat(r.result for r in results)

File ~/workarea/sandbox/searvey/searvey/coops.py:949, in _get_single_coops_station(station_type)
    946 url = f"https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations.json?expand=details&type={station_type}"
    948 df_thistype = pandas.read_json(url)
--> 949 df_thistype = pandas.json_normalize(df_thistype["stations"])
    950 df_thistype["station_type"] = station_type
    952 return df_thistype

File ~/miniconda3/envs/stormevents/lib/python3.10/site-packages/pandas/core/frame.py:4102, in DataFrame.__getitem__(self, key)
   4100 if self.columns.nlevels > 1:
   4101     return self._getitem_multilevel(key)
-> 4102 indexer = self.columns.get_loc(key)
   4103 if is_integer(indexer):
   4104     indexer = [indexer]

File ~/miniconda3/envs/stormevents/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'stations'
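
One way to keep a single station type from breaking the whole listing is to check for the `stations` key before normalizing. The following is only a minimal sketch, not searvey's actual code: `extract_stations` is a hypothetical helper, and both payloads are made up for illustration.

```python
from typing import Any


def extract_stations(payload: dict[str, Any], station_type: str) -> list[dict[str, Any]]:
    """Return the station records in ``payload``, tagged with ``station_type``.

    Returns an empty list when the "stations" key is missing or is not a list,
    instead of raising ``KeyError``.
    """
    stations = payload.get("stations")
    if not isinstance(stations, list):
        # Assumed fallback: treat an unexpected payload as "no stations".
        return []
    return [{**record, "station_type": station_type} for record in stations]


# Made-up payloads: one well-formed, one without the "stations" key.
good = {"count": 1, "stations": [{"id": "0000000", "name": "Example station"}]}
bad = {"error": {"message": "unexpected response"}}

good_records = extract_stations(good, "waterlevels")
bad_records = extract_stations(bad, "ofs")
```

The resulting list of records could then be passed to `pandas.json_normalize`, as the existing code does with the `stations` column. Whether an empty result should be silently accepted or logged as a warning is a design choice left open here.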

SorooshMani-NOAA commented 2 weeks ago

@pmav99 @tomsail @saeed-moghimi-noaa do you think we should include "ofs" stations in the searvey station list? OFS stations provide model results, not actual observations, but COOPS does expose them through its API.

See https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations?type=ofs

saeed-moghimi-noaa commented 2 weeks ago

@SorooshMani-NOAA

I am not sure. We have similar stations in STOFS. To my understanding, these are mainly there to support users in places where they do not have any other information.

SorooshMani-NOAA commented 2 weeks ago

Another problem with OFS stations is that they have a different set of metadata; e.g. they don't have the `id`, `name`, or `state` fields that other station types have:

(Pdb) p df[df.station_type == 'ofs'].dropna(axis=1)
           lat       lng station_type stationID                  OFSStationName OFSCode subdomain currents salinity waterlevel waterTemp   wind PORTSCode virtual
0    44.656700 -67.21000          ofs   8411060             Cutler Farris Wharf  GOMOFS         0    False     True       True      True   True         0   False
1    44.287000 -67.30700          ofs     44027           20 NM SE of Jonesport  GOMOFS         0     True     True       True      True   True         0    True
2    43.484000 -67.88300          ofs     44037                    Jordan Basin  GOMOFS         0     True     True       True      True   True         0    True
3    44.110000 -68.11000          ofs     44034             Eastern Maine Shelf  GOMOFS         0     True     True       True      True   True         0    True
4    44.392193 -68.20428          ofs   8413320                      Bar Harbor  GOMOFS         0    False     True       True      True   True         0   False
..         ...       ...          ...       ...                             ...     ...       ...      ...      ...        ...       ...    ...       ...     ...
659  27.818611 -97.20895          ofs   8775283             Enbridge, Ingleside   NGOFS        co     True     True       True      True   True        cc   False
660  27.792390 -96.95653          ofs    cc0101                         AP Buoy   NGOFS        co     True     True       True      True   True        cc   False
661  27.826080 -97.01981          ofs    cc0201                Aransas Pass LB6   NGOFS        co     True     True       True      True   True        cc   False
662  27.839810 -97.05314          ofs    cc0301      Port Aransas, Channel View   NGOFS        co     True     True       True      True   True        cc   False
663  27.839720 -97.07250          ofs    cc0601  UTMSI Fisheries and Marine Lab   NGOFS        co     True    False      False     False  False        cc   False

versus, for example, water level stations:

(Pdb) p df[df.station_type == 'waterlevels'].dropna(axis=1)
     tidal greatlakes shefcode  ...                                   disclaimers.self                                       notices.self  station_type
0     True      False    NWWH1  ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels
1     True      False    OOUH1  ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels
2     True      False    PRHH1  ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels
3     True      False    MOKH1  ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels
4     True      False    KLIH1  ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels
..     ...        ...      ...  ...                                                ...                                                ...           ...
296  False      False           ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels
297   True      False    MGIP4  ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels
298   True      False    MGZP4  ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels
299  False      False    AUDP4  ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels
300   True      False    MISP4  ...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...  https://api.tidesandcurrents.noaa.gov/mdapi/pr...   waterlevels

[301 rows x 41 columns]
(Pdb) p df[df.station_type == 'waterlevels'].dropna(axis=1).columns
Index(['tidal', 'greatlakes', 'shefcode', 'state', 'timezone', 'timezonecorr',
       'observedst', 'stormsurge', 'forecast', 'outlook', 'HTFhistorical',
       'nonNavigational', 'id', 'name', 'lat', 'lng', 'affiliations', 'self',
       'expand', 'tideType', 'details.id', 'details.established',
       'details.removed', 'details.noaachart', 'details.timemeridian',
       'details.timezone', 'details.origyear', 'details.self', 'sensors.self',
       'floodlevels.self', 'datums.self', 'supersededdatums.self',
       'harmonicConstituents.self', 'benchmarks.self', 'tidePredOffsets.self',
       'ofsMapOffsets.self', 'nearby.self', 'products.self',
       'disclaimers.self', 'notices.self', 'station_type'],
      dtype='object')

and we'd be discarding a lot of the OFS metadata if we "normalize" it:

(Pdb) p df[df.station_type == 'ofs']
        nws_id station_type  name state        lon  lon        lat removed  status
nos_id
ofs_0     <NA>          ofs  <NA>  <NA> -67.209999  NaN  44.656700     NaT  active
ofs_1     <NA>          ofs  <NA>  <NA> -67.306999  NaN  44.286999     NaT  active
ofs_2     <NA>          ofs  <NA>  <NA> -67.883003  NaN  43.484001     NaT  active
ofs_3     <NA>          ofs  <NA>  <NA> -68.110001  NaN  44.110001     NaT  active
ofs_4     <NA>          ofs  <NA>  <NA> -68.204277  NaN  44.392193     NaT  active
...        ...          ...   ...   ...        ...  ...        ...     ...     ...
ofs_659   <NA>          ofs  <NA>  <NA> -97.208946  NaN  27.818611     NaT  active
ofs_660   <NA>          ofs  <NA>  <NA> -96.956528  NaN  27.792391     NaT  active
ofs_661   <NA>          ofs  <NA>  <NA> -97.019814  NaN  27.826080     NaT  active
ofs_662   <NA>          ofs  <NA>  <NA> -97.053139  NaN  27.839809     NaT  active
ofs_663   <NA>          ofs  <NA>  <NA> -97.072502  NaN  27.839720     NaT  active
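
A possible way to avoid losing the OFS identifiers during normalization is to rename the OFS-specific fields to the common ones first. This is a sketch under assumptions: `OFS_FIELD_MAP` and `harmonize_ofs_record` are hypothetical names, and treating `stationID` and `OFSStationName` as equivalents of `id` and `name` is a guess based on the output above, not something the API documents here.

```python
# Hypothetical mapping from OFS-specific field names to the common ones.
OFS_FIELD_MAP = {"stationID": "id", "OFSStationName": "name"}


def harmonize_ofs_record(record: dict) -> dict:
    """Rename OFS-specific keys; keys without a mapping are kept unchanged."""
    return {OFS_FIELD_MAP.get(key, key): value for key, value in record.items()}


# Subset of the first OFS row shown in the pdb output above.
ofs_row = {
    "lat": 44.6567,
    "lng": -67.21,
    "stationID": "8411060",
    "OFSStationName": "Cutler Farris Wharf",
    "OFSCode": "GOMOFS",
}

harmonized = harmonize_ofs_record(ofs_row)
```

With this renaming, the `id` and `name` columns would be populated for OFS rows, while fields with no common counterpart (such as `OFSCode`) would still need a decision on whether to keep or drop them.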

pmav99 commented 2 weeks ago

Would it make sense to have them as a separate provider?

SorooshMani-NOAA commented 2 weeks ago

@pmav99 I can see the benefit of having it but keeping it separate (e.g. as `coops_ofs`?!), though I also want to avoid confusion. So, for now I'll make it work along with the other stations, but let's keep this open to discuss your idea next week.
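
To make the "separate provider" idea concrete, here is a minimal sketch of splitting station records by whether they are OFS (model) stations. The function name `split_by_provider` and the `coops` group label are assumptions for illustration; only `coops_ofs` comes from the discussion above, and the sample records are made up.

```python
def split_by_provider(records: list[dict]) -> dict[str, list[dict]]:
    """Split station records into observation stations and OFS (model) stations."""
    groups: dict[str, list[dict]] = {"coops": [], "coops_ofs": []}
    for record in records:
        # Records tagged with station_type "ofs" go to the separate group.
        key = "coops_ofs" if record.get("station_type") == "ofs" else "coops"
        groups[key].append(record)
    return groups


# Made-up records, only to show the split.
records = [
    {"station_type": "waterlevels", "id": "A"},
    {"station_type": "ofs", "stationID": "B"},
    {"station_type": "waterlevels", "id": "C"},
]

groups = split_by_provider(records)
```

Keeping the two groups apart would let each keep its own metadata schema, at the cost of users having to know that model stations live under a different name.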