tsutterley / pyTMD

Python-based tidal prediction software
https://pytmd.readthedocs.io
MIT License

Issues modelling tides with JSON definition files #318

Closed robbibt closed 3 months ago

robbibt commented 3 months ago

Hey @tsutterley, I've been trying to test out the new GOT5.5/5.6 models using the JSON definition files included in the tests. My directory looks like this:

[image: screenshot of the directory listing]

I can run tide modelling for other models (e.g. FES2012) from a .def format definition file pretty easily, e.g.:

from pyTMD.compute import tide_elevations
import pandas as pd
import numpy as np

out = tide_elevations(
    x=[122.14],
    y=[-17.9],
    delta_time=pd.date_range("2020", "2021", periods=2),
    DIRECTORY="/gdata1/data/tide_models/",
    MODEL="FES2012",
    DEFINITION_FILE="/gdata1/data/tide_models/model_FES2012.def",
    DEFINITION_FORMAT="ascii",
    EPSG=4326,
    TIME="datetime",
    EXTRAPOLATE=True,
    CUTOFF=np.inf,
)

However, if I try to do something similar with a JSON format definition, I get a TypeError: intern() argument must be str, not list error:

out = tide_elevations(
    x=[122.14],
    y=[-17.9],
    delta_time=pd.date_range("2020", "2021", periods=2),
    DIRECTORY="/gdata1/data/tide_models/",
    MODEL="GOT5.6",
    DEFINITION_FILE="/gdata1/data/tide_models/model_GOT5.6.json",
    DEFINITION_FORMAT="json",
    EPSG=4326,
    TIME="datetime",
    EXTRAPOLATE=True,
    CUTOFF=np.inf,
)

Is there anything I'm doing obviously wrong here? I wasn't exactly sure what additional params need to be provided when specifying JSON definition files, beyond DEFINITION_FORMAT="json"...

Full error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[23], line 40
     18 import numpy as np
     20 # out = tide_elevations(
     21 #     x=[122.14],
     22 #     y=[-17.9],
   (...)
     36
     37 # out
---> 40 out = tide_elevations(
     41     x=[122.14],
     42     y=[-17.9],
     43     delta_time=pd.date_range("2020", "2021", periods=2),
     44     DIRECTORY="/gdata1/data/tide_models/",
     45     MODEL="GOT5.6",
     46     DEFINITION_FILE="/gdata1/data/tide_models/model_GOT5.6.json",
     47     DEFINITION_FORMAT="json",
     48     EPSG=4326,
     49     TIME="datetime",
     50     EXTRAPOLATE=True,
     51     CUTOFF=np.inf,
     52 )
     54 out
     56 # # from pyTMD.compute_tide_corrections import compute_tide_corrections
     57
     58 # # out = compute_tide_corrections(
   (...)
     68
     69 # # out

File /env/lib/python3.10/site-packages/pyTMD/compute.py:299, in tide_elevations(x, y, delta_time, DIRECTORY, MODEL, ATLAS_FORMAT, GZIP, DEFINITION_FILE, DEFINITION_FORMAT, CROP, BOUNDS, EPSG, EPOCH, TYPE, TIME, METHOD, EXTRAPOLATE, CUTOFF, INFER_MINOR, APPLY_FLEXURE, FILL_VALUE, **kwargs)
    297 # get parameters for tide model
    298 if DEFINITION_FILE is not None:
--> 299     model = pyTMD.io.model(DIRECTORY).from_file(DEFINITION_FILE,
    300         format=DEFINITION_FORMAT)
    301 else:
    302     model = pyTMD.io.model(DIRECTORY, format=ATLAS_FORMAT,
    303         compressed=GZIP).elevation(MODEL)

File /env/lib/python3.10/site-packages/pyTMD/io/model.py:1408, in model.from_file(self, definition_file, format)
   1406     self.from_ascii(fid)
   1407 elif (format.lower() == 'json'):
-> 1408     self.from_json(fid)
   1409 # close the definition file
   1410 fid.close()

File /env/lib/python3.10/site-packages/pyTMD/io/model.py:1734, in model.from_json(self, fid)
   1730 elif (temp.type == 'z') and (temp.directory is not None):
   1731     # use glob strings to find files in directory
   1732     glob_string = copy.copy(temp.model_file)
-> 1734     temp.model_file = list(temp.directory.glob(glob_string))
   1735 # attempt to extract model directory
   1736 try:

File /env/lib/python3.10/pathlib.py:1030, in Path.glob(self, pattern)
   1028 if not pattern:
   1029     raise ValueError("Unacceptable pattern: {!r}".format(pattern))
-> 1030 drv, root, pattern_parts = self._flavour.parse_parts((pattern,))
   1031 if drv or root:
   1032     raise NotImplementedError("Non-relative patterns are unsupported")

File /env/lib/python3.10/pathlib.py:74, in _Flavour.parse_parts(self, parts)
     72 else:
     73     if rel and rel != '.':
---> 74         parsed.append(sys.intern(rel))
     75 if drv or root:
     76     if not drv:
     77         # If no drive is present, try to find one in the previous
     78         # parts. This makes the result of parsing e.g.
     79         # ("C:", "/", "a") reasonably intuitive.

TypeError: intern() argument must be str, not list

robbibt commented 3 months ago

I guess on a similar/related note - does this look correct for loading JSON format definitions directly from a dictionary, e.g. passing the bytes_io object directly to DEFINITION_FILE?

import io
import json

# Example dictionary
data = {"format": "GOT-netcdf", "name": "GOT5.6", "model_file": ["GOT5.5/ocean_tides/2n2.nc", "GOT5.5/ocean_tides/j1.nc", "GOT5.5/ocean_tides/k1.nc", "GOT5.5/ocean_tides/k2.nc", "GOT5.6/ocean_tides/l2.nc", "GOT5.6/ocean_tides/m1.nc", "GOT5.5/ocean_tides/m2.nc", "GOT5.6/ocean_tides/m3.nc", "GOT5.5/ocean_tides/m4.nc", "GOT5.5/ocean_tides/ms4.nc", "GOT5.5/ocean_tides/mu2.nc", "GOT5.6/ocean_tides/n2.nc", "GOT5.5/ocean_tides/o1.nc", "GOT5.5/ocean_tides/oo1.nc", "GOT5.5/ocean_tides/p1.nc", "GOT5.5/ocean_tides/q1.nc", "GOT5.5/ocean_tides/s1.nc", "GOT5.5/ocean_tides/s2.nc", "GOT5.5/ocean_tides/sig1.nc"], "type": "z", "variable": "tide_ocean", "version": "5.6", "scale": 0.01, "compressed": False, "reference": "https://doi.org/10.1126/sciadv.abd4744"}

# Convert dictionary to BytesIO
bytes_io = io.BytesIO(json.dumps(data).encode('utf-8'))

out = tide_elevations(
    x=[122.14],
    y=[-17.9],
    delta_time=pd.date_range("2020", "2021", periods=2),
    DIRECTORY="/gdata1/data/tide_models/",
    MODEL="GOT5.6",
    DEFINITION_FILE=bytes_io,
    DEFINITION_FORMAT="json",
    EPSG=4326,
    TIME="datetime",
    EXTRAPOLATE=True,
    CUTOFF=np.inf,
)

tsutterley commented 3 months ago

Got it. Passing the DIRECTORY argument triggers the glob functionality, so it tries to search that directory for files. Right now that functionality only works with a single pattern (a string), not a list. If you drop the DIRECTORY argument and prepend /gdata1/data/tide_models/ to each model_file entry in the definition file, it should work. I will update the glob functionality to iterate over lists. That will also allow searching for constituent files in multiple directories.
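
A minimal sketch of that workaround, assuming the definition is a dictionary with a `model_file` entry like the example above. The helper name `absolutize_model_files` is hypothetical and not part of pyTMD, and only two of the constituent files are shown:

```python
import json
from pathlib import PurePosixPath


def absolutize_model_files(definition: dict, directory: str) -> dict:
    """Return a copy of a model definition with `directory` prepended
    to every `model_file` entry.

    Hypothetical helper for illustration only; not part of pyTMD.
    """
    base = PurePosixPath(directory)
    updated = dict(definition)  # shallow copy; the input is left untouched
    model_file = definition["model_file"]
    if isinstance(model_file, str):
        # a single file (or glob pattern) given as a string
        updated["model_file"] = str(base / model_file)
    else:
        # a list of constituent files
        updated["model_file"] = [str(base / f) for f in model_file]
    return updated


# Shortened definition: only two of the constituent files are listed
definition = {
    "format": "GOT-netcdf",
    "name": "GOT5.6",
    "model_file": ["GOT5.5/ocean_tides/k1.nc", "GOT5.6/ocean_tides/l2.nc"],
    "type": "z",
}
fixed = absolutize_model_files(definition, "/gdata1/data/tide_models/")
print(json.dumps(fixed["model_file"], indent=2))
```

The updated dictionary could then be written back to a .json file (or wrapped in a BytesIO object) and passed as DEFINITION_FILE with the DIRECTORY argument left out.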

robbibt commented 3 months ago

Ah, makes sense - thanks!

One other small usability improvement could be auto-guessing the definition file format from the extension or data type, e.g. if it's a .json file or a BytesIO object, it's likely (always?) going to be JSON format. Perhaps the default for DEFINITION_FORMAT could be None or "auto" instead of "ascii", which would use "ascii" under the hood for everything other than .json files and BytesIO objects (while still allowing "ascii" and "json" to be passed manually, as in the current functionality).
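
A rough sketch of what I mean, assuming the check only needs to distinguish in-memory buffers and .json paths from everything else. The function name `guess_definition_format` is hypothetical and is not how pyTMD implements it:

```python
import io
from pathlib import Path


def guess_definition_format(definition_file, default="ascii"):
    """Guess the definition format from the input type or file extension.

    Hypothetical helper for illustration only; not pyTMD's implementation.
    """
    # in-memory buffers (e.g. io.BytesIO) are assumed to hold JSON
    if isinstance(definition_file, io.IOBase):
        return "json"
    # paths with a .json extension are treated as JSON
    if Path(definition_file).suffix.lower() == ".json":
        return "json"
    # anything else falls back to the current default
    return default


print(guess_definition_format("/gdata1/data/tide_models/model_GOT5.6.json"))
print(guess_definition_format("/gdata1/data/tide_models/model_FES2012.def"))
print(guess_definition_format(io.BytesIO(b"{}")))
```

Passing "ascii" or "json" explicitly would simply skip this guess.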

tsutterley commented 3 months ago

That's a great idea! Implemented in #319

robbibt commented 3 months ago

Awesome, can confirm that both of these fixes/features work for me! Will do a small validation of the new GOT models and see how they perform at our tide gauges. 🙂