nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License
418 stars 84 forks source link

Confusing error when submitting list instead of tuple for `temporal` parameter #804

Open mfisher87 opened 2 months ago

mfisher87 commented 2 months ago

Is this issue already tracked somewhere, or is this a new report?

Current Behavior

The error is confusing because it comes from python-cmr after earthaccess has done stuff with the parameters and passed them on. This message tells me I should pass a string instead of a list when it should be telling me to pass a tuple instead of a list.

>>> results = earthaccess.search_data(short_name="GLDAS_NOAH025_3H", temporal=["2020-01-01", "2020-01-07"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/robatt/Projects/nsidc/earthaccess/earthaccess/api.py", line 125, in search_data
    query = DataGranules().parameters(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/robatt/Projects/nsidc/earthaccess/earthaccess/search.py", line 559, in parameters
    methods[key](val)
  File "/home/robatt/Projects/nsidc/earthaccess/earthaccess/search.py", line 844, in temporal
    return super().temporal(date_from, date_to, exclude_boundary)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/robatt/.local/share/miniforge3/envs/earthaccess/lib/python3.12/site-packages/cmr/queries.py", line 447, in temporal
    date_from, date_to = self._format_date(date_from, date_to)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/robatt/.local/share/miniforge3/envs/earthaccess/lib/python3.12/site-packages/cmr/queries.py", line 387, in _format_date
    date_from = convert_to_string(date_from, datetime(1, 1, 1, 0, 0, 0, tzinfo=timezone.utc))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/robatt/.local/share/miniforge3/envs/earthaccess/lib/python3.12/site-packages/cmr/queries.py", line 374, in convert_to_string
    raise TypeError(
TypeError: Date must be a date object or ISO 8601 string, not list.

Expected Behavior

Error should reflect that the container is the wrong type, e.g. Temporal must be a tuple of date objects or ISO8601 strings, or earthaccess should handle the input in a way that gives python-cmr the correct container when the user makes a trivial mistake like this.

Steps To Reproduce

earthaccess.search_data(short_name="GLDAS_NOAH025_3H", temporal=["2020-01-01", "2020-01-07"])

Environment

- OS: Pop!_OS 22
- Python: 3.12

Additional Context

No response

itcarroll commented 2 months ago

Our implementation of the temporal filter does nothing other than override typing and docstring of the python_cmr method.

Is the best resolution for this issue to add a type check on the argument before calling super()?

https://github.com/nsidc/earthaccess/blob/699cc4e0ac4239926ad04b4be2fd72c95aba21bb/earthaccess/search.py#L767-L797

Good to bear in mind #488, and (I don't see an issue for this) that we could easily support multiple temporal ranges because python_cmr already supports it.

mfisher87 commented 2 months ago

Our implementation of the temporal filter does nothing other than override typing and docstring of the python_cmr method.

We also splat the arg's container IFF the container is a tuple:

https://github.com/nsidc/earthaccess/blob/699cc4e0ac4239926ad04b4be2fd72c95aba21bb/earthaccess/search.py#L508-L512

Is the best resolution for this issue to add a type check on the argument before calling super()?

Yeah, I think so! At least maybe for now.

But also, I feel like the "splat the arg only if it's a tuple" behavior is a bit confusing, or at the wrong level of abstraction. I feel like we should never mess with the args at the parameters method, instead passing them whole to the handler method so that when we ask "what are we doing with the temporal argument before handing it off to python-cmr?" we can find all of the things being done to it in one place!

If we add this handling to the temporal method, we're just adding compensation for the behavior at the parameters method by checking if we've received a Union[str, dt.date, dt.datetime], in which case the tuple-breaking code worked. I'd rather be able to check in the temporal method if we got a sequence of two things and break those up, and if not provide a better error message than python-cmr, since our API is different than python-cmr's. Maybe "Received a {Type}, but expected a sequence of two items". Perhaps it would be better to do this refactoring in the course of addressing this issue. :thinking:

What do you think, Ian?

JessicaS11 commented 3 weeks ago

Just picking up on this thread as I explore how we could replace icepyx.Query with earthaccess.search (https://github.com/icesat2py/icepyx/issues/575). One of the things I'm finding (here temporal is the relevant input, but it's also the case for spatial) is that icepyx is a lot more flexible and forgiving (and broad) in terms of the inputs it accepts. For instance, users can input not only dates but times (if they want), and icepyx will either use defaults or pass those days/times through all functions. To do this, we essentially have a temporal module that takes the user input (tuple or list, of strings, date, or datetime objects) with optional start and end time kwargs (if they're not already attached to an input datetime object) and makes sure the user inputs can all be turned into valid datetime objects. Then we attach these internally-defined temporal objects as private variables to our query object, and use that temporal module to manipulate our "validated" datetimes for whereever we need to pass it next (like CMR).

The motivation for this (and especially for the analogous spatial version; the temporal module is much simpler!) was that we could turn multiple types of user input into a single "desired" (and validated) object, and then depending on what the user asks for later we have a "known" version of it we can easily manipulate (e.g. to pass in whatever format the API we're calling needs - like spatial.fmt_for_CMR).

I bring this up here because I've been grappling with some questions that I think are relevant to this conversation (I'll limit it to temporal ones for thread relevance):