Closed chuckwondo closed 3 months ago
Should I be checking to see if the parameter url is valid in these functions? Reading through the documentation and just wanted to get some clarification
The behavior of search with respect to each parameter can be modified using the options parameter. The options parameter takes the following form:
options[parameter][option_key]=value
where parameter is the **URL parameter** whose behavior is to be affected, value is either true or false, and option_key is one of the following
Should I be checking to see if the parameter url is valid in these functions? Reading through the documentation and just wanted to get some clarification
Excellent question. For the initial implementation, I suggest not performing such a check.
If the user specifies something that's invalid, they'll get a query response error indicating the problem.
The purpose of providing some client-side validation is to avoid an unnecessary request, but it's not a terrible thing to make such a request. On the contrary, one could argue that such client-side validation can become out of sync with what the server allows, thus potentially preventing the user from performing an action that is actually valid.
Long story short: skip client-side validation. We can always open a new issue to add such validation at a later point.
Thank you for the quick response! I will work on implementing this first version with no validation checking this evening
I had another question. Should I make sure the user explicitly adds a parameter before they set an option for said parameter? Functionally I think either approach would work, but I wanted to clarify
I had another question. Should I make sure the user explicitly adds a parameter before they set an option for said parameter? Functionally I think either approach would work, but I wanted to clarify
Another good question. Since setting an option for a parameter without setting the parameter itself is a valid CMR query, I'd say we can ignore this for now. Again, we can open a new issue to add that as an enhancement, if someone wants such client-side validation.
I would equate that to gmail's feature that blocks you from sending an email containing the word "attachment" when there is nothing attached. In that case, you perhaps made a mistake by forgetting to attach something. Similarly, here that would catch a user's "accidental" failure to set a parameter when an option for that parameter is set.
As a note for the future, this is likely something we would want to implement in the _valid_state
method. That would allow the user to specify an option before specifying a value for the associated parameter. If the user forgot to set the parameter, an exception would be raised when the URL is built internally because it would then trigger a call to _valid_state
, which would be where we would check that all parameters for which options were specified also have a parameter value specified.
I'm pretty new to contributing so I have another validation question haha. I am unsure if I should be raising errors if an invalid parameter is passed. My first thought is to do something similar to the parameters function.
methods = dict(getmembers(self, predicate=ismethod))
for key, val in kwargs.items():
# verify the key matches one of our methods
if key not in methods:
raise ValueError(f"Unknown key {key}")
def option_or(self, parameter: str, value: bool = True) -> Self:
"""
Set this Query's or option for the specified parameter
:param parameter: search parameter to specify or option for
:param value: value for or option
:returns: self
:raises: Will raise if invalid parameter provided
"""
if not parameter:
return self
valid_parameters = dict(getmembers(self, predicate=ismethod)).keys()
if parameter not in valid_parameters:
raise ValueError(f"Unknown parameter {parameter}")
self.options[parameter] = {"or": value}
return self
From testing I can see that this won't throw an error if an option for an invalid parameter is set. I just am unsure if I should be blocking the user like the gmail attachment example. eg: the user passes in a parameter spelled slightly wrong. If I should be raising errors I think I may have to refactor test_queries.py
with a unittest class, so we can have access to the assertRaises
function
Your code looks good. I have only 1 very minor suggestion for clarifying the error message, which is to put a colon (:
) preceding the parameter name: "Unknown parameter: {parameter}"
, to aid in readability.
Regarding unit tests, add the following test functions to test_queries.py
:
def test_option_or_valid_parameter():
query = CollectionQuery()
query.option_or("revision_date")
def test_option_or_invalid_parameter():
query = CollectionQuery()
with pytest.raises(ValueError):
query.option_or("no_such_parameter")
This will also require adjusting the imports at the top of that file, to the following:
import pytest
from cmr import CollectionQuery, Query
Oh, just a bit of tidying to the docstring too:
"""
Set this query's `or` parameter option for the specified parameter.
See `CMR Search API Search Options`_.
.. _CMR Search API Search Options:
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#parameter-options
:param parameter: search parameter to specify `or` option for
:param value: specify `True` (default) to retrieve results matching _any_ of the
values set for the parameter; `False` to retrieve _only_ results matching
_all_ values set for the parameter
:returns: self
:raises: `ValueError` if invalid parameter provided
"""
Thanks for the feedback! I was considering putting 'or' and 'and' into quotes in the docstrings but I wasn't sure if it matched the styling of the others well enough.
Oops! I forgot the assertions in the new test functions, plus testing turning the option off. Should be these test functions:
def test_option_or_valid_parameter_on():
query = CollectionQuery()
query.option_or("revision_date")
assert query.options["revision_date"]["or"] is True
def test_option_or_valid_parameter_off():
query = CollectionQuery()
query.option_or("revision_date", False)
assert query.options["revision_date"]["or"] is False
def test_option_or_invalid_parameter():
query = CollectionQuery()
with pytest.raises(ValueError):
query.option_or("no_such_parameter")
assert "no_such_parameter" not in query.options
Thanks for the feedback! I was considering putting 'or' and 'and' into quotes in the docstrings but I wasn't sure if it matched the styling of the others well enough.
Yep, some sort of quoting is necessary so they are not mistaken for regular words in the prose. In this case, I think the backticks are a bit more appropriate than single- or double-quotes since the or
and the and
are literally made part of the query URLs.
@mamamia96, as I mentioned in #75, upon further reading of the CMR Search API documentation, I see that there are other possibilities we need to support, so a single method will allow us to provide better support and coverage. For example, if you search for option[
(including the square bracket) on the documentation page, you will find a number of other options (in addition to or
, and
, ignore_case
, and pattern
) and other possible value types (not only boolean).
Below, I've included what I believe to be a more comprehensive implementation for this issue.
In addition, I've added inline doctest examples, so that we can not only show users how to use the method (and how it behaves), but also get "free" tests, so we don't need to write separate tests in test_queries.py
. I also discovered a bug in your initial implementation in #75, which prevents setting multiple options for the same parameter, so my suggested code fixes that as well.
Finally, I removed the logic that checks for a "valid" parameter for 2 reasons:
include_highlights
parameter, but we don't have an include_highlights
method. In these cases, it is possible for the user to directly set such a parameter (e.g., query.params["include_highlights"] = "true"
), so raising an exception because we do not have a specific include_highlights
method needlessly prevents the user from setting a parameter option for it. That is, to avoid getting an "invalid option" error, a user has to resort to directly setting the query parameter, which defeats the purpose of providing a "convenience" function for the user.include_highlights
parameter, the parameter option name is highlights
. Thus, we do not want to raise an exception when setting an option parameter for "highlights"
because we not only have no method named highlights
(see point above), but we might never have such a method because the corresponding parameter method would be called include_highlights
, not highlights
. In other words, we might eventually want to be able to do this:
results = (
query
.include_highlights()
.option("highlights", "begin_tag", "<b>")
.option("highlights", "end_tag", "</b>")
.get()
)
Here are the specific code changes I suggest.
First, in pyproject.toml
, add the following block at the end of the file, which will run all doctest comments (see code below) when pytest runs:
[tool.pytest.ini_options]
addopts = ["--doctest-modules"]
In cmr/queries.py
, immediately below the from abc import ...
line, add the following line:
from collections import defaultdict
Then, in the same file, add this option
(singular) method:
def option(
self, parameter: str, key: str, value: Union[str, bool, int, float, None]
) -> Self:
"""
Set or remove a search parameter option.
If either an empty parameter name or option key is given, do nothing.
Otherwise, if a non-`None` option value is given, set the specified parameter
option to the value; else _remove_ the parameter option, if previously given.
In all cases, return self to support method chaining.
See `CMR Search API Parameter Options`_ as well as other sections of the
documentation that describe other available parameter options.
.. _CMR Search API Parameter Options:
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#parameter-options
Example:
.. code:: python
>>> query = CollectionQuery()
>>> query.option("short_name", "ignore_case", True) # doctest: +ELLIPSIS
<cmr.queries.CollectionQuery ...>
>>> query.options # doctest: +ELLIPSIS
defaultdict(..., {'short_name': {'ignore_case': True}})
>>> query.option("short_name", "ignore_case", False) # doctest: +ELLIPSIS
<cmr.queries.CollectionQuery ...>
>>> query.options # doctest: +ELLIPSIS
defaultdict(..., {'short_name': {'ignore_case': False}})
>>> (query
... .option("short_name", "ignore_case", None) # remove an option
... .option("short_name", "or", True)
... .option("highlights", "begin_tag", "<b>")
... .option("highlights", "end_tag", "</b>")
... )
<cmr.queries.CollectionQuery ...>
>>> query.options # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
defaultdict(..., {'short_name': {'or': True},
'highlights': {'begin_tag': '<b>', 'end_tag': '</b>'}})
:param parameter: search parameter to set an option for
:param key: option key to set a value for
:param value: value to set for the option, or `None` to remove the option
:returns: self
"""
if parameter and key:
if value is None:
del self.options[parameter][key]
else:
self.options[parameter][key] = value
return self
Further, in the same file, in the Query._build_url
method, we have the following lines:
# all CMR options must be booleans
if not isinstance(val, bool):
raise TypeError(
f"parameter '{param_key}' with option '{option_key}' must be a boolean"
)
As I described above, parameter options are not restricted to boolean values, so the block above should be removed. For example, the values of options[highlights][begin_tag]
and options[highlights][end_tag]
are strings, and the values of options[highlights][snippet_length]
and options[highlights][num_snippets]
are positive integers.
At a later point, we might want to add logic that knows about the expected data types for specific parameter options, but for now, let's simply let the user specify any type of value for any parameter option. If the CMR doesn't like something, it will respond with an error.
Excellent thank you for the detailed feedback I appreciate it. One question I have is do you want self.options
to be a defaultdict
now? Currently the doc string tests get key errors the way we are trying to access them.
update: I think I may have answered my own question after reading the doc string tests a little closer
Excellent thank you for the detailed feedback I appreciate it. One question I have is do you want
self.options
to be adefaultdict
now? Currently the doc string tests get key errors the way we are trying to access them.update: I think I may have answered my own question after reading the doc string tests a little closer
Ah, yes, sorry, I forgot to include that change. It should now be:
self.options: MutableMapping[str, MutableMapping[str, Any]] = defaultdict(dict)
Fixed by #75
Parameter options can currently be set directly on a query instance, like so:
However, this is undocumented, unintuitive, and error-prone.
Given that there are only specific parameter options available, I suggest providing a method (on the
Query
class) per option. For example, for theignore_case
parameter option:This would allow calls like
query.option_ignore_case("entry_title")
to ignore case for search by entry title, for example.Implement methods
option_pattern
,option_and
, andoption_or
similarly.