Open tdyas opened 2 years ago
cc @Eric-Arellano
Tom, do you have an example option and some input that exercise this issue?
An example would be any DictOption: just put a string as the value where a list is expected, or vice versa.
I was thinking the typeguard package might be useful here, specifically its check_type function.
Neat library. Great idea to use especially at the edges between trusted/untrusted code/data.
The other approach is to build off of the excellent new API from @thejcannon in https://github.com/pantsbuild/pants/pull/14331 to have DictStringToStringOption, DictStringToStringListOption, etc., similar to the Target API. I like the parity with the Target API and avoiding a new runtime dependency for Pants.
I've actually spent some time thinking about this in a bit of detail.
I'm thinking there's two halves to this, and it'd be nice if they could blend together:
Number 2 constrains us quite heavily, since we're limited by what the type checker understands. Luckily there's https://www.python.org/dev/peps/pep-0589/, which introduces TypedDict (also in typing_extensions). So really we need to find or roll our own number 1, one that allows us to re-use the TypedDict declaration (I really hope some schema library already exists that supports an input of TypedDict).
Do we need TypedDict? From my past usage, that is for when we expect certain key names to be defined. It seems like we mostly only care about the types of the keys and values: dict[str, str] vs dict[str, list[str]]. We don't care whether the keys include "foo" vs "bar".
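To make that distinction concrete, here is a minimal standard-library sketch. The names ResolveConfig, StrToStr, and StrToStrList are hypothetical and chosen only for illustration; none of them come from Pants.

```python
from typing import Dict, List, TypedDict


# Hypothetical TypedDict: the key names themselves are part of the type.
class ResolveConfig(TypedDict):
    lockfile: str
    interpreter_constraints: List[str]


# Homogeneous mappings: any string key is allowed, only the value type is fixed.
StrToStr = Dict[str, str]
StrToStrList = Dict[str, List[str]]

# A TypedDict records exactly which keys it expects...
required_keys = ResolveConfig.__required_keys__

# ...whereas a homogeneous mapping accepts arbitrary keys such as "foo" or "bar".
resolves: StrToStrList = {"foo": ["a.lock"], "bar": ["b.lock", "c.lock"]}
```

If only the second shape matters for dict options, a TypedDict declaration carries more structure than the validation needs.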
I don't think we'd need anything more than DictStringToStringOption, a la the Target API, with its own Pants-inlined validation. NB that we already do this for list options! We have the member_type mechanism to specify list[int] vs list[str].
I think I agree with Eric here. In case the structure is more complex, I think it would also be worthwhile to have a dataclass or similar that wraps the data, rather than keeping it verbatim in a dict. The parsing that goes from the input data to the dataclass would then be responsible for checking and error reporting.
In that case, couldn't we just add value_type to the DictOption? Then DictOption(..., value_type=str, ...) would work as well as DictOption(..., value_type=MyDataClass, ...).
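A rough sketch of what a value_type parameter could look like, assuming a simple isinstance check per value. DictOptionSketch and its validate method are hypothetical stand-ins, not the real Pants DictOption API, and the flag name is made up.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass(frozen=True)
class DictOptionSketch:
    """Hypothetical stand-in for a dict option; not the real Pants class."""

    flag_name: str
    value_type: Type[Any] = str

    def validate(self, raw: Any) -> Dict[str, Any]:
        # Reject anything that is not a dict before looking at its items.
        if not isinstance(raw, dict):
            raise TypeError(
                f"{self.flag_name}: expected a dict, got {type(raw).__name__}"
            )
        for key, value in raw.items():
            if not isinstance(key, str):
                raise TypeError(f"{self.flag_name}: key {key!r} must be a string")
            if not isinstance(value, self.value_type):
                raise TypeError(
                    f"{self.flag_name}: value for key {key!r} must be "
                    f"{self.value_type.__name__}, got {type(value).__name__}"
                )
        return dict(raw)


option = DictOptionSketch("--example-mapping", value_type=str)
ok = option.validate({"a": "b"})  # accepted unchanged

try:
    option.validate({"a": ["b"]})  # a list where a string is expected
except TypeError as e:
    message = str(e)  # names the flag, the key, and both types
```

The point of the sketch is that the error is raised where the option is parsed and names the offending key, rather than surfacing later inside a rule.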
Yeah, that could probably work! Although note that the options system doesn't allow complex dataclasses. It's limited to primitives. I personally like that DictStringToStringOption constrains the world for you already, with no need for a runtime error that value_type=Foo isn't legal. Yes, it's verbose, but it makes the expectations unambiguous.
StrDictOption (which could be a specialization of DictOption).
Although note that the options system doesn't allow complex dataclasses. It's limited to primitives.
The options system can be updated, and/or the validation can be in the new options system.
I'm not certain if that was a reply to my point about primitives. Agreed the options system can change, but note that this is, from what I can tell, a serialization problem more than anything. Options must work in pants.toml, CLI, and env vars, so we need a way to serialize the data. Unlike BUILD files, we are not dealing with Python. In BUILD files, we have objects to let you do things like parametrize() or setup_py(). That is not possible with options.
IIRC Dict options always have string keys
That's how they're de facto used today. It's theoretically possible to have a dict option going from {1: 'foo'}, but certainly a bit weird.
We should validate the type of values passed to dictionary options. (Context: I had the wrong type for a dictionary value due to a typo in a pants.toml, and the bogus type made its way into some Pants rules that could not deal with it. I just got TypeError: unhashable type: 'list', which was no help at all in debugging.)
At the very least, we should audit existing dictionary options and add type validation for them individually.
Bonus: Provide a way to do type validation via type annotation.
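As an illustration of the bonus item, the sketch below validates a value against a type annotation using only the standard library. check_annotation is a hypothetical helper, not an existing Pants or typeguard function, and it only handles plain classes plus parametrized Dict[...] and List[...]. The option path "[example].mapping" is invented for the example.

```python
from typing import Any, Dict, List, get_args, get_origin


def check_annotation(value: Any, annotation: Any, path: str = "value") -> None:
    """Raise TypeError if `value` does not match `annotation`.

    Hypothetical helper: supports plain classes and parametrized
    Dict[...] / List[...] only, not arbitrary typing constructs.
    """
    origin = get_origin(annotation)
    if origin is None:
        # Plain class such as str or int.
        if not isinstance(value, annotation):
            raise TypeError(
                f"{path}: expected {annotation.__name__}, got {type(value).__name__}"
            )
        return
    if not isinstance(value, origin):
        raise TypeError(
            f"{path}: expected {origin.__name__}, got {type(value).__name__}"
        )
    args = get_args(annotation)
    if origin is dict:
        key_type, value_type = args
        for k, v in value.items():
            check_annotation(k, key_type, f"{path} key {k!r}")
            check_annotation(v, value_type, f"{path}[{k!r}]")
    elif origin is list:
        (member_type,) = args
        for i, member in enumerate(value):
            check_annotation(member, member_type, f"{path}[{i}]")


# A typo in the config put a list where a string was expected.
bad = {"python": ["3.9"]}

# Without validation, the bad value only fails later, far from its source.
try:
    hash(bad["python"])
except TypeError as e:
    late_error = str(e)  # mentions only that a list is unhashable

# With validation at parse time, the error names the option and the key.
try:
    check_annotation(bad, Dict[str, str], path="[example].mapping")
except TypeError as e:
    early_error = str(e)
```

A correctly shaped value such as {"python": "3.9"} passes the same check silently, and swapping the annotation to Dict[str, List[str]] accepts the list form instead.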