pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.33k stars 636 forks source link

validate type of values passed to dictionary values #14507

Open tdyas opened 2 years ago

tdyas commented 2 years ago

We should validate the type of values passed to dictionary options. (Context: I had the wrong type for a dictionary value due to a typo in a pants.toml and the bogus type made its way into some Pants rules that could not deal with it. Just got TypeError: unhashable type: 'list' which was no help at all in debugging.)

At the very least, we should audit existing dictionary options and add type validation for them individually.

Bonus: Provide a way to do type validation via type annotation.

tdyas commented 2 years ago

cc @Eric-Arellano

kaos commented 2 years ago

Tom, do you have an example option and some input that exercise this issue?

tdyas commented 2 years ago

An example would be any DictOption and just put a string as the value where a list is expected or vice versa.

I was thinking the typeguard package might be useful here, specifically its check_type function.

kaos commented 2 years ago

I was thinking the typeguard package might be useful here, specifically its check_type function.

Neat library. Great idea to use especially at the edges between trusted/untrusted code/data.

Eric-Arellano commented 2 years ago

The other approach is to build off of the excellent new API from @thejcannon in https://github.com/pantsbuild/pants/pull/14331 to have DictStringToStringOption, DictStringToStringListOption, etc, similar to the Target API. I like the parity with the Target API + avoiding a new runtime depenency for Pants.

thejcannon commented 2 years ago

I've actually spent some time thinking about this in a bit of detail.

I'm thinking there's two halves to this, and it'd be nice if they could blend together:

  1. One half is the runtime schema which needs to be validated against. Theres more than a few libraries out there, we can take our pick, but see point 2
  2. The other half is the type annotation of the option property. Ideally this this fully defined so we get nice rich deep type hints.

Number 2 constrains us quite heavily, since we're limited by what the type checker understands. Luckily there's https://www.python.org/dev/peps/pep-0589/ which introduces TypedDict (also in typing_extensions). So really we need to find our roll-our-own 1. which allows us to re-use the TypedDict declaration (i really hope some schema library already exists that supports an input of TypedDict)

Eric-Arellano commented 2 years ago

Do we need TypedDict? From my past usage, that is when we expect certain key names to be defined. It seems like we mostly only care about the types of the keys & values, dict[str, str] vs dict[str, list[str]]. We don't care that the keys include "foo" vs "bar".

I don't think we'd need anything more than DictStringToStringOption, a la the Target API, with its own Pants-inlined validation. NB that we already do this for list options! We have the member_type mechanism to specify list[int] vs lint[str].

kaos commented 2 years ago

I think I agree with Eric here.. in case the structure is more complex, I think it would also be worth while to have a dataclass or similar that wraps the data, rather than keeping it verbatim in a dict. And the parsing done going from the input data to the dataclass would be responsible for checking/error reporting.

thejcannon commented 2 years ago

In that case, couldn't we just add value_type to the DictOption? Then DictOption(..., value_type=str, ...) would work as well as DictOption(..., value_type=MyDataClass, ...)

Eric-Arellano commented 2 years ago

Then DictOption(..., value_type=str, ...) would work as well as DictOption(..., value_type=MyDataClass, ...)

Yeah, that could probably work! Although note that the options system doesn't allow complex dataclasses. It's limited to primivites. I personally like that DictStringToStringOption constrains the world for you already, no need for a runtime error that value_type=Foo isn't legal. Yes, it's verbose, but it makes the expectations unambiguous.

thejcannon commented 2 years ago
Eric-Arellano commented 2 years ago

Although note that the options system doesn't allow complex dataclasses. It's limited to primivites.

The options system can be updated, and/or the validation can be in the new options system.

I'm not certain if that was a reply to my point about primitives. Agreed the options system can change, but note that this is fwict a serialization problem more than anything. Options must work in pants.toml, CLI, and env vars, so we need a way to serialize the data. Unlike BUILD files, we are not dealing with Python. In BUILD files, we have objects to let you do things like parametrize() or setup_py(). That is not possible with options.

IIRC Dict options always have string keys

That's how they're de facto used today. It's theoretically possible to have a dict option going from {1: 'foo'}, but certainly a bit weird.