wildlife-dynamics / ecoscope-workflows

An extensible task specification and compiler for local and distributed workflows.
https://ecoscope-workflows.readthedocs.org/
BSD 3-Clause "New" or "Revised" License
7 stars 2 forks source link

`annotations.DataFrame` json schema coerces `None` #226

Open atmorling opened 1 month ago

atmorling commented 1 month ago

@cisaacstern earlier this week I mentioned coming across a strange issue with optional AnyDataFrame/AnyGeoDataFrame params in a task definition 'breaking' the DataFrame annotation.

Turns out if the param is defined like so:

optional_dataframe: Annotated[
       DataFrame  | None,
        Field(
            description="An optional dataframei.",
            exclude=True
        ),
    ] = None,

.. the default None is coerced into DataFrame, (even if the param is defined in the comp spec explicitly) and causes "default":None to be added to the JSON schema defined below which borks any subsequent calls to parameters_jsonschema() https://github.com/wildlife-dynamics/ecoscope-workflows/blob/6facf602292bfe2fcbf80d4a57974cb619855a71/ecoscope_workflows/annotations.py#L42

I couldn't find a way to elegantly avoid the coercion, but as an easy workaround if this where to come up, the ol' switcheroo makes the problem go away ie;

optional_dataframe: Annotated[
       None | DataFrame,
...
cisaacstern commented 1 month ago

@atmorling thanks for the thoughtful report, and for finding a workaround.

Is it possible that this is happening because, in the examples you give above, DataFrame is not subscripted with a schema? i.e.

https://github.com/wildlife-dynamics/ecoscope-workflows/blob/6facf602292bfe2fcbf80d4a57974cb619855a71/ecoscope_workflows/tasks/preprocessing/_preprocessing.py#L20

That being said, we the WithJsonSchema has perhaps always been a little hacky so maybe we can iterate on that a bit.

I wonder also if you hit this because you're the first to put a DataFrame inside of an Annotated type. For clarity, note that DataFrame itself is an Annotated type, and while nested annotations are supported, this may add to the picture of what's affecting the behavior here (or not 🤷 ).