Closed jl-wynen closed 8 months ago
Have you thought about our front-page claim https://scipp.github.io/sciline/#but-i-do-not-want-to-change-all-my-code. Can the same be done without having Sciline-specific attributes, etc.?
Users need to define new types anyway (using NewType
or Scope
). What I propose does not require changes to the users types.
But it is intrusive in that it adds attributes to the class. If we want to avoid that, we need a different way of identifying parameters and attaching validators to them. I can think of three possibilities:
parameter
decorator would register types there.
Pipeline
. Add a method like Pipeline.add_parameter(ty, value, validate, convert)
.
Pipeline.register_parameter(ty, validate, convert)
and Pipeline.set_parameter(ty, value)
.
For completeness:
Users need to define new types anyway (using
NewType
orScope
).
They can use any type they want. If they already have strong types instead of generic containers in their code no change is needed. And NewType
is in the standard library, i.e., no dependency on Sciline is added.
What I propose does not require changes to the users types.
Where would the decorator run / be used?
I was operating under the assumption that users anyway have to define new types because it is extremely rare to use a dedicated type for every single entity.
And it would then be ok to depend on Sciline and use the decorator in that code as it will anyway require a lot of uses of sciline.Scope
.
And it would then be ok to depend on Sciline and use the decorator in that code as it will anyway require a lot of uses of sciline.Scope.
Not sure I agree. I have always seen sciline.Scope
as hack hack for generic aliases. It caters to some lazyness about defining a proper generic type. And it will go away with Python 3.12's generic aliases, I think?
In that case, can you comment on the alternatives?
Options 1-3 seem very invasive. Looking at the original requirements you posted on top:
Validate parameter values
Document parameters, e.g., for building a GUI (Docstrings are not supported for type aliases. And we may need more structured data, e.g., to encode valid ranges.)
Support different types. While the linked issue is closed, the current solution requires using Union types and providers have to handle the different types themselves.
Default values.
How about using Pydantic, or something like that? It seems to fulfill 4/4 of the above. Downside is the need to access the param property in providers, but is that actually a problem, compared to the advantages of using an existing and well-known solution?
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (c) 2023 Scipp contributors (https://github.com/scipp)
from pydantic import BaseModel, Field
import sciline as sl
from pathlib import Path
from typing import Union
def test_simple_param() -> None:
class Parameter(BaseModel):
value: int = Field(..., description="An int", ge=0, le=100)
def use_param(param: Parameter) -> float:
return 1.5 * param.value
pipeline = sl.Pipeline((use_param,))
pipeline[Parameter] = Parameter(value=10)
assert pipeline.compute(float) == 15.0
def test_param_with_two_different_value_types() -> None:
class Param(BaseModel):
value: Path = Field(..., description="A string or a path")
def use_path(param: Param) -> bool:
return isinstance(param.value, Path)
pipeline = sl.Pipeline((use_path,))
pipeline[Param] = Param(value="test")
assert pipeline.compute(bool) == True
pipeline[Param] = Param(value=Path("test"))
assert pipeline.compute(bool) == True
I considered this but didn't go that way because it requires providers to be aware of it. Also, it also requires defining new classes. And it gives special meaning to pydantic models. We can no longer use them for data that shouldn't be treated as params.
Also, it also requires defining new classes.
Don't you need to do that anyway if you want validators and docs?
And it gives special meaning to pydantic models. We can no longer use them for data that shouldn't be treated as params.
You mean because you need to access a property?
Also, it also requires defining new classes.
Don't you need to do that anyway if you want validators and docs?
You complained about this earlier. This is why I proposed 1-3
And it gives special meaning to pydantic models. We can no longer use them for data that shouldn't be treated as params.
You mean because you need to access a property?
No. I mean when listing all parameters. It would be nice to put all providers into a pipeline and then loop through all keys (present and missing) to list all parameters to, e.g., build a GUI or other interface. But this is based solely on the types. So if we rely on pydantic models, all models used anywhere in the graph would be interpreted as parameters. (You may say that we can exclude keys that are produced by providers. But that doesn't work because we may have providers that are alternatives for determining params. E.g., There may be a Filename that can be converted into a FilePath. But the user can just as well specify a FilePath directly.)
Also, it also requires defining new classes.
Don't you need to do that anyway if you want validators and docs?
You complained about this earlier. This is why I proposed 1-3
I did? Where?
And it gives special meaning to pydantic models. We can no longer use them for data that shouldn't be treated as params.
You mean because you need to access a property?
No. I mean when listing all parameters. It would be nice to put all providers into a pipeline and then loop through all keys (present and missing) to list all parameters to, e.g., build a GUI or other interface. But this is based solely on the types. So if we rely on pydantic models, all models used anywhere in the graph would be interpreted as parameters.
This seems like conflated requirements, not what the original premise of this issue was.
I did? Where?
Here:
Have you thought about our front-page claim https://scipp.github.io/sciline/#but-i-do-not-want-to-change-all-my-code.
The page says
Sciline is very non-invasive. If you are willing to define a few types (or type aliases) and add type annotations to your functions
But you now suggest wrapping data in dedicated models. Yes, those are independent from Sciline. But they are still tacked on top of the normal user code.
This seems like conflated requirements, not what the original premise of this issue was.
I'd rather not refactor this if we want UIs.
I am not saying that "we" use Pydantic models, I am basically just suggesting that no change to Sciline may be required. The user may use Pydantic models, or anything else.
Sure. Users can do that. I wanted to see how Sciline can help.
Finding/listing all parameters can simply be done by (a) defining it in a module/package, which is probably what one might need anyway, since automatic seldom does the job, or (b) find all "unsatisfied" requirements in a pipeline (we do this already for task-graph visualization).
(a) Then this can be done directly in the UI code. So the whole discussion is moot. We can just do all documentation, validation, conversion, etc in the UI code. (Or a dedicated module) But this is hard to extend when we add to a pipeline.
(b) This fails with the Filename / FilePath example i gave.
I did? Where?
Here:
Have you thought about our front-page claim https://scipp.github.io/sciline/#but-i-do-not-want-to-change-all-my-code.
The page says
Sciline is very non-invasive. If you are willing to define a few types (or type aliases) and add type annotations to your functions
But you now suggest wrapping data in dedicated models. Yes, those are independent from Sciline. But they are still tacked on top of the normal user code.
Dedicated by the user, yes. Validators and docstrings must be defined with a parameter, how did you intend to do this outside the user code? I don't see how that can be done without writing a class. My point is: I do not want to require writing that code (parameters, validators, ...) in a Sciline-specific manner.
This seems like conflated requirements, not what the original premise of this issue was.
I'd rather not refactor this if we want UIs.
Refactor what?
I am not saying that "we" use Pydantic models, I am basically just suggesting that no change to Sciline may be required. The user may use Pydantic models, or anything else.
Sure. Users can do that. I wanted to see how Sciline can help.
What are you referring to here, avoiding to write param.value
instead of param
?
Finding/listing all parameters can simply be done by (a) defining it in a module/package, which is probably what one might need anyway, since automatic seldom does the job, or (b) find all "unsatisfied" requirements in a pipeline (we do this already for task-graph visualization).
(a) Then this can be done directly in the UI code. So the whole discussion is moot. We can just do all documentation, validation, conversion, etc in the UI code. (Or a dedicated module) But this is hard to extend when we add to a pipeline.
Not sure what you are stating here.
(b) This fails with the Filename / FilePath example i gave.
Every solution fails somewhere.
I'd rather not refactor this if we want UIs.
Refactor what?
The definitions of all parameters in all ess packages because we find out that they don't work for building a UI. If we go back to the stance that we will never provide GUIs and never implement helpers for it, then this doesn't matter.
What are you referring to here, avoiding to write param.value instead of param?
Yes and having to wrap values in a class and having to come up with an attr name (value
). I wanted to have a solution that is transparent. I.e., apart from the parameter
decorator, it looks and feels like using pipelines without special param handling.
(a) Then this can be done directly in the UI code. So the whole discussion is moot. We can just do all documentation, validation, conversion, etc in the UI code. (Or a dedicated module) But this is hard to extend when we add to a pipeline.
Not sure what you are stating here.
I'm saying that, if pushing all responsibility for validation onto the user and treating GUIs as a separate matter, why do we talk about this in the first place?
(b) This fails with the Filename / FilePath example i gave.
Every solution fails somewhere.
Sure. But we need to pick the features we want to support. So does this mean that you don't want to support the example I gave?
I'd rather not refactor this if we want UIs.
Refactor what?
The definitions of all parameters in all ess packages because we find out that they don't work for building a UI. If we go back to the stance that we will never provide GUIs and never implement helpers for it, then this doesn't matter.
Maybe you can explain in person, but I am not sure I understand why using an @parameter
decorator avoids any need for refactoring, while some other solution cannot.
What are you referring to here, avoiding to write param.value instead of param?
Yes and having to wrap values in a class and having to come up with an attr name (
value
). I wanted to have a solution that is transparent. I.e., apart from theparameter
decorator, it looks and feels like using pipelines without special param handling.
Maybe I just don't understand your proposal. How will you implement the bullet points from your original message, specifically (1) doc/description, (2) validators, and (3) default values. How can that be done without a class or something similar?
(a) Then this can be done directly in the UI code. So the whole discussion is moot. We can just do all documentation, validation, conversion, etc in the UI code. (Or a dedicated module) But this is hard to extend when we add to a pipeline.
Not sure what you are stating here.
I'm saying that, if pushing all responsibility for validation onto the user and treating GUIs as a separate matter, why do we talk about this in the first place?
Sciline does not know how parameters should be validated. Only the Sciline user can do that. Again, I do not understand what you have in mind otherwise.
(b) This fails with the Filename / FilePath example i gave.
Every solution fails somewhere.
Sure. But we need to pick the features we want to support. So does this mean that you don't want to support the example I gave?
Let us focus on the main discussion for now, as there are apparently a few misunderstandings, so we should avoid getting side-tracked.
A comment about using custom classes with pydantic: It is non-trivial to use types as fields in pydantic models that are not supported natively. And it needs a modification of the class. E.g., to make it work with scipp, we need something like this:
from typing import Any, Never, TypeVar
import scipp as sc
from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import core_schema
import sciline as sl
T = TypeVar('T')
def identity(x: T) -> T:
return x
def serialize_var(x: sc.Variable) -> Never:
raise NotImplementedError()
def schema(
cls, _source_type: Any, _handler: GetCoreSchemaHandler
) -> core_schema.CoreSchema:
return core_schema.no_info_after_validator_function(
identity,
core_schema.is_instance_schema(sc.Variable),
serialization=core_schema.plain_serializer_function_ser_schema(
serialize_var, info_arg=False, return_schema=core_schema.str_schema()
),
)
# Need to modify the class!
sc.Variable.__get_pydantic_core_schema__ = classmethod(schema)
class Range(BaseModel):
value: sc.Variable
def foo(r: Range) -> int:
return len(r.value)
params = {Range: Range(value=sc.arange('x', 4))}
pl = sl.Pipeline((foo,), params=params)
print(pl.compute(int)) # -> 4
Ideally, the user should not modify a scipp (or numpy, pandas, etc) class. So this needs by-in by the corresponding library. (I'm not opposed to supporting pydantic schemas for our types. This can also help with serializing parameters.)
Conclusion of offline discussion:
Points 1, 2, 4 can be done outside of Sciline with an existing package or a bespoke solution. But it does not fit the core purpose of Sciline which is building task graphs.
Concerning param inspection and building GUIs, we can have common classes, functions, protocols in another package, e.g., ESSreduce. This can be used by our own special-purpose GUI-building library.
👍
Concerning param inspection and building GUIs, we can have common classes, functions, protocols in another package
Note also #83, which would enable this.
With that conclusion, I am going to close this, please correct me if I am wrong.
We came across the need to have a richer parameter handling multiple times:
Union
types and providers have to handle the different types themselves.See also the guidelines https://github.com/scipp/essreduce/blob/main/docs/user-guide/reduction-workflow-guidelines.md
Proposed solution
I propose adding a decorator that adds the above to a user type. Here is a basic prototype:
The only change this requires is inserting this
here: https://github.com/scipp/sciline/blob/8467e25b9857fc40fda16de00f8eacc25fe193cf/src/sciline/pipeline.py#L430
Alternative:
Supporting
typing.Annotated
(#36) would allow us to add metadata to types which is useful could be sued for the above purposes. But as noted in the issue, this may be brittle. And encoding all relevant forms of metadata can be difficult.