StateManager needs to be re-architected

jace commented 9 months ago

StateManager is currently used as an instance of a reference StateManager that lives as a property on the model and sources its states from an external enum. This worked in the pre-type hint era, but it was already littering the client model with conditional states and their lambdas, and is not transparent to type analysis. The API could be rewritten to be cleaner:

class ProjectState(StateType[int]):
    # StateManager is like an enum now, but without the magic of Enum.
    # Enums require special-casing by type checkers, with many gotchas.
    # The data type for State values is specified as a generic arg, and it
    # could also point to an existing enum for validation, in the form:
    # `class ProjectState(StateType[ProjectStateEnum])`
    # This should validate 1:1 mapping from enum members to states.

    is_draft = State(1, __("Draft"))
    is_published = State(2, __("Published"))
    is_withdrawn = State(3, __("Withdrawn"))
    is_deleted = State(4, __("Deleted"))
    # or: is_deleted = State(ProjectStateEnum.DELETED)  # Metadata copied from enum

    # Groups of states can be defined right here, unlike in Enum.
    # Using `|` is less ambiguous than the old set-based syntax, and
    # possible because states are already objects that can implement `__or__`
    is_gone = is_withdrawn | is_deleted  # But how is this assigned a title?

    # Conditional states can be defined like properties and can
    # assume a fixed relationship with the client model (positional arg)
    @is_published.conditional
    def is_future(self, project: Project) -> bool:
        return project.start_at > utcnow()

    # SQL expression can also be defined here, although it's likely to give us
    # similar grief to SQLAlchemy, forcing us to make it `.inplace.expression`
    @is_future.expression
    def is_future(self, cls: Type[Project]) -> sa.ColumnExpression:
        return cls.start_at > sa.func.utcnow()

class Project(…, Model):
    # The state manager should sits directly on the column, not alongside, assuming
    # SQLAlchemy gives us that kind of access. We'll need more hijinks to give
    # mapped_column a hint of what the SQL data type is.
    state: Mapped[ProjectState] = sa.orm.mapped_column(
        ProjectState(),
        default=ProjectState.is_draft
        # or: default=ProjectStateEnum.DRAFT
    )

Transitions are particularly painful. We've learnt they're not simple. They have multiple stages spread across models, forms and views, and is one major reason for moving the state spec out of the model. Transitions have:

Pre-validation:
- Is this transition available given the object's current state? (Model concern)
- Is this transition available given the user's credentials? (View concern)
- Is this transition available given related state managers and causal chains (see below; potentially both Model and View concern)
Requirements:
- What must the user supply to be able to make this transition? (View concern, requesting form schema)
Validation:
- Has the user supplied the required information? (Form concern)
Execution
- What should we change alongside the state? (eg: publish date)
Dependencies across state managers
- Multiple state managers in the same model
- State managers spread across models (eg: publish transition is available only if parent is also published)
- Chained transitions (eg: deleting parent will also delete all children, which are different models with different state managers, but the causal chain requires confirmation the user can execute the entire chain)

StateManager's current method of defining a transition via a single method on the model is inadequate. That method currently does pre-validation with execution, limited to the model level, with no support for form schema, validation or causal chains. We will instead need to define a transition as a distinct noun that serves as a registry for the various supporting bits across models, forms and views.

Should this transition registry reside in the state manager, sharing namespace with states? Or in the model as at present?

jace commented 9 months ago

If we assume the Enum syntax has been reviewed enough to be a good reference syntax, how can this be done using enums as the base? Let's try:

class StateManager(Enum):
    def __get__(self, obj, cls):
        if obj is None: return self
        return inspect_wrapper(obj, self)

class ProjectState(StateData, StateManager):
    DRAFT = 1, __("Draft")
    ...

    def is_future(self, target: Project) -> bool:
        return self is ProjectState.PUBLISHED and target.start_at > utcnow()

In this scheme, conditional states are methods and visibly distinct from states. This may be good for clarity, but it means the test can't be optimized for the default False outcome from the base state, as the expected base state is only known imperatively. Methods in an enum can't be linked to particular members.

Enum does not have syntax for groups. The | syntax in Flag and IntFlag represents all instead of any. Others have hacked in the any possibility by inserting a set into the enum after defining it. Such insertions are not recognised by the Enum API or type checkers.

The enum method's target attribute (in this example) can be inserted by the wrapper in the descriptor. However, the wrapper is once again a proxy and therefore not inspectable by a type checker. There is no obvious way to produce instance vs SQL versions of state checks.

jace commented 9 months ago

We could reduce the expectation from Enum to add nothing new, and make the following work with very little change to the current StateManager implementation:

class ProjectStateEnum(StateData, Enum):
    DRAFT = 1, __("Draft")
    ...

class ProjectState(StateManager, enum=ProjectStateEnum):
    # The enum= parameter is used to audit and/or autofill entries

    # Specify 1:1 names all over again
    DRAFT = ManagedState(ProjectStateEnum.DRAFT)
    # Stub only, matched by name to the enum
    PUBLISHED = ManagedState()
    ...

    # Options for group definition
    # These only work without autofill from enum, requiring explicit enumeration above

    # Plain set, requiring post-processing that is opaque to type checkers
    GONE = {WITHDRAWN, DELETED}
    # These can be type hinted correctly, but
    # have no metadata
    GONE = WITHDRAWN | DELETED
    # Metadata hack, possibly ugly
    GONE = GONE | {'description': "Returns 410"}
    # First class object, no magic
    GONE = ManagedStateGroup(WITHDRAWN, DELETED, description="Returns 410")

The existing use state = StateManager(enum, column) can be changed to state = ProjectState(column). The code in __init__ that constructs ManagedState instances to add to self is no longer needed because they're already in the class. Conditional states and groups don't have to clutter the client model. Type checkers can enumerate contents and catch typos.

The downstream API remains unchanged (so far).

jace commented 9 months ago

The enum appears to have no utility here and can be removed, making the StateManager subclass itself the enum.

Autofill precludes subclassing ManagedState, and the metadata in ManagedStateGroup is a distinct data type from the StateData dataclass given to the Enum. For proper type hinting, we'll need a single base type:


DataType = TypeVar('DataType')
StateType = TypeVar('StateType', bound='ManagedState')

@dataclass
class ManagedState(Generic[DataType]):
    arg: InitVar[DataType | Iterable[StateType[DataType]]]
    value: DataType | None = field(init=False, default=None)
    group: frozenset[DataType] | None = field(init=False, default=None)
    validator: Callable[..., bool] = field(init=False, default=None) # For conditional states
    # Subclasses can add desired metadata fields

    @property
    def is_group(self) -> bool:
        return self.states is not None

    def __post_init__(self, arg):
        if isinstance(arg, (set, frozenset, list, tuple)):
            self.value = None
            self.group = frozenset(a.value for a in arg)
        else:
            self.value = arg
            self.group = None

    def conditional(self, validator) -> Self:
        new_state = dataclass.copy(self)
        new_state.validator = validator
        return new_state

    # All functionality of ManagedState and ManagedStateGroup should be in this class

@dataclass
class State(ManagedState[int]):
    title: str
    description: str = ''

class ProjectState(StateManager):
    DRAFT = State(1, "Draft")
    PUBLISHED = State(2, "Published")
    ...
    GONE = State({WITHDRAWN, DELETED}, "Gone", "Returns 410")

    @PUBLISHED.conditional
    def FUTURE(self, target: Project) -> bool:
        return target.start_at > utcnow()

This seems much cleaner, except the conditional validator, which requires knowledge of a model defined later. Maybe we can instead leave the conditional state as a stub and add validators later:

class ProjectState(...):
    FUTURE = PUBLISHED.conditional()

class Project:
    state = ProjectState()
    @state.FUTURE.validator
    def is_future(self) -> bool:
        return self.start_at > utcnow()

This puts the test in the correct place, and the method can be called directly on the model too, but how does project.state.FUTURE (or Project.state.FUTURE via the class) get the correct context to supply the validator? Should it enforce a signature? Especially if other test functions can be defined elsewhere, in forms and views with different contexts?

The lambda-ish approach of placing the validator alongside the state definition is less surprising.

jace commented 9 months ago

Transitions should also be subclassable data models (replacing functions with metadata), with registries for the various connected requirements (validators, signals) as described earlier in this ticket. These registries will again have a context problem, so maybe it's worth thinking about context supply some more.

Eg: StateManager on the model is a descriptor that creates a context wrapper for the model or instance. Should we have versions of this for forms and views too? If a transition or conditional state has validators in different contexts, how does the wrapper track both contexts?

We've already experienced this with simple transitions that require a specific role but not any data: the view handlers reproduce part of the state model knowledge (is this transition allowed for this user, etc), and the front-end JS has a 100% independent understanding, making changes to the state model fairly hard.

jace commented 9 months ago

StateManager subclasses can maintain a registry of all their host classes, courtesy __set_name__:

class StateManager:
    __hosts__: dict[Type, set[str]]
     # cls: attrs > cls.attr for each attr, usually only one
   __members__: dict[str, ManagedState]

    def __init_subclass__(cls):
        cls.__hosts__ = {}
        cls.__members__ = {}
        super().__init_subclass__()
        # Validate states on the subclass and populate __members__

    def __set_name__(self, owner, name):
        self.__hosts__.setdefault(owner, set()).add(name)

jace commented 9 months ago

The __hosts__ dict should be a weakref.WeakKeyDict because hosts can be transient, like classes defined within tests.

jace commented 9 months ago

If state models have registries, then two unrelated models that happen to have the same state definitions cannot reuse the model — they'll become linked. This is resolved by creating independent subclasses with no other changes, but it introduces instance re-use gotchas to the class itself.

hasgeek / coaster

StateManager needs to be re-architected #430