Improved XForms dependency engine

ebruchez commented 11 years ago

Rationale

Until now, our XForms dependency engine has worked in a pull fashion: we go through all the binds, MIPs, control bindings, and values, and for each one we ask the dependency engine whether the associated XPath expression needs to be evaluated.

A much, much better way would be to say: here is a new input to the system, such as a value entered by the user; now tell me everything that needs to be updated given this new input. This is very much like a spreadsheet, of course. It is also the original intent of the XForms dependency algorithm ("Recalculation Sequence Algorithm").
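To make the push model concrete, here is a minimal Python sketch. It assumes a hypothetical graph stored as a mapping from each node (a data path or an expression id) to the nodes that directly depend on it. Given the changed inputs, it returns everything downstream that must be updated, the way a spreadsheet would.

```python
from collections import deque


def affected(dependents: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return every node reachable from the changed inputs through one or more edges."""
    seen: set[str] = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical example: the user edits /order/qty
graph = {
    "/order/qty": {"calc:total"},
    "/order/price": {"calc:total"},
    "calc:total": {"control:total-output", "mip:total-valid"},
}
print(sorted(affected(graph, {"/order/qty"})))
# ['calc:total', 'control:total-output', 'mip:total-valid']
```

Nothing reachable only from `/order/price` is touched, which is the whole point compared to asking about every expression in turn.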

The gist

The idea suggested here would be to go further, and to blur the boundaries between:

recalculate
revalidate
refresh

We would like a single dependency graph handling the updates across those boundaries.

Now, the question is how to achieve this. The XForms specification is in a sense very precise about model dependencies, but on the other hand it is very imprecise about how exactly to compute them. For example, conditionals are a problem. Also, it doesn't take into account inter-model dependencies, or dependencies of the view on models.

One hope is that static analysis can infer enough information to be useful. This would establish a kind of partial order between expressions, and also limit the number of expressions to evaluate.
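A partial order like this is what a topological sort turns into a concrete evaluation order. The sketch below uses Kahn's algorithm on a hypothetical `deps` mapping (expression id to the ids it reads from); it is an illustration of the technique, not the engine's implementation, and it raises if a cycle prevents a complete order.

```python
from collections import deque


def evaluation_order(deps: dict[str, set[str]]) -> list[str]:
    """Order expressions so that each comes after everything it depends on (Kahn's algorithm)."""
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {n: len(deps.get(n, ())) for n in nodes}  # unsatisfied dependencies per node
    dependents: dict[str, list[str]] = {n: [] for n in nodes}
    for n, ds in deps.items():
        for d in ds:
            dependents[d].append(n)
    ready = deque(sorted(n for n in nodes if remaining[n] == 0))  # sorted for determinism
    order: list[str] = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in sorted(dependents[n]):
            remaining[m] -= 1
            if remaining[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle detected among: " + ", ".join(sorted(nodes - set(order))))
    return order


print(evaluation_order({"total": {"qty", "price"}, "tax": {"total"}, "valid-total": {"total"}}))
# ['price', 'qty', 'total', 'tax', 'valid-total']
```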

(It is very clear that using some kind of dynamic analysis instead of, or in addition to, static analysis would provide more information. But it is also harder to specify and to implement. There are issues such as the order in which to analyze expressions initially, and how to deal with conditionals. Also, we are not yet convinced that it brings enough benefits. If it turns out that it does, it can be done in a second phase.)

We already have a mechanism to create projections of XPath expressions. So the idea at this point is to evaluate whether we could go far enough with it.

Next steps

A clear first step is to do the following: take the collection of XPath expressions that we already analyze in the model and view, and attempt to produce an all-encompassing graph. Then we must look at what kind of graphs are produced with forms that we or our customers have, and obtain a confirmation that this is the right way to go.

If so, the next step would be to use the graph at runtime to perform updates.

Scenarios handled

TODO: computation in grid

ebruchez commented 10 years ago

A few more thoughts:

we can separate work on the model vs. the view, as dependencies flow from model to view
inter-model dependencies can be handled for top-level models, and then models within XBL components can in turn depend on outer models (e.g. via xxf:instance)
the dependency graph also includes dependencies, via MIP functions, on MIPs
we need to decide what to do if we encounter a cyclical dependency
there are obviously functions which depend on external things, e.g. the request, or internal status, or the random function, etc., and this has to be handled
a first experiment could look into an ideal case, where:
- all expressions can be analyzed
- there are no cycles
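Whatever we decide to do about cycles, the engine first has to find them, ideally in a form that can be reported. A minimal sketch, assuming the same kind of hypothetical expression-to-dependencies mapping, uses a depth-first search that returns one offending cycle:

```python
from typing import Optional


def find_cycle(deps: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list like ['a', 'b', 'a'], or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color: dict[str, int] = {}
    path: list[str] = []

    def visit(n: str) -> Optional[list[str]]:
        color[n] = GREY
        path.append(n)
        for d in sorted(deps.get(n, ())):
            state = color.get(d, WHITE)
            if state == GREY:  # back edge: d is on the current path, so we closed a loop
                return path[path.index(d):] + [d]
            if state == WHITE:
                found = visit(d)
                if found:
                    return found
        path.pop()
        color[n] = BLACK
        return None

    for n in sorted(deps):
        if color.get(n, WHITE) == WHITE:
            found = visit(n)
            if found:
                return found
    return None


print(find_cycle({"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}))  # ['a', 'b', 'c', 'a']
```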

ebruchez commented 10 years ago

2014-02-19: Brainstormed circular dependencies in the context of static analysis. There is some hope that, in most useful scenarios, we can use an algorithm that keeps performing updates until the values converge.

Examples:

bind ref="foo" calculate="preceding::foo + 1"
bind ref="foo" calculate="following::foo - 1"
actual circular dependency between model and view, where
- itemset depends on count node, e.g. 1 to count
- group node depends on xxf:count-items() of the itemset
- group node's relevant MIP depends on count
- xf:group happens to be around xf:itemset in the view

There is some thinking that, ideally, the system should still try to encompass both models and the view.

ebruchez commented 9 years ago

See also forum thread.

ebruchez commented 9 years ago

See also private discussion. Here the user wants to have a button whose visibility (or readonly-ness) depends on whether the form is valid.

ebruchez commented 8 years ago

Another thing to consider: functions that depend on the environment. Right now, for example, a calculate that uses, say, current-dateTime(), or that calls a Java function, is not updated "continually". Should something be done in such cases, such as re-running these functions?

ebruchez commented 8 years ago

+1 from user to allow cycles between recalculation and revalidation.

ebruchez commented 7 years ago

2017-02-09: Did some quick thinking again with @avernet about how we could maybe do this not statically but dynamically, and handle dynamic dependencies. Not sure whether we reached a significant conclusion.

ebruchez commented 5 years ago

In Form Runner, for the form proper, we have more specific constraints:

controls use bind and not ref to bind to form data
formulas tend to use variable references to refer to other controls

On the other hand:

the implementation of XBL components typically uses plain XForms
parts of Form Runner are written in XForms
some more complex components, like the wizard and error summary,

This said, under the stricter assumptions of bind and variables, determining dependencies should be much easier and not require path maps.
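Under those assumptions, extracting dependencies can be as simple as collecting variable references from each formula. A minimal sketch, assuming formulas refer to other controls only through `$name` references; the regular expression is a simplification of XPath variable syntax (no prefixed QNames, and it doesn't skip string literals):

```python
import re

# Simplified: '$' followed by an NCName-like identifier (letters, digits, '_', '.', '-')
VAR_REF = re.compile(r"\$([A-Za-z_][\w.\-]*)")


def formula_dependencies(formulas: dict[str, str]) -> dict[str, set[str]]:
    """Map each control name to the control names its formula refers to via variables."""
    return {name: set(VAR_REF.findall(expr)) for name, expr in formulas.items()}


deps = formula_dependencies({
    "total": "$quantity * $unit-price",
    "tax": "$total * 0.08",
})
print(deps)  # {'total': {'quantity', 'unit-price'}, 'tax': {'total'}}  (set order may vary)
```

The resulting mapping has the same shape as the one a topological sort consumes, without involving path maps at all.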

ebruchez commented 5 years ago

Separately, we recently had questions about having readonliness depend on validity (here and here), and readonliness depend on readonliness (here).

We could extend the evaluation order we currently have in the model for calculations so that it also includes MIPs and the use of MIP functions, especially since recalculate/revalidate is now (#1773) seen as a single operation.

So with this, a "node" can have:

a value
named properties with values
references to values with variables
references to properties with MIP functions (taking a variable as parameter)

In addition, a node's value, which can be calculated, must be determined before its validity.

This yields a larger graph, which allows us to determine a more complete execution order for everything in the model.

The UI should tell the user if there is a circular dependency.
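One way to picture this larger graph is to make each vertex a (node, property) pair, where the property is `value` or a MIP name. The sketch below, with hypothetical names, adds edges for variable references (to values) and for MIP-function references (to properties), plus the implicit edge from each node's value to its validity. It then uses Python's standard `graphlib` module (3.9+) to get an execution order, or to report the circular dependency.

```python
from graphlib import CycleError, TopologicalSorter

Vertex = tuple[str, str]  # (node name, property): "value" or a MIP such as "valid", "readonly"


def build_graph(value_refs: dict[Vertex, set[str]],
                mip_refs: dict[Vertex, set[Vertex]]) -> dict[Vertex, set[Vertex]]:
    """Map each vertex to the vertices it depends on."""
    graph: dict[Vertex, set[Vertex]] = {}
    for v, nodes in value_refs.items():        # e.g. calculate="$c" reads the value of c
        graph.setdefault(v, set()).update((n, "value") for n in nodes)
    for v, props in mip_refs.items():          # e.g. readonly depending on another node's validity
        graph.setdefault(v, set()).update(props)
    mentioned = set(graph) | {p for ps in graph.values() for p in ps}
    for node in {n for n, _ in mentioned}:     # a node's value precedes its validity
        graph.setdefault((node, "valid"), set()).add((node, "value"))
    return graph


def execution_order(graph: dict[Vertex, set[Vertex]]) -> list[Vertex]:
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] lists the vertices forming the cycle, which the UI could show to the user
        raise ValueError(f"circular dependency: {err.args[1]}") from err


# b is readonly depending on a's validity; a's value is calculated from c
order = execution_order(build_graph({("a", "value"): {"c"}}, {("b", "readonly"): {("a", "valid")}}))
```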

ebruchez commented 5 years ago

There is still the question of whether this should/could handle repetitions as described in this comment.

ebruchez commented 5 years ago

Specific question right now: would creating a dependency graph based on variables, encompassing not only calculations but also MIPs, be a realistic and useful thing to do as a first step?

ebruchez commented 5 years ago

We also would like to handle section templates better, including, eventually, having a single dependency graph for the form including section templates.

But, after discussion with @avernet, implementing the above (a dependency graph that includes MIPs) would still be valuable as a first step.

avernet commented 5 years ago

+1 from customer to allow the readonliness of a control to depend on the validity of another control

ebruchez commented 4 years ago

+1 from customer to depend on relevance.

ebruchez commented 4 years ago

+1 from customer

ebruchez commented 3 years ago

Working on general form processing performance, we might want to look at a first step where we optimize the evaluation of control bindings and values, ignoring, for now, optimizations in the model. The idea is this:

build a dependency graph based on controls' XPathAnalysis
- on the left, we have root nodes, which represent projections of data nodes (instance('fr-form-instance')/foo/bar)
- on the right, we have leaf nodes, which represent a (static) control's binding or value (and more later like LHHA)
- an arrow from a root node to a leaf node indicates that a change to the value of a node with that projection requires re-evaluating the control's binding or value
provide an alternate implementation of Controls.updateBindings
- right now, requireBindingUpdate uses
  - intersectsStructuralChangeModel, using structuralChangeModelKeys, modelStates, instancesByKey
  - intersectsBinding, using refreshChangeset
- so we will probably use the same information in PathMapXPathDependencies
- now what we want is
  - handle structural changes first (they are not optimized for now)
  - then for non-structural changes
    - see which roots intersect refreshChangeset
    - recompute the impacted bindings by following the arrows
    - recompute all bindings that don't have known dependencies
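A minimal sketch of the non-structural part of this refresh, with hypothetical names throughout: `roots` maps each projected path to the static controls whose binding or value depends on it, and `unknown` holds controls whose dependencies couldn't be determined. As a simplification of the real matching done with path maps, a changed path is taken to intersect a projection when one is a step-wise prefix of the other.

```python
def is_prefix(a: str, b: str) -> bool:
    """True if path a equals b or is an ancestor of b, comparing whole '/'-separated steps."""
    return b == a or b.startswith(a.rstrip("/") + "/")


def controls_to_update(roots: dict[str, set[str]], unknown: set[str], changeset: set[str]) -> set[str]:
    """Static controls whose binding or value must be re-evaluated after non-structural changes."""
    impacted = set(unknown)  # no known dependencies: always re-evaluate
    for projection, controls in roots.items():
        if any(is_prefix(projection, p) or is_prefix(p, projection) for p in changeset):
            impacted |= controls  # follow the arrows from this root
    return impacted


roots = {
    "/form/foo/bar": {"bar-control"},
    "/form/foo/baz": {"baz-control", "summary"},
}
print(sorted(controls_to_update(roots, {"legacy-control"}, {"/form/foo/baz"})))
# ['baz-control', 'legacy-control', 'summary']
```

The result is a set of static controls; as noted in the open questions, each one still maps to possibly many concrete controls, and the update order is a separate concern.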

Open questions:

we need to update all concrete controls for a given static control that requires an update
what order do we need to follow for updates?
- with the current implementation we go in doc order
- but here we might not have a known doc order; do we need to sort values first? is there a shortcut?

ebruchez commented 3 years ago

BindingUpdater does some tricky things, including:

not recursing into new repeat iterations
forcing binding evaluation
- within descendant-or-self of controls that have changed relevance
- if the control has a @model attribute (with a comment from 2012 that says "TODO TEMP HACK")
if the binding is not updated, refreshBindingAndValues is called to "make sure the parent is updated, as ancestor bindings might have changed, and it is important to ensure that the chain of bindings is consistent"

ebruchez commented 3 years ago

Quick experiment with large form:

395 bindings that couldn't be determined!
out of those, 70 xxf:binding() and 166 $binding or similar (159 remain)
- fix xxf:binding() support
- then check remaining 159
TBD

ebruchez commented 3 years ago

When we set or refresh bindings, we don't only "set the binding". We also:

compute relevance
compute evaluate() or evaluateNonRelevant()
- evaluate() calls preEvaluateImpl(), which calls markExternalValueDirty() if needed

The relevance can come from:

the binding being the empty sequence or not
the binding pointing to an instance data node with relevant MIP set to true or false

Also:

non-relevance in the model is inherited by descendant nodes
non-relevance in the view is inherited by descendant nodes

This means that our graph should be modified:

a binding must be updated not only if its own binding is impacted, but also if any ancestor binding is impacted
a binding must be updated (or at least the associated relevance part of it) if there are updates to the upstream dependencies on the relevant MIP in the model
- and these also include ancestor nodes in the instance
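The first point amounts to closing the set of impacted controls under the "descendant" relation of the view. A minimal sketch, assuming a hypothetical `children` mapping that represents the static control tree:

```python
def with_descendants(children: dict[str, list[str]], impacted: set[str]) -> set[str]:
    """Extend a set of impacted controls with all their descendants in the control tree."""
    result = set(impacted)
    stack = list(impacted)
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


children = {"section": ["grid"], "grid": ["input-a", "input-b"], "other": ["input-c"]}
print(sorted(with_descendants(children, {"section"})))
# ['grid', 'input-a', 'input-b', 'section']
```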

ebruchez commented 3 years ago

A single value change in the data can cause:

a relevant MIP to change in the model
a control binding to change (and possibly change its relevance)
a control relevance to change, while keeping the exact same binding

In addition, of course:

changes to other MIPs (readonly, required, valid, custom)
calculation changes
control value changes
itemset changes
LHHA changes
external value changes

ebruchez commented 3 years ago

Our approach now is to create a unified graph for all models and controls (in a given part analysis). Once we have the graph, we can decide how to use it, including whether to use it in the models or the view only; whether to support inter-model dependencies; etc.

The graph itself, once created, is immutable.

(Note that in Form Builder, the part analysis for the form being edited at this time doesn't need the graph as it doesn't perform recalculations, evaluation of MIPs, etc. At least, not in the "large". We can see later whether some optimizations are possible there as well.)

The graph has the following requirements:

unified for a given part
inputs are "paths"
outputs (after appropriate graph traversal) are "actions" such as
- update this MIP
- recalculate this expression
- update this control's binding
- recalculate this control's value
- etc.

We keep the model's rebuild/recalculate-revalidate cycle. There can be multiple such cycles before a UI refresh.

Like now, we keep track of changes (as paths) that pertain to UI updates between refreshes.

ebruchez commented 3 years ago

Within models, the algorithm will be as follows:

for now, structural changes still clear all dependencies for a given model/instance
- optimize that later
when a bind RR is needed
- take the modified paths
- determine which XPath MIPs need reevaluation
- evaluate them in topological order

Within the view:

value changes are gathered as is the case now
when a refresh is needed
- take the modified paths
- determine which bindings require reevaluation
- determine which values, LHHA, and itemsets require reevaluation
- perform them in topological order (or just document order)

ebruchez commented 3 years ago

Still struggling with representing the graph, but the idea now is to have different types of nodes:

Node.Data
- represents a single (projected) unique path
- links to one or more Node.Binds (and maybe more, like Node.Control?) (two binds can point to the same path)
Node.Bind
- represents a single <xf:bind>
- links to one or more Node.Datas (you can have ref="foo | bar")
- links to zero or more Node.Props
Node.Prop
- represents a property, including value, MIP, calculations
- links to a single Node.Bind
Node.Control
- represents a control in the view
- links to a Node.Bind (if bind) or one or more Node.Datas (binding)
- links to zero or more Node.Props with value, LHHA, itemset
- MIPs are linked indirectly via the binding
- TODO: other properties like class and custom control properties
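To make the links between these node types explicit, here is a rough transcription into Python dataclasses. Class and field names are illustrative only, not actual classes in the code base. `bind_props` shows how a control reaches MIPs indirectly: through its bind, or through the binds attached to the data it points to.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality: the graph has back-links
class Data:
    path: str                                            # a single projected, unique path
    binds: list["Bind"] = field(default_factory=list)    # two binds can point to the same path


@dataclass(eq=False)
class Prop:
    name: str                                            # "value", a MIP name, "calculate", an LHHA, ...
    owner: object = None                                 # the Bind (or Control) the property belongs to


@dataclass(eq=False)
class Bind:
    bind_id: str
    data: list[Data] = field(default_factory=list)       # ref="foo | bar" yields several
    props: list[Prop] = field(default_factory=list)


@dataclass(eq=False)
class Control:
    control_id: str
    bind: Optional[Bind] = None                          # when bound with @bind ...
    data: list[Data] = field(default_factory=list)       # ... or directly with @ref
    props: list[Prop] = field(default_factory=list)      # value, LHHA, itemset

    def bind_props(self) -> list[Prop]:
        """Properties reached indirectly through the binding (including MIPs)."""
        binds = [self.bind] if self.bind else [b for d in self.data for b in d.binds]
        return [p for b in binds for p in b.props]


d = Data("/form/foo")
b = Bind("foo-bind", data=[d])
d.binds.append(b)
relevant = Prop("relevant", owner=b)
b.props.append(relevant)
```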

Reminder:

this is done during static analysis (no actual data or XPath evaluation)
works with path projections
some dependencies might be undetermined

ebruchez commented 3 years ago

Extension attributes on controls (which can be AVTs):

style, class, role
data-*
xf:secret
- xxf:size
- xxf:maxlength
- xxf:autocomplete
xf:output
- image: xxf:alt
- download: xxf:target
selection
- xxf:group
- xxf:title
xf:trigger
- xxf:title
xf:input
- xxf:size
- xxf:maxlength
- xxf:autocomplete
- xxf:title
- xxf:pattern
xf:upload
- accept
- mediatype
- xxf:title
xf:textarea
- xxf:maxlength
- xxf:cols (obsolete)
- xxf:rows (obsolete)
xf:group for td or th
- rowspan
- colspan

We only need to consider as properties in the dependency system those that are AVTs.
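Whether an attribute is an AVT can be decided syntactically. A minimal sketch, assuming the XSLT-style AVT convention where `{...}` delimits an expression and `{{` is an escaped literal brace; `avt_attributes` is a hypothetical helper that keeps only the attributes that need a property in the dependency system:

```python
def is_avt(value: str) -> bool:
    """True if an attribute value contains at least one {expression}, ignoring escaped '{{'."""
    i = 0
    while i < len(value):
        if value.startswith("{{", i):
            i += 2            # escaped literal brace
        elif value[i] == "{":
            return True       # start of an embedded expression
        else:
            i += 1
    return False


def avt_attributes(attrs: dict[str, str]) -> set[str]:
    """Names of the attributes whose values are AVTs."""
    return {name for name, value in attrs.items() if is_avt(value)}


print(avt_attributes({"class": "fr-{$mode}", "xxf:size": "20", "style": "{{literal}}"}))  # {'class'}
```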

ebruchez commented 3 years ago

We need to represent the inheritance of relevant and readonly.

The question we need to answer (for now) is: "given a change to this value, which controls must update their relevant or readonly property?"

(In a first step the model will just work as usual, without leveraging the dependency graph. So it will store relevant values the good old way.)

It might be that an intermediary bind doesn't have known dependencies:

<xf:bind ref="instance()" relevant="@foo = 42" id="fr-form-bind">
    <xf:bind ref=".//bar/baz" id="foo-baz-bind"/>
</xf:bind>

<xf:input bind="foo-baz-bind" id="control-via-bind"/>

<xf:input ref="instance()//bar/baz" id="control-via-ref-1"/>

<xf:input ref="instance()/bar/baz" id="control-via-ref-2"/>

foo-baz-bind and control-via-ref-1 will have unknown bind and MIP dependencies, so they will work (they are always re-evaluated).

control-via-ref-2 will have known binding dependencies. How do we make sure that we can re-evaluate its MIPs? It seems that it might not be possible without handling the notion of a descendant axis? Do we care or is this too far-fetched to handle for now?

With the current system, we refresh all controls during a refresh. With the proposed new system, we'd want to try to avoid that.

We could detect inheritance from two sources:

at the binds level
at the controls level

In this case though, control-via-ref-2 has no enclosing control, so we just can't know that it needs to refresh its MIPs.

The answer might just be to require avoiding this kind of scenario for now. Could we detect such scenarios?

ebruchez commented 3 years ago

After discussion, the case of control-via-ref-2 can be handled by looking at paths and subpaths. That's the correct way to handle this for controls bound via ref. For controls bound via bind, we can do it the same way (inferring the path from the Node.Bind) or taking a shortcut by looking at the hierarchy of binds (if we make the assumption that bind nesting mirrors data nesting, which is good practice but not mandatory).
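For controls bound via ref, the path check can be sketched as follows. In XForms, relevant and readonly are inherited by descendant data nodes, so a control must refresh those MIPs when a node whose own MIPs may have changed has the control's bound path or one of its ancestor paths. Paths are hypothetical, already-projected strings taken from the example above.

```python
def inherits_from(ancestor: str, path: str) -> bool:
    """True if `path` is `ancestor` itself or lies below it, comparing whole steps."""
    return path == ancestor or path.startswith(ancestor + "/")


def controls_to_refresh_mips(bound_paths: dict[str, str], changed_mip_paths: set[str]) -> set[str]:
    """Controls whose inherited relevant/readonly may change, given nodes whose own MIPs may have changed."""
    return {
        control
        for control, path in bound_paths.items()
        if any(inherits_from(changed, path) for changed in changed_mip_paths)
    }


bound_paths = {"control-via-ref-2": "instance()/bar/baz", "unrelated": "instance('other')/qux"}
# relevant="@foo = 42" on instance() may have changed
print(controls_to_refresh_mips(bound_paths, {"instance()"}))  # {'control-via-ref-2'}
```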

We also need to look at the nesting within the view.

ebruchez commented 3 years ago

xf:switch/xf:case and xf:toggle need some special handling. There are 4 cases:

xf:switch/xf:case with xxf:xforms11-switch="false" (default)
- nothing to do as all xf:cases follow the switch's relevance
xf:switch/xf:case with xxf:xforms11-switch="true"
- relevance can change if xf:toggle is used
xf:switch/xf:case with xxf:xforms11-switch="false" and @caseref
- nothing to do as all xf:cases follow the switch's relevance
xf:switch/xf:case with xxf:xforms11-switch="true" and @caseref
- relevance can change depending on the value stored at @caseref
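The four cases reduce to a small decision on two flags. A sketch, with a hypothetical function name, of what can change a case's relevance beyond the switch's own relevance:

```python
def case_relevance_depends_on(xforms11_switch: bool, has_caseref: bool) -> str:
    """What, besides the enclosing switch's relevance, can change an xf:case's relevance."""
    if not xforms11_switch:
        return "nothing"        # all cases follow the switch's relevance, with or without @caseref
    if has_caseref:
        return "caseref-value"  # the value stored at @caseref selects the case
    return "toggle"             # xf:toggle actions select the case


print(case_relevance_depends_on(xforms11_switch=True, has_caseref=False))  # toggle
```

With the fr:section settings described next (no @caseref, xxf:xforms11-switch="true"), this falls into the xf:toggle case.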

fr:section:

no @caseref
xxf:xforms11-switch="true" in Form Builder (ok) or with wizard optimization

orbeon / orbeon-forms