Tree update general plan

jgraley commented 1 year ago

Tying together past and future work and taking into account #718. We have to decide what will be done using Command Sequence Transformations (CSTs) versus what will be done by executing the CS (CSX). It turns out that CSTs will always be at least as powerful, since all inputs provided to CSX are available to CSTs.

Thus we use CSTs for most of the work, ending up with hopefully quite a small CS to actually execute.

A sticking point was tree zone duplication, which I had assumed should actually be done at CSX time. But that would complicate analysis. In fact, acting on a DuplicateTreeZoneCommand during CST is no different from eg evaluating a const expression during a compilation step: absolutely fine. ('24: Except that it will not play nicely with planning.)
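To make the constant-folding analogy concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: `DeclareTreeZone`, `Duplicate` and `DeclareFreeZone` are simplified stand-ins rather than the real command classes, and the tree is modelled as nested lists. The point is only that the copy can be made while transforming the sequence, so no duplicate command survives to execute time.

```python
import copy
from dataclasses import dataclass

# Hypothetical stand-ins for the real commands; the tree is just nested lists.
@dataclass
class DeclareTreeZone:
    reg: int        # register that will hold the zone
    subtree: list   # reference into the existing tree (not a copy)

@dataclass
class Duplicate:
    src: int        # register holding the zone to copy
    dest: int       # register that receives the copy

@dataclass
class DeclareFreeZone:
    reg: int
    subtree: list   # freshly copied nodes, owned by the command sequence

def fold_duplicates(seq):
    """CST: act on each Duplicate now, the way a compiler folds a constant."""
    tree_zones = {}
    out = []
    for cmd in seq:
        if isinstance(cmd, DeclareTreeZone):
            tree_zones[cmd.reg] = cmd
            out.append(cmd)
        elif isinstance(cmd, Duplicate) and cmd.src in tree_zones:
            # The copy is made here, at transformation time.
            copied = copy.deepcopy(tree_zones[cmd.src].subtree)
            out.append(DeclareFreeZone(cmd.dest, copied))
        else:
            out.append(cmd)
    return out
```

In this sketch the DeclareTreeZone is left in place; a later pass that drops unreferenced declarations would remove it.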

Prefactors:

[x] Allow registers to be TreeZone as well as FreeZone in the usual polymorphic way (DONE)
[x] Split out a DeclareTreeZoneCommand from the DuplicateTreeZoneCommand
[x] JoinZoneCommand to JoinZoneCommand and in here
[x] Switch from DeleteCommand/InsertCommand to ModifyTreeCommand
[x] ModifyZoneCommand to take TreeZone as arg
[x] Move the "work" of commands into Zone class methods
[x] Add support for expressions: expression node for a zone works with either kind, and recurses for terminii. Join is inferred at eval time. Duplication is also inferred on type mismatch, but the marking has to be explicit.
[x] Recode empty FZ elision using this
[x] Recode TZ overlap using this
[x] Make MarkBaseForEmbeddedCommand able to work on TreeZones because it may need to
[x] Support for ModifyTreeCommand with terminii in DB.
- Optimise to only redo parents for SC (and maybe subtree size) once per command/zone pair
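
As a rough illustration of the expression support item above (an expression node per zone, recursing for terminii, with the join inferred at eval time), here is a toy Python sketch. `ZoneExpr`, `TERMINUS` and `evaluate` are hypothetical names, zones are modelled as nested lists, and the inferred duplication on type mismatch is not modelled.

```python
from dataclasses import dataclass, field

TERMINUS = object()  # placeholder marking a terminus (a hole) in a zone

@dataclass
class ZoneExpr:
    kind: str       # "tree" or "free"; evaluation treats both the same way
    nodes: list     # zone contents; TERMINUS marks each terminus, left to right
    children: list = field(default_factory=list)  # one sub-expression per terminus

def evaluate(expr):
    """Evaluate the child expressions, joining each result into its terminus."""
    results = (evaluate(child) for child in expr.children)
    return _fill(expr.nodes, results)

def _fill(nodes, results):
    out = []
    for n in nodes:
        if n is TERMINUS:
            out.append(next(results))       # the join happens here
        elif isinstance(n, list):
            out.append(_fill(n, results))   # descend, sharing the same iterator
        else:
            out.append(n)
    return out
```

No explicit join command appears anywhere: the parent/child relationship between expression nodes is what says which zone goes into which terminus.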

Command Sequence Transformations (CSTs):

[x] Drop empty zones (tree and free), fixing up reg refs #717
[x] Analysis: identify overlapping proposed tree zones (required for rule #217 compliance) by CST duplication. #707
[x] Analysis: identify tree zone depth-first mis-ordering by CST duplication #722
- Needs a meta-execute pass that extracts shape of anticipated tree, needs #719
[ ] Maybe: add a database data structure for subtree size and use to guide the analysis #740
- similar validity issues as simple compare ordering (subtree-dependent)
[x] Perform tree zone duplication as a CST for zones we can't preserve
- DeclareTreeZoneCommand plus DuplicateZoneCommand becomes a DeclareFreeZoneCommand
[x] Merging of free zones #721 as a CST
- Could also do at start, to optimise analysis in Tree Zone steps
- Implement by searching for JoinZoneCommands, free zones to free zone, merging and fixing up the terminii
- Keep going until all joins include a tree zone or root (invariant)
[x] Drop the remaining DuplicateZoneCommands
- This is the money step where we decide to avoid duplicating everything
- But we'll end up with tree zones going into Joins, which can't execute
[x] Action and remove the MarkBaseForEmbeddedCommands.
[x] Tree zone inversion: declare a new tree zone corresponding to each remaining free zone
- Need to use the joins on tree zones/root to find the XLinks for the corresponding tree zones
- Consume each JoinZoneCommand that is used for this
- Well-defined thanks to merging step's invariant
[x] Drop DeclareTreeZoneCommands that now have no references
- But what about tree zones with joins to other tree zones? Does this happen? Should we have merged them?
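
The free zone merging step in the list above (search for joins from free zone to free zone, merge, keep going until every join includes a tree zone or root) can be sketched as a fixed-point loop. This is a toy model in Python under stated assumptions: `Hole`, `plug` and `merge_free_zones` are hypothetical names, zones are nested lists, root is treated as a tree zone, and each terminus carries an id so that terminii which move with a merged zone need no separate fix-up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hole:
    id: int   # a terminus: a place where another zone can be joined

def holes_in(body):
    """Yield the ids of all terminii inside a zone body."""
    for n in body:
        if isinstance(n, Hole):
            yield n.id
        elif isinstance(n, list):
            yield from holes_in(n)

def plug(body, hole_id, sub):
    """Return a copy of body with the given terminus replaced by sub."""
    return [sub if n == Hole(hole_id)
            else plug(n, hole_id, sub) if isinstance(n, list)
            else n
            for n in body]

def merge_free_zones(zones, joins):
    """zones: name -> (kind, body); joins: list of (hole_id, child_name)."""
    zones, joins = dict(zones), list(joins)
    changed = True
    while changed:
        changed = False
        for hole_id, child in joins:
            parent = next(n for n, (_, b) in zones.items() if hole_id in holes_in(b))
            if zones[parent][0] == "free" and zones[child][0] == "free":
                merged = plug(zones[parent][1], hole_id, zones[child][1])
                zones[parent] = ("free", merged)
                del zones[child]
                joins.remove((hole_id, child))
                changed = True
                break   # joins was modified, so rescan from the start
    return zones, joins
```

When the loop stops, no remaining join connects two free zones, which is the invariant the inversion step relies on.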

And then:

[ ] Review the GreenGrass situation #724

jgraley commented 1 month ago

Thoughts after a year and a half:

The old idea was to present the replace actions as a meta-program, with an optimiser that opportunistically switches copies of stuff-matched zones into simply leaving them in place.

It appears that this process should be plannable, but a planning requirement is restrictive when just trying to get something to work. Previous successes have come from implementing without a plan and then peeling the plan off the front opportunistically.

The meta-program probably wants to begin tree-like, since it's generated by a tree walk (the legacy replace phase) and if executed without optimisation it will perform essentially the same walk that created it, reproducing legacy replace behaviour. So this suggests a tree-like form even though what I've implemented is more like machine code with a stack/registers. On the other hand, once the optimisations are complete, we'll have a number of independent operations to perform on disconnected segments of the tree, potentially in any order (parallel programming construct?).

I think expressions within the code can represent tree-like structure directly, and can remove the need for optimisers to have to reason about what would happen with the stack/registers at execute-time. But if the meta-program was just an expression, we would not be able to break it up into independent operations. So the meta-program should be statements with expression operands.

Due to the objective of in-place update, the underlying semantics should be by-reference, by-sharing etc with copies being requested explicitly.

It will begin as a single statement eg "overwrite root with copy of" and a single operand which is the entire initial metaprogram as an expression. It would be similar to container building expressions eg in python [2, {'a':6}, [7, 8]] i.e. the structure of the expression nodes matches the structure it evaluates to. If we could lock that in, it would make analysis easier for the optimiser.
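
A toy Python sketch of what such a structure-mirroring expression could look like, with by-reference semantics by default and copies requested explicitly. All names (`Node`, `Ref`, `Copy`, `build`) are hypothetical and the tree is modelled as nested lists; this is not the existing implementation.

```python
import copy
from dataclasses import dataclass

@dataclass
class Node:
    label: str
    children: list   # sub-expressions, one per child position

@dataclass
class Ref:
    target: list     # an existing subtree, used by reference (shared)

@dataclass
class Copy:
    target: list     # an existing subtree, copied because the program asked

def build(expr):
    """Evaluate an expression; the result has the same shape as the expression."""
    if isinstance(expr, Node):
        return [expr.label] + [build(c) for c in expr.children]
    if isinstance(expr, Ref):
        return expr.target                  # no copy: the result shares the subtree
    if isinstance(expr, Copy):
        return copy.deepcopy(expr.target)   # explicit copy
    raise TypeError(f"not an expression: {expr!r}")
```

Because each `Node` produces exactly one node with its children in the same positions, an optimiser could read the shape of the result straight off the expression, for example when deciding to turn a `Copy` into a `Ref`.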

After optimisations, I would expect a number of statements along the lines of "overwrite this tree zone with copy of" and "delete this tree zone".
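
As a sketch of that end state, here are the two statement kinds acting on a toy tree in Python. The assumptions are mine: the tree is a nested dict, a tree zone is identified by a path of keys to its base, and `Overwrite`, `Delete` and `execute` are hypothetical names.

```python
import copy
from dataclasses import dataclass

@dataclass
class Overwrite:
    zone: tuple    # path of keys locating the base of the tree zone
    source: dict   # contents to install; copied when the statement executes

@dataclass
class Delete:
    zone: tuple

def execute(tree, stmt):
    """Apply one statement to the tree in place."""
    *parents, last = stmt.zone
    node = tree
    for key in parents:
        node = node[key]
    if isinstance(stmt, Overwrite):
        node[last] = copy.deepcopy(stmt.source)   # "overwrite with copy of"
    else:
        del node[last]                            # "delete this tree zone"
```

With statements that address disjoint zones, as in this model, the order of execution does not affect the result.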

I now think:

- Implement without planning first
- Meta-program should be command-based but with expression operands, and no stack/registers

From re-reading the 2023 plan, it looks like early transformations benefit from expression tree format. I think that some kind of structure representation is required as long as there are free zones, so definitely before merging and probably after (when free zones don't touch one another). After inversion, we have a tree zone for each free zone, and I think this is the time to split into separate commands like "overwrite tree zone with copy of free zone".

jgraley commented 1 month ago

General observations

The Updater stuff is ugly but necessary. But the code that uses it to do operations on zones should be in the Zone classes, not in the execute methods of eg DuplicateZoneCommand, JoinZoneCommand and even ModifyTreeCommand.
- Use teeing: command name matches method name on the zone, args are consistent etc
I seem to remember that we need to get MarkBaseForEmbeddedCommand etc out of the way somehow
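
One way to picture the "work lives in the Zone classes" arrangement, as a toy Python sketch. `FreeZone` and `Command` here are simplified, hypothetical stand-ins and the zone is just a list of nodes; the real Updater plumbing is not modelled.

```python
class FreeZone:
    """The zone owns the work: each operation is a method."""
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def duplicate(self):
        return FreeZone(self.nodes)

    def join(self, other):
        return FreeZone(self.nodes + other.nodes)

class Command:
    """Thin wrapper: names a zone method and carries its arguments."""
    def __init__(self, method, *args):
        self.method = method
        self.args = args

    def execute(self, zone):
        # The command name matches the method name on the zone.
        return getattr(zone, self.method)(*self.args)
```

For example, `Command("join", FreeZone(["b"])).execute(FreeZone(["a"]))` returns a zone whose nodes are `["a", "b"]`, with no zone logic in the command itself.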

jgraley / inferno-cpp2v

Tree update general plan #723