Multiary Finalization - GitHub Issues

multiary is nearly complete, featuring a replacement for much of the behavior currently exhibited by add and multiply, allowing those operators to be rewritten in a much cleaner fashion. There exist a number of tasks which need to be completed to finish this closure generator:

[ ] Figure out a consistent algorithm for addition factorization: the consider operations for add are notionally used to reduce repeated sub-expressions into easier-to-compute representations. For the most part, this is intended to handle term combination (e.g. x + 2*x => 3*x); however, this could also theoretically handle situations like x*y + x + y, which can be factored in more than one way (e.g. x*(y + 1) + y or y*(x + 1) + x).
[x] A MultiaryNode should have its operands array sorted, with a closure-defined sort order. The sorting algorithm used should be based on the degree of the operands. Naively, add should sort by degree descending (so, e.g., 2 + x + x^2 would be sorted as x^2 + x + 2); multiply should sort ascending (x^2 * y * 2 would yield 2 * y * x^2). More analysis is likely needed. In any event, the as-of-yet undeleted old client code has an implementation of degree which might be easily adapted to this end. A single sort of the operands can occur right at the end of the operation, which avoids redundantly re-sorting intermediate results.
[ ] closures/multiary.ts is very messy, as it contains all code related to it in a single file. While most of the contents can stay in that file, the reimplementations of add and multiply need to be moved to their respective definitions in arithmetic. Furthermore, parts of multiary are somewhat sloppily defined after they are used, which should be addressed.
[x] multiary needs utility HOFs which generate unary and binary closures, similarly to binary. These would be named something like unaryFrom and binaryFrom. Intended use would be to provide the ability to generate a derived function off a multiary. So, something like: const subtract = binaryFrom(add).
[ ] Many of the test suites are likely to fail after add and multiply are properly reimplemented. Mostly, this will be due to failing snapshot tests on the writer logs. multiary logs will be incompatible with those produced by binary, which is intentional. However, some tests may fail because of different results: results which should, in theory, be semantically equivalent, but are just expressed slightly differently. Many tests will likely need to be rewritten to address this. The snapshots will need review, but can likely be safely updated. Extra tests will likely need to be added to cover multiary's unique behavior.
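
For the operand-sorting item above, a minimal sketch of what a closure-defined, degree-based sort order could look like. Everything here is an assumption for illustration: Expr is a drastically simplified stand-in for the real node types, and degree, descending, ascending, and sortOperands are hypothetical names, not the old client code's implementation.

```typescript
// Hypothetical, simplified expression shape; the real TreeNode types are richer.
type Expr =
  | { kind: 'real'; value: number }
  | { kind: 'variable'; name: string }
  | { kind: 'power'; base: Expr; exponent: number }

// Degree of a single operand: constants are 0, bare variables are 1,
// and a power scales the degree of its base by the exponent.
const degree = (e: Expr): number =>
  e.kind === 'real' ? 0
  : e.kind === 'variable' ? 1
  : degree(e.base) * e.exponent

type Order = (a: Expr, b: Expr) => number
const descending: Order = (a, b) => degree(b) - degree(a) // candidate order for add
const ascending: Order = (a, b) => degree(a) - degree(b)  // candidate order for multiply

// Sorts a copy, so this can run once at the end of the operation
// without disturbing the operand list that was passed in.
const sortOperands = (operands: readonly Expr[], order: Order): Expr[] =>
  [...operands].sort(order)

const x: Expr = { kind: 'variable', name: 'x' }
const y: Expr = { kind: 'variable', name: 'y' }
const two: Expr = { kind: 'real', value: 2 }
const xSquared: Expr = { kind: 'power', base: x, exponent: 2 }

const sum = sortOperands([two, x, xSquared], descending)    // x^2, x, 2
const product = sortOperands([xSquared, y, two], ascending) // 2, y, x^2
```

Passing the comparator in as a parameter is what would let each closure (add, multiply) pick its own order while sharing the same sorting step.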

Unless something untoward manifests during the completion of these tasks, multiary should be complete afterwards.

A problem exists with the current implementation of multiary: the 'walk/cleave' subprocess, by design, discards any multiary nodes it finds during the walk phase in favor of their children (i.e. the cleave phase). This effectively tosses out the log information stored within the discovered child multiary nodes, which means that the history of how those nodes came to be is lost. This wouldn't necessarily be an issue, per se, but for the curation process: most functions in the calcula will curate the logs of the children they handle so as to minimize redundantly stored information.

That latter process generates the issue: for a nested multiary to exist, it would have had to have been processed already. This processing would, by its very nature, result in the curation of the grandchildren, shifting their logs into the nested multiary's. Which implies, by the time the multiary that instigated the walk/cleave gets those grandchildren for its operand list, that those grandchildren effectively no longer have history.

That history could be restored: walk/cleave could attempt to retrieve the appropriate log information from the multiary nodes being cloven and rewrap the children with that retrieved log information.
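
A rough sketch of that rewrapping idea, under heavy assumptions: Writer, TreeNode, and cleave below are simplified, hypothetical stand-ins rather than the library's definitions. The sketch is deliberately naive. Because a flat log gives no way to tell which entries belong to which child, each child simply receives the entire log of the nested node it came from.

```typescript
// Simplified stand-ins for the library's types (assumptions, not the real ones).
type Operation = { particles: string[]; action: string }
type Writer<T> = { value: T; log: Operation[] }

type TreeNode =
  | { kind: 'leaf'; name: string }
  | { kind: 'multiary'; species: string; operands: Writer<TreeNode>[] }

// Replaces nested multiary nodes of the same species with their children,
// prepending the discarded node's log to each child so its history survives.
const cleave = (species: string, operands: Writer<TreeNode>[]): Writer<TreeNode>[] => {
  const out: Writer<TreeNode>[] = []
  for (const operand of operands) {
    const node = operand.value
    if (node.kind === 'multiary' && node.species === species) {
      for (const child of cleave(species, node.operands)) {
        out.push({ value: child.value, log: [...operand.log, ...child.log] })
      }
    } else {
      out.push(operand)
    }
  }
  return out
}

const leaf = (name: string): Writer<TreeNode> => ({ value: { kind: 'leaf', name }, log: [] })

// A nested add whose children were already curated (their logs are empty).
const nested: Writer<TreeNode> = {
  value: { kind: 'multiary', species: 'add', operands: [leaf('a'), leaf('b')] },
  log: [{ particles: ['a', '+', 'b'], action: 'added' }],
}

const flattened = cleave('add', [nested, leaf('c')]) // a, b, c; a and b carry the nested log
```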

New problem: a Writer object's log field is, effectively, an Operation array; for the purposes of how the writer monad is implemented, Operation from its perspective is generic. However, for most of the rest of the library, Operation is separately defined as the following interface:

export interface Operation {
  particles: Particle[]
  action: string
}

That is, the rest of the app, when working with a Writer<TreeNode, Operation>, will be dealing with objects that meet the contract defined above. This implies that logs are not only a linear list--from the perspective of Writer--but are limited to being nothing more than a linear list--from the perspective of everything else.

Operation, defined as such, would thus make log retrieval much more complicated: there wouldn't be an easy-to-discern way to tell where, within a node's log, its directly generated entries sit, as opposed to entries inserted from its children. (By way of consideration, look at the following excerpt from the processLogs function, which is widely used by different closure-generating HOFs for their logs; e is the currently processed child node, while next is a potential sibling node.)

const currentOp = next
  ? /* ... */
  : operation(
      toParticles(...haveProcessed, processed(current), ...toProcess), 
      `processed ${parameterName(i, n)}`
    )
return [...e.log, currentOp]

Operation, therefore, likely needs to change: by allowing it to store either its current interface or a list of itself, any node which needed log information stored for children nodes could store that information as a sub-array. This might look something like the following:

export type Operation = {
  particles: Particle[]
  action: string
} | Operation[]

This is not without issues: certain functions will immediately break and need updating to reflect these changes. Of immediate principal note, the context function--used to grab the particles at a given location within the log--would need to be updated to reflect this alteration. Which raises an interesting question: what does context mean for an entry that is a list? Would it be the concatenation of all nested Particle arrays? Just the last entry? A special combination of the first and last, to suggest the progress performed by the sub-process?
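
To make those three candidate meanings concrete, here is a minimal sketch. Particle is reduced to a string, and Step, isStep, contextAll, contextLast, contextFirst, and contextEnds are hypothetical names introduced only for this illustration; none of this is the existing context function.

```typescript
type Particle = string // stand-in for the real Particle type

type Step = { particles: Particle[]; action: string }
type Operation = Step | Operation[]

const isStep = (op: Operation): op is Step => !Array.isArray(op)

// Candidate 1: concatenate every nested Particle array, depth first.
const contextAll = (op: Operation): Particle[] =>
  isStep(op)
    ? op.particles
    : op.reduce<Particle[]>((acc, o) => acc.concat(contextAll(o)), [])

// Candidate 2: only the final entry of the sub-process.
const contextLast = (op: Operation): Particle[] =>
  isStep(op) ? op.particles
  : op.length === 0 ? []
  : contextLast(op[op.length - 1])

// Helper for candidate 3: the first entry of the sub-process.
const contextFirst = (op: Operation): Particle[] =>
  isStep(op) ? op.particles
  : op.length === 0 ? []
  : contextFirst(op[0])

// Candidate 3: first and last entries together, hinting at the progress made.
// Note that a single-entry sub-process would have its particles listed twice.
const contextEnds = (op: Operation): Particle[] =>
  isStep(op) ? op.particles : [...contextFirst(op), ...contextLast(op)]

const subProcess: Operation = [
  { particles: ['a'], action: 'first step' },
  [
    { particles: ['b'], action: 'nested step' },
    { particles: ['c'], action: 'final nested step' },
  ],
]

const all = contextAll(subProcess)   // ['a', 'b', 'c']
const last = contextLast(subProcess) // ['c']
const ends = contextEnds(subProcess) // ['a', 'c']
```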

Introducing this extra behavior is potentially non-trivial, but likely necessary. However, doing so carries with it a happy accident: NuLog's verbosity made for lengthy logs, which could be occasionally difficult to follow; it has been desired to have a nested sub-process addendum to NuLog, so as to treat those sub-processes as sub-steps that one could look at in further detail if desired, but could ignore otherwise. Adding in the change to Operation not only allows multiary to reconstitute children logs, but would allow the frontend to nest those child processes.

The otherwise block of multiary's extensions attempts to combine similar terms within the operands array by iterating through each operand and checking if any other operand matches it, according to a set of checks that are primarily provided by a closure context. multiary currently defines a singular consideration: one to combine multiple primitives which are present with non-primitives. Collectively, this process is referred to internally as combineTerms, which gets generated from the createCombineTerms HOF after the 'consideration' block of multiary is run.
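
As an illustration of the general shape of such a pass (not the actual createCombineTerms), the sketch below pairs each operand against the terms kept so far and asks a list of considerations whether the two can be merged. Term, Consideration, likeTerms, and createCombineTermsSketch are all hypothetical names and simplifications.

```typescript
// Hypothetical, simplified operand: a coefficient times a symbol, e.g. 2*x.
type Term = { coefficient: number; symbol: string }

// A consideration either merges two operands into one or declines (undefined).
type Consideration = (a: Term, b: Term) => Term | undefined

// Example consideration: like terms add their coefficients (x + 2*x => 3*x).
const likeTerms: Consideration = (a, b) =>
  a.symbol === b.symbol
    ? { coefficient: a.coefficient + b.coefficient, symbol: a.symbol }
    : undefined

// Builds a combiner from the considerations supplied by a closure context.
const createCombineTermsSketch = (...considerations: Consideration[]) =>
  (operands: readonly Term[]): Term[] => {
    const result: Term[] = [] // a new list; the input is never mutated
    for (const operand of operands) {
      let merged = false
      for (let i = 0; i < result.length && !merged; i++) {
        for (const consider of considerations) {
          const combined = consider(result[i], operand)
          if (combined !== undefined) {
            result[i] = combined
            merged = true
            break
          }
        }
      }
      if (!merged) result.push(operand)
    }
    return result
  }

const combineTerms = createCombineTermsSketch(likeTerms)

// x + 2*x + y
const combined = combineTerms([
  { coefficient: 1, symbol: 'x' },
  { coefficient: 2, symbol: 'x' },
  { coefficient: 1, symbol: 'y' },
]) // 3*x + y
```

When no consideration fires, the result has the same contents as the input, which corresponds to the pass-through case.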

Any of the interstitial results produced by combineTerms have newly developed history. While combineTerms might be a simple pass-through operation (i.e., no term combination has occurred), it might also change the list of operands it was given. Which is fine, save for one point: the when block in which combineTerms is invoked makes assumptions about the operands handled by its fn parameter, notably that the passed operand list isn't effectively mutated while fn runs. This allows it to (in theory) curate those operands before passing them to fn and include the uncurated set in the logs it manages. Yet, by its very behavior, combineTerms alters a copy of the operands list, which leads to a disconnect between the result of the otherwise block and the logging of that block.

A separate implementation of when, which has different expectations about what is returned to it, should be provided for the otherwise block. This was an early topical consideration when development of multiary began, but was deemed (at the time) to be an unnecessary complication. However, not doing this would result in when needing completely separate behaviors for the normal and otherwise edge cases. Which seems more complicated than having a different function to handle that edge case.

The new otherwise function should likely expect to receive an augmented Action<TreeNode> tuple: the standard Writer<TreeNode, Operation> result and action string, but also a list of the effective operands that were produced by combineTerms.
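
One possible shape for that augmented tuple, sketched under assumptions: Writer and Action below are simplified guesses at the existing types, while CombinedAction, the show parameter, and this otherwise signature are hypothetical. The point is only that the log entry gets built from the effective operands rather than from the list originally passed in.

```typescript
type Particle = string // stand-in for the real Particle type
type Operation = { particles: Particle[]; action: string }
type Writer<T, O> = { value: T; log: O[] }

// Assumed shape of the standard tuple: the result and its action string.
type Action<T> = [Writer<T, Operation>, string]

// Hypothetical augmented tuple: the same two slots plus the effective operands.
type CombinedAction<T> = [Action<T>[0], Action<T>[1], Writer<T, Operation>[]]

// Logs the action against the operands fn reports, not the ones it was given.
const otherwise = <T>(
  operands: Writer<T, Operation>[],
  fn: (operands: Writer<T, Operation>[]) => CombinedAction<T>,
  show: (value: T) => Particle
): Writer<T, Operation> => {
  const [result, action, effective] = fn(operands)
  const entry: Operation = { particles: effective.map(e => show(e.value)), action }
  return { value: result.value, log: [...result.log, entry] }
}

const lift = (n: number): Writer<number, Operation> => ({ value: n, log: [] })

// Toy stand-in for combineTerms: drops duplicate operands and sums the input.
const dedupeAndSum = (ops: Writer<number, Operation>[]): CombinedAction<number> => {
  const values = ops.map(o => o.value)
  const effective = ops.filter((o, i) => values.indexOf(o.value) === i)
  const total = values.reduce((sum, v) => sum + v, 0)
  return [lift(total), 'combined terms', effective]
}

const outcome = otherwise([lift(1), lift(2), lift(2)], dedupeAndSum, String)
// outcome.value is 5; the single log entry lists particles '1' and '2'
```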

Optionally, since otherwise shouldn't be overridden by a closure context, it could be defined as an HOF that takes combineTerms as a parameter, which would allow it to know exactly what the combined operands list is. This would allow, perhaps, the bulk of the functionality currently defined in replaceAndSort to be moved to the otherwise block directly. This seems like a cleaner, albeit perhaps inflexible, solution.

joshuabowers / graphca

Multiary Finalization #18