DISCUSSION: Sensemaker data reactivity & developer ergonomics

Hi all, I wanted to start a discussion about a few complex and interrelated details of the Sensemaker internals (:joy:). I'll try to separate topics into different comments and link to appropriate Applet code as I go through this, in the hopes that we can more easily backlink as we talk through what's emerged & potentially log followup issues. For context the Applet's Sensemaker bindings discussed in this issue are currently encapsulated within a single component.

Some of this discussion is about possible bugs, some is opinionated design issues, but all are coming from the perspective of centering "developer ergonomics" in my experience creating an Application that works to the ideal case. That case being;

The less code, the better
Declarative bindings between the Sensemaker and UI components promoting an error-free development experience
Efficient and direct bindings between Provider and Sensemaker data; minimisation of requests to Holochain API
Familiar and approachable (to web app UI developers) coding style needed to interact with our libraries

There is also a general quirk to note about the Applet I've written. Use of the deserializeId & encodeHashToBase64 helpers is mostly a quirk of my implementation where I need to translate between hREA UUIDs and Holochain entry GUIDs, though some encodeHashToBase64 is occasionally needed in translating between parts of the Sensemaker data.

All in all, considering this pass a big success in that it's made the integration needs of the Sensemaker very clear (:

Possible Sensemaker data reactivity bugs

The current (transpiled) code for getAssessmentForResource looks like this:

async getAssessmentForResource(getAssessmentsInput) {
    const resourceAssessments = await this.service.getAssessmentsForResource(getAssessmentsInput);
    __classPrivateFieldGet(this, _SensemakerStore_resourceAssessments, "f").update(resourceAssessmentsPrev => {
        resourceAssessmentsPrev[encodeHashToBase64(getAssessmentsInput.resource_eh)] = resourceAssessments;
        return resourceAssessmentsPrev;
    });
    return resourceAssessments;
}

One issue here is that resourceAssessmentsPrev is being mutated. This means that any integration with a data reactivity system (in this case, lit-svelte-stores) results in an inability to differentiate current and previous values in element update lifecycle hooks. With reactive bindings and update checks, resourceAssessments is not deemed to be updated for any executions to getAssessmentForResource:

async updated(changedProperties: PropertyValues<this>) {
    if (changedProperties.has("resourceAssessments") &&
        changedProperties.get("resourceAssessments") !== this.resourceAssessments) {
        // Should trigger on update to `resourceAssessments` via
        // `getAssessmentsForResource()`, but does not.
    }
}

Care should also be taken to allow for recursive loading of dimension data. If I want to display data from some context that's ordering by a dimension that's an output of one of its methods (*breath* what a mouthful); then chances are good I might want to display the results of that dimension per resource so that the viewer has some idea of what the ordering is based on. So if getAssessmentForResource results in a reactive update to resourceAssessments and that update triggers another getAssessmentForResource then the update logic better make sure that the operation is idempotent and that any repeat updates for the same resource don't get triggered as a change; or it could become easy enough (and very hard to debug) for developers to spike CPU and hang the UI indefinitely.

Sensemaker method naming & data flow issues

The other issue is that a getter is triggering side-effects. I can see how this might be seen as a stylistic thing and I understand (and support) the design pattern of resourceAssessments being a lazily-loaded, sparsely populated collection of Assessment arrays. But if this is also a loader then should we call it loadAssessmentForResource? In an ideal world the way I want to bind to it in a component is going to be via its corresponding reactive property and not directly via the method return value.

There were some other unexpected dependencies between appletConfig-related methods that I found it difficult to work with; in that I found myself having to manage multiple side-effects in order to get the SensemakerStore into a consistent and ready state. Some things became apparent:

checkIfAppletConfigExists or registerApplet are the only two methods that have the side-effect of initialising appletConfig completely. Therefore these must be called in the initialisation logic (whether test harness or We applet) in order for appletConfig to be available to child components. There was a codepath in provider-app-test-harness.ts not providing for this, causing errors in child components. That component has 3 init codepaths- NH creation, NH join and detection of an already-joined NH. It becomes error-prone to have to (be aware of and) call the correct set of Sensemaker methods under each condition.

Similarly in naming suggestion above I wonder whether checkIfAppletConfigExists would be better named loadAppletConfig, making it clearer that appletConfig data requires initialisation before accessing.

I also think the other methods that manipulate appletConfig (createDimension, createMethod etc) need to throw if it's not been initially loaded yet, otherwise you can end up with partial appletConfig data and undefined variable errors in the store.

`resourceAssessments` reflections

chances are good I might want to display the results of that dimension per resource

Doing this within a component's initialisation logic currently requires a lot of code. It's not just fetchAssessments, it's also knowing which dimensions need to be fetched for each resource and how to structure that data. This is a rough draft that I'm sure can be generalisable through ResourceType.dimension_ehs; and is something that I think the Sensemaker should facilitate. I estimate doing so could reduce my boilerplate by ~70 lines of code; and I was trying to flex my functional programming muscles on this one- doing it imperatively would be a lot more verbose.

These result slices may even be achievable with simple helper functions that can be run over resourceAssessments without requiring other store observables or state. If so, we should prefer that approach.

If not, I propose there be reactive svelte/store/Readable caches structured toward simplified retrieval on a per-resource, per-dimension basis. This would allow the referenced code to be replaced by a single StoreSubscriber that loads up resource dimension data.

I think there should also be an API call for loading Assessments for multiple resources at a time. Individual calls to getAssessmentForResource are always going to require boilerplate for dealing with lists. Can we have getAssessmentForResources with multiple input resource_eh instead?

`runMethod` and output dimensions

Use of runMethod is verbose. Calling runMethod should propagate any created Assessments in the method's output dimension to any store caches linked to those dimensions / Assessments. (This is currently just resourceAssessments but could be expanded pending discussion of the above). This would remove the need to manually re-retrieve the assessment data to synchronise with the result of the (locally executed) method.

`computeContext` and dependent dimensions, possible bugs

At first I thought computeContext had similar issues with reactivity, but now I'm not so sure. One possible expectation is that the dimensions a context references in thresholds & order_by etc are updated with fresh Assessment results and propagated to the UI. Changing views should at most require a context refresh, not a context refresh and several calls to getAssessmentForResource and potentially runMethod.

But, ok, let's take it from the perspective that a context only runs on the previously computed results of its dependent dimensions. One would expect that this code would be sufficient to ensure the freshest data for all economicEventHashes is displayed in a cultural context:

const runMethod = (method: Uint8Array, dim: Uint8Array) => (accum: Promise<any>, eventHash: Uint8Array) => accum.then(() => this.sensemakerStore.runMethod({
  resource_eh: eventHash,
  method_eh: method,
}).then(() => this.sensemakerStore.getAssessmentForResource({
  resource_eh: eventHash,
  dimension_eh: dim,
})))
await (economicEventHashes || []).reduce<Promise<any>>(runMethod(
  appletConfig.methods[`total_verify_method`],
  appletConfig.dimensions[`total_verify`],
), Promise.resolve())
await (economicEventHashes || []).reduce<Promise<any>>(runMethod(
  appletConfig.methods[`total_followup_method`],
  appletConfig.dimensions[`total_followup`],
), Promise.resolve())

await this.sensemakerStore.computeContext(c, contextResultInput)

I run the "total" methods over every resource, then I fetch all resulting Assessments (which shouldn't be necessary, but is a workaround for now to ensure that resourceAssessments is up to date), and then I finally compute the specified view context now that all its dependent dimension data is up to date.

But the results are less than satisfying, with inconsistent state and records not appearing in the UI where they should be. The behaviour feels particularly flaky and makes me wonder whether there could be more bugs in the store updates that are wiping previously cached data.

Sensemaker component developer ergonomics

It'd be nice if the SensemakerStore itself could implement svelte/store/Readable such that any update to its reactive sub-properties (resourceAssessments, appletConfig & contextResults) triggered an update on the store itself, emitting its changed values. I would then only need one StoreSubscriber rather than all of this.

It does seem like even most basic components are going to want to bind to more than one stream of the store's updates.

There are also a few somewhat verbose types in my Element classes still which should be wrapped up into sensemaker-lite-types and exported as per the others in that file.

Other general reflections

Things I like about the Timetracker Applet's Sensemaker integration so far:

The code writes more similarly to the code for dealing with a GraphQL query or a Svelte store; making integration more accessible by conforming to similar patterns.
The separation of createAssessment and runMethod, allowing applications to determine whether updates to dependent data are required or not in different social contexts.
The loading of per-resource Sensemaker data is now bound to updates of Provider data, rather than view changes. This means the view can switch whilst efficiently re-using data it already has rather than re-requesting it.

Things that need to be changed:

Currently, the data is driving the loading of the context & its output dimensions. Instead the active context needs to drive the loading of the data, which will be a bit of a dance (first all hashes would need to be loaded from the Provider, then the context recomputed for all of them, then the actual records loaded from the Provider based on presence in the generated context).

The challenge here will be initialising contexts, which has to be done upon first encountering every resource to see if they belong in the context or not (whether an existing resource or a newly created one). (Flagging possible DHT bloat issues with this, it's probably unadvisable to be recomputing assessments when input data has not changed.)

You will notice that I'm only calling runMethod once, immediately after the creation of a new Assessment. This means that totalling isn't computed for a resource until it's rated; ergo unrated resources are invisible to the Sensemaker and don't appear in any contexts.

I'm uncertain how to deal with this due to above uncertainty about context computation.

In addition to our current plan of action for development, we should also reflect on what was decided not to do. *(CC @emalinus)

Realtime `Dimensions` (or not)

Enabling Applet configuration flags for Dimensions was discussed that would enable flagging them as "realtime" or "auto_recompute"; but the consensus seemed to be that we should see affording the possibility of "realtime by default" configurations as an anti-pattern in a decentralised* group-sensemaking context.
*distinct from distributed; there is the sharing of power and trust inherent here as well.

The Sensemaker should be lazy-by-design to minimise unnecessary draw on CPU resources, to ensure consistent UI state between Applets and to minimise burden on peers in future multi-agent iterations.

"Stale contexts" and realtime UI features

"Stale contexts" (see linked issue) are the mechanism for making this behaviour operate in realtime (or not). runMethod has output Assessment(s) to add to some Dimension(s) anyway, so those can and should be always-realtime since they're all being executed by the local agent. Therefore we don't need to keep track of stale Dimensions.

computeContext is a more complicated story. Calling all linked runMethods with output Dimensions inferred upon by the Context so that subsequent output Assessments in their relevant Dimensions are up-to-date can be super expensive on large or largely-linked datasets (and is potentially super noisy in a multi-agent context). As a simple optimisation we can start by propagating a 'stale' flag for each Context affected by a locally-executed runMethod into the SensemakerStore observable data.

This allows the UI to choose what to do with its view Contexts. In the case of my Timetracker Applet it will probably respond seamlessly and trigger a computeContext whenever 'stale' changes; so that the "verified", "flagged" totals are always up to date for display in the list navigation panel. Other apps which don't need to display totals in an overview UI could choose to only respond to 'stale' data for the active context, and skip recomputing any others until they're activated later.

In general an approach of minimising the need for (re-)computation as much as possible seems like a noble goal to aim for.

"latest timestamp" information for `Assessments`, `Dimensions` and `Contexts`

As a separate-but-interrelated issue, providing and operating on timestamp data can solve many of these other problems which are mostly matters of cache invalidation & data replication. (...and there are two hard problems in computer science.)

If fresh Assessment data propagating to the UI also caused "last updated" to change for associated Dimensions and Resources, this could be compared against the "last updated" time of a Context in order to determine whether any recomputation of the Context is necessary after the execution of an associated Method. The UI could thus determine intelligently whether it even needs to try refreshing.

And further- presuming that with the current architecture the linked Dimensions from any specific Applet Context might be shared by other Applet Contexts; and foreign calls to computeContext may result in one being updated out from under you..: the outdated Context can be notified of the new Assessment data and have its 'stale' flag set for the request of an update whilst still remaining fixed to its old view of the Context via timestamp comparison.

A Context could be coded to omit any Assessments, Dimensions etc that are newer than its own "last updated" timestamp. The other benefit is that updating the Dimensions for the Context does not require any recomputation by runMethod and can be a simple matter of updating "last modified" on the Context so that newer already-computed Assessment data is loaded in.

In this way the UI gets to choose whether to recompute a Context if its inputs are modified in another Applet or to simply display it as stale and prompt the viewer to update. And updates made in this manner would require minimal CPU usage compared to the initial runMethod computation.

An update on architectural efficiency and our current approach WRT reactive data streams.

The goal is to expose low-overhead, reactive accessors to SensemakerService data with high specificity; such that any UI frameworks binding to data from the Sensemaker are able to do so with maximal implicit performance. The integrator should not have to consider performance matters. The thing should just be as fast as it can naturally (and legibly) be.

In the case of Shadow DOM implementations such as LitElement this means that (for example) a "thumbs up" widget can be written in such a way that it completely encapsulates its bindings to Assessments from the Sensemaker. This can be done without requiring any retrieval logic or further JavaScript I/O to Holochain and other APIs.

<thumbs-up resource=${post.id} dimension=${appletConfig.dimensions.thumbsups} /> is sufficient.

It will eventually also mean declarative controller-like CustomElements which can encapsulate and compose multiple Sensemaker coordination spaces dynamically (and natively, via async import) into a single user interface. It will also enable a lightweight event-based model for UI widgets which interact with Sensemaker data.

Super energized by this collaboration and looking forward to where it takes us :slightly_smiling_face:

Using Rxjs [marble syntax]() to describe the behaviours we are aiming for can be useful:

input    -----a-------b-------c-------------R1d---------R2d-->

store cache -[a]---[ab]-----[abc]----------[abcR1d]----[abcR1dR2d]-->
(late 'all' listener)       [abc]----------[abcR1d]----[abcR1dR2d]-->

(latest assessment of R1)  --null-----------R1d------------->

input is the stream of events being loaded in from the Sensemaker Holochain Cell, one at a time.
store cache shows the contents of the data structure accumulating a history of loaded Assessment data within SensemakerStore. Basically this should emit the complete set of all Assessments (in whatever format) each time new ones are loaded.
This depiction of the store cache is sub-optimal: ideally, we would only emit once upon initial load of the input set rather than for each item in it. @pospi suspects that LitElement reactive view listeners may optimise this away into a single initial redraw, so low priority but also worth noting that we should not rely on Lit for this optimisation long-term.
A late 'all' listener bound to the SensemakerStore data after an initial loading should only receive current and future state of all Assessment data and never be obliged to recompute it from the start of time.
A filtered stream bound to the latest assessment of R1 should not trigger any updates if an Assessment for unrelated Resource R2 is loaded into the SensemakerStore.

neighbour-hoods / nh-launcher