Investigate using signalling API to wire up cross-DNA data mirroring

pospi commented 4 years ago

At present, data that lives "between" networks (e.g. Satisfaction & Fulfillment) requires a "context record" to be kept on either side of the network boundary in order for participants from both networks to have equal access to information.

You can see this implemented in our system where the satisfaction zome in the planning DNA pings the satisfaction zome in the observation DNA in order to create a duplicate of the record.

We currently only allow the creation of satisfactions & fulfillments within the planning DNA space, though ideally both actions should be triggerable via the observation DNA as well. In the current implementation we can't do this, because you can't configure cyclic bridge dependencies. But it does seem like a common practice to have logic in one DNA responding to activity in another.

The signalling API may be the answer to this, provided signals emitted in one DNA are received by all connected DNAs running in the conductor; and no bridging is needed to enable this link.

If we can get this working we should remove the unnecessary bridge and provide for an ability to create fulfillments in the observation DNA as well. We should also decide on some standard message structures in order to wire up similar triggers between inter-network parts of the system.
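As a rough sketch of what a standard message structure and bridge-free fan-out might look like: the code below is plain Rust standing in for the conductor, not the real signalling API. `Conductor`, `MirrorSignal` and `Action` are hypothetical names, and it assumes that a signal emitted in one DNA reaches every other DNA in the conductor, which is exactly the open question here.

```rust
use std::collections::HashMap;

/// Hypothetical action vocabulary for cross-DNA triggers.
#[derive(Clone, Debug, PartialEq)]
pub enum Action {
    Created,
    Updated,
    Deleted,
}

/// Hypothetical standard message: who emitted it, what kind of record, which record.
#[derive(Clone, Debug, PartialEq)]
pub struct MirrorSignal {
    pub source_dna: String,
    pub record_type: String,
    pub action: Action,
    pub address: String,
}

/// Stand-in for a conductor that delivers each signal to every other DNA it hosts.
#[derive(Default)]
pub struct Conductor {
    // DNA name -> signals received so far
    inboxes: HashMap<String, Vec<MirrorSignal>>,
}

impl Conductor {
    /// Register a DNA so that it starts receiving signals.
    pub fn install(&mut self, dna: &str) {
        self.inboxes.entry(dna.to_string()).or_default();
    }

    /// Deliver to all installed DNAs except the emitter; no bridge config involved.
    pub fn emit(&mut self, signal: MirrorSignal) {
        for (dna, inbox) in self.inboxes.iter_mut() {
            if *dna != signal.source_dna {
                inbox.push(signal.clone());
            }
        }
    }

    /// Signals a given DNA has received (empty if the DNA is not installed).
    pub fn received(&self, dna: &str) -> &[MirrorSignal] {
        self.inboxes.get(dna).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```

With something like this, a fulfillment created in observation would show up in planning's inbox without either DNA declaring the other as a bridge dependency.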

pdaoust commented 4 years ago

@pospi is the dependency graph something you could scribble up a quick diagram of? Just so's I can get a grasp of what should be responding to what.

pospi commented 4 years ago

Consider two DNAs: one containing posts and another containing comments. In this publishing platform, posts may link to comments and comments may link to posts. But multiple separate overlapping comment DNAs may interact with the same posts DNA. On both sides of the links, the DNA users want to retain a record of where the links were made to, when they were made, what access they were provided under, etc. Call this payload "the XDI metadata".

Does that help to illustrate the general case at all without requiring an actual illustration? Not near a decent screen atm

pdaoust commented 4 years ago

ohhhh, okay, I think I see the landscape now. So allow me to do a bit of thinking-out-loud here... if you were modelling this in a relational DB or straight UML, you would say that comment → post (using a foreign key relationship) but post !→ comment. Nice tight DAG, no dependency loops. BUUUT with Holochain that hurts your graph traversal prospects because you can't just query the DHT for all comments whose post_id field is 123. So you need to link from posts back to the comments that are attached to them. I feel like this captures half of your concern; am I right?

I sense there's also an issue of propagating changes backwards along the dependency graph -- that is, if a post is deleted, that should ripple through to all the comments DNAs and trigger a deletion of all the comments too. Is that also true?

I have some thoughts about what can already be done about the first thing, but not the second thing. Just wanna check that I'm on the right track before I barrel into them :D

pospi commented 4 years ago

That sounds accurate :)

As to propagating changes backwards, I think the signalling approach would work well for that if we can convince core to implement it. Then if there are further actions to take up the chain it just means that you continue firing off signals for upstream listeners to be triggered from.
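A minimal sketch of that "keep firing signals upstream" idea, assuming each record knows which records listen to it. The `dependents` map and the `cascade_delete` function are hypothetical; in a real network each hop would be a signal handled by another DNA rather than a lookup in one map. The `seen` set is there so that a loop in the dependency graph cannot cause endless re-triggering.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns the order in which records get deleted when `deleted` is removed.
/// `dependents` maps a record address to the records that reference it
/// (hypothetical; these may live in other DNAs).
pub fn cascade_delete(
    dependents: &HashMap<String, Vec<String>>,
    deleted: &str,
) -> Vec<String> {
    let mut order = Vec::new();
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();

    seen.insert(deleted.to_string());
    queue.push_back(deleted.to_string());

    // Each popped address stands for one "deleted" signal being handled.
    while let Some(address) = queue.pop_front() {
        order.push(address.clone());
        if let Some(listeners) = dependents.get(&address) {
            for listener in listeners {
                // Only fire a further signal for records not yet handled.
                if seen.insert(listener.clone()) {
                    queue.push_back(listener.clone());
                }
            }
        }
    }
    order
}
```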

pdaoust commented 4 years ago

Ah, glad to hear I got it right :)

So, for the first case -- making it convenient to traverse links -- how does this seem as an approach? (Forgive me if you've already explained it this way and I've forgotten -- between chat, forum, and multiple GitHub issue boards, I've kinda lost track :) )

Because comment → post, the posts DNA shouldn't know anything about commenting. Initially I was thinking that the comments DNA should specify the posts DNA as a bridge dependency, but that's only important if the comments DNA needs to get some DNA-level integrity guarantees about posts -- maybe the create_comment() function wants to call a does_this_post_actually_exist() function in the posts DNA.

Anyhow, being that comments have a m:1 relationship to posts, a comment's relationship to a post is specified in its data structure, as a 'foreign key field':

{
    "author": "Qm......",
    "content": "blah blah",
    "post_id": "Qm......"
}

The post_id is the address of a post, as reported by the posts DNA.

But that's unqueryable, so now we need a post anchor entry type in the comments DNA... it could just be a string containing the post address. That's where we hang the 'post has comment' links from.
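To make the anchor idea concrete, here is a small in-memory sketch. It is plain Rust, not HDK code: `CommentsDna` and its methods are hypothetical stand-ins, and the `HashMap` plays the role of the post anchor entries with 'post has comment' links hanging off them.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Comment {
    pub author: String,
    pub content: String,
    pub post_id: String, // address of the post, as reported by the posts DNA
}

/// Hypothetical in-memory stand-in for the comments DNA.
#[derive(Default)]
pub struct CommentsDna {
    comments: Vec<Comment>,
    // post anchor (just the post address as a string) -> indexes of linked comments
    post_anchors: HashMap<String, Vec<usize>>,
}

impl CommentsDna {
    /// Store the comment and hang a 'post has comment' link from the post anchor.
    pub fn create_comment(&mut self, comment: Comment) -> usize {
        let index = self.comments.len();
        self.post_anchors
            .entry(comment.post_id.clone())
            .or_default()
            .push(index);
        self.comments.push(comment);
        index
    }

    /// Follow the links from the anchor instead of scanning every comment.
    pub fn comments_for_post(&self, post_id: &str) -> Vec<&Comment> {
        self.post_anchors
            .get(post_id)
            .map(|ids| ids.iter().map(|&i| &self.comments[i]).collect::<Vec<_>>())
            .unwrap_or_default()
    }
}
```

The posts DNA never appears here, which is the point: the anchor lives entirely on the comments side.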

@pospi could you fill me in on what the satisfaction zome does in planning and observation, and how they and their records are different? Is it that you want the data to be available in both places because not everybody in Observation is bridging from Planning, so it can't be assumed that they'll be able to get the information they need?

I'm also curious about what Observation does -- in my mind I always had this idea that it was merely a way of projecting raw REA data into meaningful forms, but in this diagram I see it actually holds data. I'm guessing that means it is the most basic layer, the REA part of ValueFlows, and will be there in every incarnation of a HoloREA network. Is that right?

Knowing that, does the observation DHT need to know about things that are building on top of it for any particular reason?

Okay, off my tangent. Just trying to get a better grasp of HoloREA and consider possibilities.

Back to the original subject... it would seem perfect to me to use some sort of signalling to propagate changes without introducing tight coupling. The core devs are all focused on networking and testing right now though... may be a while before we see any new features :cry:

pospi commented 4 years ago

> could you fill me in on what the satisfaction zome does in planning and observation, and how they and their records are different? Is it that you want the data to be available in both places because not everybody in Observation is bridging from Planning, so it can't be assumed that they'll be able to get the information they need?

Precisely.

The difference is between "anchor" types and "anchor + metadata" types. I have pushed some material to the codebase to try to explain the patterns in more detail; I'd like to do up diagrams at some point and inject them into the document as base64 images but I think this is enough for now.

In the context of that terminology, Satisfaction records are:

- indirect remote indexes between EconomicEvent (in observation) and Intent (in planning), and
- indirect indexes between Commitment and Intent (both in planning)

The Satisfaction zome in planning is the "master" controller (planning drives observation currently, not the other way around). You can see where this links through the _pingback calls in the API controller. In observation, Satisfaction requests are triggered in the past tense and work to synchronise the indexes that service the "find EconomicEvents satisfying this external Intent" query functionality.
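Roughly, the flow looks like the sketch below. This is not the actual zome code: `Planning`, `Observation`, `satisfaction_created` and `events_satisfying` are hypothetical names, and a direct method call stands in for the bridged pingback. It only illustrates "planning stores the record, then notifies observation in the past tense so it can update its index".

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Satisfaction {
    pub address: String,
    pub satisfies: String,    // Intent address (lives in planning)
    pub satisfied_by: String, // EconomicEvent address (lives in observation)
}

/// Hypothetical stand-in for the observation side: it only keeps an index.
#[derive(Default)]
pub struct Observation {
    // Intent address -> EconomicEvent addresses that satisfy it
    events_by_intent: HashMap<String, Vec<String>>,
}

impl Observation {
    /// Past-tense handler: planning has already stored the record.
    pub fn satisfaction_created(&mut self, s: &Satisfaction) {
        self.events_by_intent
            .entry(s.satisfies.clone())
            .or_default()
            .push(s.satisfied_by.clone());
    }

    /// "Find EconomicEvents satisfying this external Intent".
    pub fn events_satisfying(&self, intent: &str) -> Vec<String> {
        self.events_by_intent.get(intent).cloned().unwrap_or_default()
    }
}

/// Hypothetical stand-in for the planning side, the "master" controller.
#[derive(Default)]
pub struct Planning {
    satisfactions: Vec<Satisfaction>,
}

impl Planning {
    /// Store locally first, then ping the mirror so its index stays in sync.
    pub fn create_satisfaction(&mut self, s: Satisfaction, mirror: &mut Observation) {
        self.satisfactions.push(s.clone());
        mirror.satisfaction_created(&s);
    }

    pub fn stored(&self) -> usize {
        self.satisfactions.len()
    }
}
```

Note the one-way dependency: `Planning` knows about `Observation` but not the reverse, which is why creation can only be triggered from the planning side today.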

(Note that the Satisfaction API handlers are a bit outdated; if you look at the Process indexing API and handlers you can see how things look with the standardised indexing abstractions I have currently. PS these are great candidates for org.holochain.* standard zome traits & mixins.)

So in answer to:

> does the observation DHT need to know about things that are building on top of it for any particular reason?

Yes. The Process links showcase how Process needs to be aware of changes in Commitment and Intent. We may also at some point want to validate that remote Commitments & Intents actually exist, in which case (as you've outlined) you need a bidirectional RPC call in order to validate presence (annotated perhaps with an org.holochain.readonly zome trait to guard against circular update loops).

> will [Observation] be there in every incarnation of a HoloREA network?

I expect so. That's how it panned out in the first analysis I did, anyway. But yes, observation is the "reality" layer: the data held is EconomicEvent observations and the EconomicResource data derived from those, as well as Process records detailing the activity to be / being undertaken. All other zomes in that DNA are for the purposes of managing remote indexes (both direct and indirect), which as stated is largely about keeping shadow copies. #twohardproblemsincomputerscience

> may be a while before we see any new features

That's ok. We can make do with unidirectional replication for now and I think that can take us most of the way along until we need to start considering 1:many and many:many DNA relationships.

I'd really like to draw some people into reflections on Smalltalk in designing this, because I think the message passing architecture of the language is the same pattern I'm trying to describe here, and one that has been proven to be highly robust.

Note there are some more requirements being outlined in https://github.com/valueflows/vf-apps/issues/3#issue-495208374 which detail how cascading updates need to flow through some parts of the system when managing deletions and index synchronisation.

pospi commented 4 years ago

Found some working code that uses the signalling API at https://github.com/holochain/basic-chat/blob/ca9ad7da17b1d2f9d6d9f89edb5393edf0ee03ea/dna-src/zomes/chat/code/src/lib.rs

h-REA / hREA

Investigate using signalling API to wire up cross-DNA data mirroring #57