h-REA / hREA

A ValueFlows / REA economic network coordination system implemented on Holochain, with accompanying JavaScript GraphQL libraries
https://docs.hrea.io

Handle create_index! and update_index! calls properly, by not swallowing errors and giving sufficient error feedback #264

Open Connoropolous opened 2 years ago

Connoropolous commented 2 years ago

At the moment I have simply added debug calls so that we can at least see these calls fail, but this isn't sufficient. We need proper error handling.

https://github.com/h-REA/hREA/blob/3833a5c537321e5b06a6f6ad9b2f690c34991eee/zomes/rea_economic_event/lib/src/lib.rs#L190-L202
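The "don't swallow errors" idea can be sketched roughly as follows. This is a hypothetical stand-in, not the actual hREA code: `create_index` and `IndexError` are placeholder names, and the point is simply that failures get collected and surfaced rather than discarded behind a `debug!` call.

```rust
// Hypothetical sketch of propagating index errors instead of discarding them.
// `IndexError` and `create_index` are stand-ins for the real hREA internals.
#[derive(Debug, PartialEq)]
enum IndexError {
    RemoteCallFailed(String),
}

// Simulated index write: succeeds or fails depending on its input.
fn create_index(ok: bool) -> Result<(), IndexError> {
    if ok {
        Ok(())
    } else {
        Err(IndexError::RemoteCallFailed("zome call failed".into()))
    }
}

/// Run several index updates, collecting every failure for the caller
/// rather than silently swallowing them.
fn update_indexes(jobs: &[bool]) -> (usize, Vec<IndexError>) {
    let mut errors = Vec::new();
    let mut succeeded = 0;
    for &job in jobs {
        match create_index(job) {
            Ok(()) => succeeded += 1,
            Err(e) => errors.push(e), // keep going; report all failures at the end
        }
    }
    (succeeded, errors)
}

fn main() {
    let (ok, errs) = update_indexes(&[true, false, true]);
    println!("{} succeeded, {} failed", ok, errs.len());
}
```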

pospi commented 2 years ago

Debug output is a nice first pass. It would be worth thinking about #176 whilst doing this; and might also be worth holding off until the new "integrity zomes" and "coordination zomes" distinction has landed in core (since it might provide integrity guarantees that we can leverage).

I'd also be curious to explore use of post_commit to manage index updates. This would ensure that no index failures could affect the related entry being written, which seems sensible (it prevents malicious users creating malformed remote DNAs to cause a connected DNA to abort; though I also wonder how much damage such a bug could do, since it would only affect the person configuring it). Also noting, re: #176, the possibility that we'll want to handle validation of DNA-local and remote indexes differently.
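The decoupling being proposed can be illustrated with a toy model (hypothetical types throughout; `post_commit_index` stands in for work done in a Holochain `post_commit` callback): the record write commits first, and a later index failure is logged but cannot roll the write back.

```rust
// Toy model of the post_commit idea: the entry write commits first, and
// index maintenance runs afterwards, so an index failure can never abort
// the record write. All types here are illustrative stand-ins.
struct Store {
    records: Vec<String>,
    indexes: Vec<String>,
}

impl Store {
    // The critical path: writing the record itself.
    fn create_record(&mut self, data: &str) -> Result<(), String> {
        self.records.push(data.to_string());
        Ok(())
    }

    // Stand-in for a post_commit hook: errors are reported, never propagated,
    // so they cannot affect the already-committed record.
    fn post_commit_index(&mut self, data: &str, index_ok: bool) {
        if index_ok {
            self.indexes.push(data.to_string());
        } else {
            eprintln!("WARN: index update failed for {data}; record is still committed");
        }
    }
}

fn main() {
    let mut store = Store { records: vec![], indexes: vec![] };
    store.create_record("event-1").unwrap();
    store.post_commit_index("event-1", false); // index fails, record survives
    println!("{} records, {} indexes", store.records.len(), store.indexes.len());
}
```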

The ideal would be that updating local index zomes can be considered to always succeed; and any possibilities of failure caught at compile & bundle time (see https://github.com/holochain/holochain/issues/976).

I suspect the only general-purpose solution for index zomes in foreign Cells is to treat them as non-critical and warn of errors. Unlike simpler Holochain apps, with hREA there is a distinction between "trusted" Cells configured as dependencies in the same collaboration space; vs "untrusted" Cells in neighbouring collaboration spaces. If it is someday possible to distinguish between the two at runtime (possibly via a hard dependency on a shared "cell registration" DNA) we might treat errors in the former as critical, and the latter as non-critical.

Once we can emit signals in post_commit this might be an appropriate way to alert of any indexing errors after writing a record. I think "alerting" is possibly all we can do, given the complex nature of intersecting multi-party networks.

pospi commented 2 years ago

Made a pass at this for uniformity: all index updates and RPC calls are now logged, and treated as non-critical so as to avoid any cascading inter-cell request failures (which are kinda hard to debug). See 458daed9.

Connoropolous commented 2 years ago

That's good progress.

We could also change the response type of those functions to include a section where those errors are returned to the caller, so that decisions about what, if anything, to do with each error can be made in that calling context.

pospi commented 2 years ago

I think most already do this. Operations on single pieces of data return any error to the caller; operations which hit multiple "things" usually have a Vec<RecordAPIResult> held somewhere in the response data, so each individual record can be returned as a bound error.
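The "bound error" pattern described here can be sketched like this (with `RecordAPIResult` as a stand-in type alias and `read_all` a hypothetical batch operation): each element of the result vector carries either its record or its own failure, so one bad record doesn't sink the whole batch.

```rust
// Sketch of per-record bound errors in a batch response.
// `RecordAPIResult` here is a stand-in alias, not the real hREA type.
type RecordAPIResult<T> = Result<T, String>;

// Hypothetical batch read: each id resolves or fails independently.
fn read_all(ids: &[&str]) -> Vec<RecordAPIResult<String>> {
    ids.iter()
        .map(|id| {
            if id.starts_with("bad") {
                Err(format!("could not retrieve {id}"))
            } else {
                Ok(format!("record:{id}"))
            }
        })
        .collect()
}

fn main() {
    let results = read_all(&["a", "bad-b", "c"]);
    // Failures travel alongside successes; the caller inspects each.
    let failures = results.iter().filter(|r| r.is_err()).count();
    println!("{} results, {} failures", results.len(), failures);
}
```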

Connoropolous commented 2 years ago

I am hearing that many fns already do that, but to my knowledge none of the ones performing create_index and update_index calls do, and that's what I'm really pointing at: refactoring the response type to make room in the response for any prospective errors from those calls.
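One possible shape for such a refactored response, sketched with hypothetical names (`ResponseData`, `create_economic_event`, and the error strings are all illustrative, not the actual hREA API): the written record travels alongside a list of non-critical index errors, and the caller decides what to do with them.

```rust
// Hypothetical response shape: the written record plus any non-critical
// index errors, surfaced to the caller instead of being swallowed.
#[derive(Debug, PartialEq)]
struct ResponseData {
    record: String,            // stand-in for the created record
    index_errors: Vec<String>, // empty when every index write succeeded
}

// Illustrative create operation: the record write succeeds, and any
// failures from create_index/update_index calls are gathered up.
fn create_economic_event(
    data: &str,
    index_results: Vec<Result<(), String>>,
) -> ResponseData {
    let index_errors = index_results
        .into_iter()
        .filter_map(|r| r.err()) // keep only the failures
        .collect();
    ResponseData {
        record: data.to_string(),
        index_errors,
    }
}

fn main() {
    let resp = create_economic_event(
        "event-1",
        vec![Ok(()), Err("remote index cell unreachable".into())],
    );
    // The record was written; one index error is reported to the caller.
    println!("{}: {} index error(s)", resp.record, resp.index_errors.len());
}
```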

pospi commented 2 years ago

We should revisit this after #275 since it will have a significant impact on the appropriate pattern for managing cross-zome data integrity.

pospi commented 2 years ago

Should work through a resolution for #356 before working on this issue.

pospi commented 2 years ago

Some extra code quality / cleanup to note about this, when we get to it:

pospi commented 2 years ago

Some other things to address in this refactor. Following #356, there's a clearer picture:

1. The *.indexed indexes were conflicting because all index controller zomes use the same integrity zome. This was fixed in f5578341, but was very hard to debug; and for locally-adjacent zomes we now write totally unnecessary duplicate indexes a few milliseconds apart.

Better would be to deal with this via validation rules. A post_commit architecture will create a tight binding to the storage of the associated hdk::Record, including its Action, so we can use the timestamp of the signed Action as the timestamp in the creation index. Note this is specific to creation-time indexes; we'll still want the ability to add indexes on arbitrary other fields and (potentially remotely) associated data.

So: we should add a validation rule for creation indexes that the timestamp must match the associated Action timestamp, and another that there can be only one linked creation index per record. That should take care of any duplicates. But we will also need to allow the index creation logic to error, and handle those validation errors gracefully for locally-adjacent indexes, which might be triggered twice.

2. Indexing against data in remote cells will still need to pass a creation timestamp through from the remote cell. Once the HDK supports DNA dependencies we might be able to validate this with the remote equivalent of a must_get.

3. It might be worth refactoring the proc macro in hdk_semantic_indexes_zome_derive into multiple parts, so that we can define multiple indexes in the same zome. It would basically be a case of not outputting some of the global imports such that they can be declared by hand, and having multiple proc macros generate APIs for multiple record type indexes in the same zome. This would move us further away from local/remote index homogeneity, so should be treated with care to ensure that indexes can still be composed at the level of DNA configs. But since we already declare indexes as either Local/Remote, it might not change any constraints. Probably a separate issue all of its own.
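The validation rules proposed in point 1 could look roughly like this, sketched with hypothetical types (plain integers stand in for Action timestamps, and `validate_creation_index` is an illustrative name, not a real hREA or HDK function): a creation index is valid only if its timestamp equals the signed Action's timestamp and no creation index is already linked to the record.

```rust
// Sketch of the proposed creation-index validation rules. All names and
// types here are illustrative stand-ins for the real Holochain validation
// callback machinery.
fn validate_creation_index(
    action_timestamp: u64,
    index_timestamp: u64,
    existing_creation_indexes: usize,
) -> Result<(), String> {
    // Rule 1: the index timestamp must match the signed Action timestamp.
    if index_timestamp != action_timestamp {
        return Err(format!(
            "creation index timestamp {index_timestamp} does not match Action timestamp {action_timestamp}"
        ));
    }
    // Rule 2: only one creation index may be linked per record, so the
    // duplicate write from a locally-adjacent zome fails here harmlessly.
    if existing_creation_indexes >= 1 {
        return Err("record already has a creation index".into());
    }
    Ok(())
}

fn main() {
    assert!(validate_creation_index(1000, 1000, 0).is_ok());
    assert!(validate_creation_index(1000, 1001, 0).is_err()); // timestamp mismatch
    assert!(validate_creation_index(1000, 1000, 1).is_err()); // duplicate index
    println!("validation rules behave as expected");
}
```

Under this sketch, the index creation logic would treat the "already has a creation index" error as an expected, ignorable outcome for locally-adjacent zomes, as point 1 suggests.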