hyperledger / iroha

Iroha - A simple, enterprise-grade decentralized ledger
https://wiki.hyperledger.org/display/iroha
Apache License 2.0
438 stars 280 forks source link

[suggestion] Remove redundant non-iterable queries #3671

Closed mversic closed 1 month ago

mversic commented 1 year ago

Feature request

Our QueryBox contains some superfluous queries which could be composed by using a more general query and a filter. For example, there is both FindAllAccounts and FindAccountByName where you could have combined former and filter it by name.

In general we should remove all queries that return a single result and fallback to using more general queries with filter. Iroha query would correspond to SELECT statement in the relational database and filter would correspond to WHERE clause

Motivation

It's simpler and removes redundancy

Who can help?

No response

appetrosyan commented 1 year ago

To add some clarification as per our discussion with @DCNick3.

For now, we only want to reduce the surface area of the API and reduce the number of queries. In this example, we want to replace FindAccountByName with a FindAllAccounts query, combined with a StringPredicate::is(name_of_account) to get the same result.

Due to part of the context only being known at runtime, the latter operation is likely going to result in an $O(n)$ lookup, in contrast to the $O(1)$ for FindAccountByName. We can ooptionally counteract this problem by doing a short-circuiting heuristic.

So I'd do things in three steps.

Step 1: Enumerate replacement queries

This is just adding a new file to client/tests that runs through the Rust API queries and compares the results on a synthetic wsv. We need to ensure that the behaviours are identical, which can be false due to bugs in either the predicate or the sending/receiving logic.

Step 2: Identify the short-circuiting heuristics

It might be as simple as saying that any StringPredicate::is applied to an identifiable will result in either 0 (not found), or at most 1 element in a vec, because identifiers are unique per class of object and our queries return homogeneous data.

As long as we find a neat way to execute these heuristics on the backend generically. This should prevent a redundant clone after #3621 is merged, and potentially work better if we replace hash tables with prefix tries (no that's not a spelling mistake).

Step 3: Implement new logic with heuristics

Step 4: Deprecate old queries

At this point we might want to consider adding new predicates to express the missing information. In principle we should be able to bridge the gap between what Iroha can do now and what the GraphQl API should be post-launch (#1455).

As this is a complex task, I'd recommend that @DCNick3 take up another smaller task until this one is done.

mversic commented 1 year ago

Due to part of the context only being known at runtime, the latter operation is likely going to result in an lookup, in contrast to the for FindAccountByName. We can ooptionally counteract this problem by doing a short-circuiting heuristic.

both are O(n). I think you're conflating Name with AccountId. There is no downside to removing FindAccountByName and FindAssetsByName(FindAssetsByName)

DCNick3 commented 6 months ago

I am curious what would be potential implications for the permissioned use-case.

A single-purpose queries are simple to audit in the executor. See FindAssetsByDomainId for example: it's easy to forbid reading the list of domains to the outside party, but if we convert it to a FindAllAssets + domain filter, the executor then would have to analyze the filter to do the same.

Do we want to support this use-case?

Erigara commented 6 months ago

One possible solution might be to check not query itself, but query output per value to determine if seeing this value is allowed. However it might have big performance hit.

Concern with permissioned queries which bothers me is that currently there is no restrictions to subscribe to blocks stream, so anyone can recreate state and get everything it needs. Another problem is that queries (not inside smart-contracts) aren't consensus critical so a node can easily cheat and don't respect query permissions.

DCNick3 commented 6 months ago

Concern with permissioned queries which bothers me is that currently there is no restrictions to subscribe to blocks stream

Huh, this is quite a huge loophole... Should just abandon the goal of making the queries permissioned and admit that all the state is effectively public? Or do we still want to pursue it, even though it is cheatable with a rogue node?

Erigara commented 6 months ago

@mversic your opinion?

mversic commented 6 months ago

Huh, this is quite a huge loophole... Should just abandon the goal of making the queries permissioned and admit that all the state is effectively public? Or do we still want to pursue it, even though it is cheatable with a rogue node?

Let's disregard that there might be a rogue or cheating node. Let's assume that whoever is administrating the blockchain has only registered reliable nodes. This leaves us with 4 ways that we can leak data:

  1. block stream
  2. event stream
  3. http queries
  4. smart contract queries

IMO for 1. and 2. best we can provide is full access or no access. For 3. and 4. we can additionally provide granular access but that is best left for post 2.0 release if ever. Implementing all-or-nothing access to the ledger is something that we can do right now. Just define a new permission token for this and add executor endpoint that stream handlers can call to check whether to proceed with the request. Ofc in a public blockchain everyone (not only registered accounts) should be able to start streams or get query responses (atm, we require authority but I don't think we should)

Erigara commented 6 months ago

Ok, make sense to separate node level and client level violations.

Erigara commented 6 months ago

However we should still keep in mind that such issues are possible even in case of permissioned blockchain if multiple parties (with their own interest) run the network.

mversic commented 6 months ago

However we should still keep in mind that such issues are possible even in case of permissioned blockchain if multiple parties (with their own interest) run the network.

I don't think we can prevent information leaking in the case of malicious peers unless we're going to encode data on-chain

DCNick3 commented 5 months ago

For some of the queries (like FindAccountsByDomainId or FindAccountsWithAsset) we would need to introduce some additional filters, specific to the type of entity being filtered. In order to do that, it would be nice to have it type-checked that the specified predicate is valid for the type of query being filtered (probably by providing a trait that maps query output type to an allowed predicate type).

DCNick3 commented 1 month ago

A lot of the cases are done. Some other removals of queries are tracked in #4933.