kwilteam / kwil-db

Kwil DB, the database for web3
https://www.kwil.com/

Feature: networks that are secure by default #934

Closed brennanjl closed 1 month ago

brennanjl commented 1 month ago

A request we have received from both the idOS and Truflation teams is to add the ability to enforce read access control for view calls. We previously (last fall) did this with a mustsign modifier, but we removed this in favor of having all access control be enforced at the KGW level.

The issue with this is that it is impossible to have permissioned Kwil networks be "secure by default". Unless nodes are run behind a firewall / within a VPC, there is no way to enforce authentication for reading data from a node. This has left users in a really awkward position, where they have to run kgw in front of every node. This is made even more awkward since kgw is licensed, and thus they cannot make their nodes publicly runnable nor open source.

Therefore, we should add the ability to run kwild in a "private" mode (maybe with a --private or --secure flag). This would restrict the ability for ad-hoc SQL queries, as well as enforce authentication for view calls.

We would not add cookies / anything else that kgw supports. Users that want to have cookies and use kgw would still run kwild normally; we are purely doing this so that these users can onboard 3rd party node operators and go open source.

jchappelow commented 1 month ago

Therefore, we should add the ability to run kwild in a "private" mode (maybe with a --private or --secure flag). This would restrict the ability for ad-hoc SQL queries, as well as enforce authentication for view calls.

How do we want to achieve this last part? There is no transaction nonce to sign in a view call, so we had the replay issue to address. Unless RPC was secured via TLS or another key exchange mechanism that prevents a MITM from capturing and resending the request, we didn't have a simple solution. We discussed including height in the signed data to at least limit the impact of a potential replay (the request + signature would be invalid after, say, height+N). The other idea was a more sophisticated authentication scheme whereby the RPC server would provide a random challenge to sign.
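For illustration, the height-bounded idea could look roughly like the sketch below. The names `maxHeightDrift` and `callStillValid` are hypothetical, the window size is an arbitrary stand-in for N, and rejecting heights the node has not reached yet is an assumption rather than something decided in this thread.

```go
package main

import "fmt"

// maxHeightDrift is a hypothetical value for N: the number of blocks for
// which a signed view call remains acceptable.
const maxHeightDrift = 5

// callStillValid reports whether a call signed at signedHeight should be
// accepted by a node currently at currentHeight.
func callStillValid(signedHeight, currentHeight int64) bool {
	if signedHeight > currentHeight {
		// Assumption: reject calls signed against a height this node has not seen.
		return false
	}
	return currentHeight-signedHeight <= maxHeightDrift
}

func main() {
	fmt.Println(callStillValid(100, 103)) // inside the window
	fmt.Println(callStillValid(100, 110)) // outside the window
}
```

This only narrows the replay window; a captured request can still be resent until the window closes, which is why the challenge approach is discussed next.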

Is this easier to solve now?

we are purely doing this so that these users can onboard 3rd party node operators and go open source.

Will you expand on this?

jchappelow commented 1 month ago

I think a challenge-based authentication is the most legit approach, but it's more complexity for clients. Can be done.

Other question is what components/packages of the node source are aware of call authentication. My feeling is that the engine has no business worrying about this, and that this authentication should stop at the level of the RPC server. The schema would need to inform the RPC service though. What were you considering?

brennanjl commented 1 month ago

Other question is what components/packages of the node source are aware of call authentication. My feeling is that the engine has no business worrying about this, and that this authentication should stop at the level of the RPC server. What were you considering?

Agreed. I assumed that the user service would maintain some sort of challenge mechanism, and that the added logic here would be fully contained there.

brennanjl commented 1 month ago

we are purely doing this so that these users can onboard 3rd party node operators and go open source.

Will you expand on this?

I believe this got covered in our recent discussion, but just for clarity:

Users are struggling to onboard node operators and go open source because the core functionality of their networks necessitates private data. All node operators need to be capable of enforcing access control for reads, but currently this is only possible with kgw. To resolve this, we either need to make this possible in kwild or make kgw open source.

I am partial to adding it to kwild for two reasons:

  1. It is simpler for all parties involved. Just open-sourcing kgw leaves users in an awkward place with having a 1-1 relationship between kgw instances and kwild.
  2. An additional service that makes UX smooth is not a protocol-level thing, but it is very valuable to a lot of applications. I still think this is a business driver we want to hang on to / are not willing to give up yet.

brennanjl commented 1 month ago

Overview

A request we have received from both the idOS and Truflation teams is to add the ability to enforce read access control for view calls. We previously (last fall) did this with a mustsign modifier, but we removed this in favor of having all access control be enforced at the KGW level.

The issue with this is that it is impossible to have permissioned Kwil networks be "private by default". Unless nodes are run behind a firewall / within a VPC, there is no way to enforce authentication for reading data from a node. This has left users in a really awkward position, where they have to run kgw in front of every node. This is made even more awkward since kgw is licensed, and thus they cannot make their nodes publicly runnable nor open source.

Therefore, we should add the ability to run kwild in a "private" mode (with a --private flag). This would restrict the ability for ad-hoc SQL queries, as well as enforce authentication for view calls.

We would not add cookies / anything else that kgw supports. Users that want to have cookies and use kgw would still run kwild normally; we are purely doing this so that these users can onboard 3rd party node operators and go open source while still having the same network access rules apply.

Challenge-Based Authentication

To implement this, we would need a system of challenge-based authentication. We briefly had a thread about this last October while we were discussing the future of mustsign, but ended up not having a solution besides a high-level plan for bi-directional gRPC streaming to implement some sort of auth workflow.

I will loop back and edit this section with details on the challenge-based auth once we figure it out.

RPC Changes

The challenge-based authentication, and the conditional application of it (depending on whether the node has the flag set), should be fully encapsulated within the user RPC service. We shouldn't need to modify anything in txapp or the engine to support this.

If the node is using --private mode, then all calls will be authenticated, regardless of the contents of the schema. This allows us to not have to involve anything engine related. See the Kuneiform section below for an explanation of this.
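A minimal sketch of how the "all calls are authenticated in private mode" rule could sit entirely at the RPC layer, written as a plain `net/http` wrapper. Everything here is an assumption for illustration: `requireAuth`, `authFunc`, the `X-Demo-Auth` header and the `/call` path are made up, and the real user service may not be structured as HTTP middleware at all.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// authFunc is a hypothetical hook that would verify the signed challenge.
type authFunc func(r *http.Request) bool

// requireAuth wraps next so that, when private is true, every request must
// pass authenticate. It never inspects which action or procedure is called.
func requireAuth(private bool, authenticate authFunc, next http.Handler) http.Handler {
	if !private {
		return next // public mode: the handler is left untouched
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authenticate(r) {
			http.Error(w, "authentication required", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoStatus sends one request through the wrapper and returns the status code.
// The header check is a stand-in for real signature verification.
func demoStatus(private bool, header string) int {
	call := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	auth := func(r *http.Request) bool { return r.Header.Get("X-Demo-Auth") == "ok" }

	req := httptest.NewRequest(http.MethodPost, "/call", nil)
	if header != "" {
		req.Header.Set("X-Demo-Auth", header)
	}
	rec := httptest.NewRecorder()
	requireAuth(private, auth, call).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(demoStatus(true, ""))   // private mode, no credentials
	fmt.Println(demoStatus(true, "ok")) // private mode, credentials accepted
	fmt.Println(demoStatus(false, ""))  // public mode, no credentials needed
}
```

Because the wrapper only looks at the node's mode, nothing about the schema has to reach the RPC layer.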

kwild Configuration

kwild should have a new flag --private which allows users to enable "private mode". Private mode enforces authentication for read RPCs, as well as enforces peer filtering.

We should make modifications to peer filtering, such that the way to enable it is using this --private flag. These functionalities feel very related, and I can't see why a user would want one without the other. Thus, if a user runs with --private, then they would have an empty whitelist. This should replace the p2p.private_mode config. The p2p.whitelist_peers config should remain unchanged.

Kuneiform

No changes are needed to Kuneiform for this. The only reason I bring this up is because there was previously discussion as to whether we add mustsign back in, or if we just read the action/procedure tags (@kgw(authn='true')), or something else.

After further consideration, I don't think we need to change this. If a server is running in --private mode, then all calls should be authenticated. It doesn't need to have contextual knowledge on schema metadata. I feel this is appropriate for two reasons:

  1. Users seem to either want public or private, and not some mix of the two. Therefore, we can make their development process simpler by just not having them worry about authentication within Kuneiform.
  2. Truflation got a bit tripped up by the security ramifications of @kgw(authn='true') and how it plays into foreign schema calls. Since they simply saw @kgw(authn='true') as a way to enforce some sort of access control, they didn't immediately understand that an unauthenticated procedure that calls an authenticated procedure would succeed, allowing access control to be bypassed.

Therefore, I think it is just easier to have all public, or all private.

brennanjl commented 1 month ago

^the above will probably become a wiki of some sort, but since we are still figuring out the challenge based auth, I am putting it here for now

brennanjl commented 1 month ago

An initial idea for challenge based auth:

Each node would have an RPC endpoint, challenge, which returns a challenge to be signed by the client. When the endpoint is requested, the node randomly generates a unique string that should be included in the signed message. It caches that string locally, and will expire it from the cache after some amount of time (let's say 10s, but maybe this should be configurable).

The client then signs this message in a similar format to how transactions are signed, where this unique identifier is the equivalent of a transaction's nonce. The server, on receiving the signed message, would verify that the unique identifier is in its local cache. If it is found, then it deletes the identifier and lets the request through. If not, it rejects the message.

Thoughts @jchappelow @Yaiba @charithabandi ?

charithabandi commented 1 month ago

Looks good. One thing to keep in mind is concurrency: a client can issue concurrent requests, so should the local challenge cache be keyed on these unique request IDs rather than on the client identifier?

brennanjl commented 1 month ago

I don't think the cache would be based on anything client related. It would simply keep a set of all IDs it responded with over the last x seconds, and if someone tries to authenticate, it checks that it is valid. IMO there is no reason to tie the ID to any client when it is issued.

charithabandi commented 1 month ago

To avoid MITM attacks, this should still use TLS; the challenge-based auth would then prevent replay.

brennanjl commented 1 month ago

Yeah, I presume that everybody will use TLS for their RPCs. This is already the case.

Yaiba commented 1 month ago

To be clear, with 'private mode', KGW will only work with Kwild in 'public mode', right? I guess I'll need to make this clear in KGW docs later.

charithabandi commented 1 month ago

Would this be the same private mode config that we introduced for peer filtering? but with expanded scope.

brennanjl commented 1 month ago

To be clear, with 'private mode', KGW will only work with Kwild in 'public mode', right? I guess I'll need to make this clear in KGW docs later.

Yes, correct.

jchappelow commented 1 month ago

Would this be the same private mode config that we introduced for peer filtering? but with expanded scope.

It should somehow be linked if not the same thing, since running without peer filtering means kwild opens up the data via p2p.

brennanjl commented 1 month ago

Would this be the same private mode config that we introduced for peer filtering? but with expanded scope.

I think so. What I am essentially proposing is that these features get lumped into being the same thing, since I can't see when one would be used and not the other.

Yaiba commented 1 month ago

The proposed interaction workflow looks good to me.

charithabandi commented 1 month ago

To be clear, with 'private mode', KGW will only work with Kwild in 'public mode', right? I guess I'll need to make this clear in KGW docs later.

Does that mean the public/private mode is a genesis-level configuration rather than a node-level one? We definitely don't want a few nodes running in public mode and a few in private.

Yaiba commented 1 month ago

To be clear, with 'private mode', KGW will only work with Kwild in 'public mode', right? I guess I'll need to make this clear in KGW docs later.

Does that mean the public/private mode is a genesis-level configuration rather than a node-level one? We definitely don't want a few nodes running in public mode and a few in private.

IDK, maybe it doesn't need to be a genesis-level configuration, as long as KGW has a way to identify the kwild's mode? Then KGW won't start unless all nodes are in public mode.

brennanjl commented 1 month ago

Does that mean the public/private mode is a genesis-level configuration rather than a node-level one? We definitely don't want a few nodes running in public mode and a few in private.

Definitely shouldn't be a genesis level config. You bring up a very good point. Now that you mention it, I do see a scenario where somebody would run with private peers and public RPC. I would imagine a network such as idOS would run as follows:

The idOS team runs a validator, as well as several read-only nodes. They would use kgw to allow end users to have smooth UX when reading from their read-only nodes. Therefore, they would:

Does this make sense?

brennanjl commented 1 month ago

To follow up on the above, I guess there should be two flags: --chain.p2p.private-mode=true and --app.authenticate-rpcs=true, or something along those lines. Do you guys feel this is sensible?
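A small sketch of what two independent switches might look like, using the flag names floated above. It uses Go's standard `flag` package purely for illustration, which may differ from how kwild actually wires its configuration; `nodeModes` and `parseModes` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
)

// nodeModes holds the two proposed, independent settings.
type nodeModes struct {
	PrivatePeers     bool // peer filtering on the p2p layer
	AuthenticateRPCs bool // challenge-based auth on read RPCs
}

// parseModes reads the two proposed flags from args.
func parseModes(args []string) (nodeModes, error) {
	var m nodeModes
	fs := flag.NewFlagSet("kwild", flag.ContinueOnError)
	fs.BoolVar(&m.PrivatePeers, "chain.p2p.private-mode", false, "only connect to whitelisted peers")
	fs.BoolVar(&m.AuthenticateRPCs, "app.authenticate-rpcs", false, "require authentication for read RPCs")
	err := fs.Parse(args)
	return m, err
}

func main() {
	// The scenario described above: filtered peers, but open RPC behind kgw.
	m, err := parseModes([]string{"--chain.p2p.private-mode=true"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("private peers: %v, authenticated RPCs: %v\n", m.PrivatePeers, m.AuthenticateRPCs)
}
```

Keeping the two booleans separate lets a read-only node sit behind kgw with open RPC while still filtering peers.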

Yaiba commented 1 month ago

To follow up on the above, I guess there should be two flags: --chain.p2p.private-mode=true and --app.authenticate-rpcs=true, or something along those lines. Do you guys feel this is sensible?

The scenario you described makes sense. Regardless of the naming, I think it's better to use separated flags.

jchappelow commented 1 month ago

Yeah, I think I agree now. There is a sensible use case for open RPC but filtered peers.