stacks-network / gaia

A decentralized high-performance storage system
MIT License

GAIA Access-Control #431

Open bjorger opened 1 year ago

bjorger commented 1 year ago

Currently the Gaia Access Control for files is defined as such:

"Access control in a gaia storage hub is performed on a per-address basis. Writes to URLs /store/<address>/ are only allowed if the writer can demonstrate that they control that address. This is achieved via an authentication token, which is a message signed by the private-key associated with that address. The message itself is a challenge-text, returned via the /hub_info/ endpoint."
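The per-address rule quoted above can be sketched roughly as follows. This is a minimal illustration, not Gaia's actual code: the `AuthToken` shape, the `/store/<address>/<file>` path layout, and the names `parseStoreUrl` and `mayWrite` are assumptions made for the example, and signature verification is passed in as a callback so the sketch does not depend on any particular crypto library.

```typescript
// Hypothetical token shape for illustration; real Gaia tokens are signed JWTs.
interface AuthToken {
  address: string;       // address the signer claims to control
  challengeText: string; // challenge text the client fetched from /hub_info/
  signature: string;     // signature over the challenge (opaque here)
}

// Signature verification is injected so the sketch stays library-agnostic.
type SignatureCheck = (token: AuthToken) => boolean;

// Splits "/store/<address>/<file>" into its parts; returns null if malformed.
function parseStoreUrl(path: string): { address: string; file: string } | null {
  const match = /^\/store\/([^/]+)\/(.+)$/.exec(path);
  return match ? { address: match[1], file: match[2] } : null;
}

// A write is allowed only if the token is for the same address as the URL,
// echoes the hub's current challenge, and carries a valid signature.
function mayWrite(
  path: string,
  token: AuthToken,
  hubChallenge: string,
  signatureIsValid: SignatureCheck
): boolean {
  const target = parseStoreUrl(path);
  if (target === null) return false;
  if (token.address !== target.address) return false;
  if (token.challengeText !== hubChallenge) return false;
  return signatureIsValid(token);
}
```

With this shape, a token for address `1Alice` can write to `/store/1Alice/profile.json` but not to `/store/1Bob/profile.json`, which is the restriction this issue proposes to relax.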

After researching and interviewing various people, we concluded that it would benefit Gaia and everyone involved to extend the access control system so that a user can grant other users access to their files.

Additionally it should also be possible to write data on behalf of a user.

Note: it should NOT be possible to read data, only to write data

Candidate databases: Gun (https://github.com/amark/gun, https://gun.eco/) and Ceramic (https://ceramic.network/)

Alternatively, an LDAP-type service, as discussed in Slack

The process should look as follows:

  • Create / run a database (one of those mentioned above)
  • Look up (in code) where Gaia implements the access control
  • Find a way to “overwrite” / add the database permission layer

bjorger commented 1 year ago

The following comments are copied from Asana (written by @wwwhickup):

Database schema: https://dbdiagram.io/d/6328c4e80911f91ba5e82fdd

Process flow diagram: https://cloud.smartdraw.com/share.aspx/?pubDocShare=84D5551C63D6E3CEDE3E09EBA2E8A35B742

I am thinking about the following points:

  1. What message has to be signed?
  2. The user control/permission layer should be under the Admin or Hub server.
  3. Where will we keep the database? In the same driver or in a specific Gaia database provider?

Please check the database schema and process flow diagram, and share your thoughts.

@wileyj could you look into the db schema and also the part regarding what message should be signed?

@wwwhickup feel free to post any open questions here rather than in Asana

jcnelson commented 1 year ago

"Additionally it should also be possible to write data on behalf of a user."

To be clear, this is possible only if the user gives the writer permission, right? Otherwise, the writer could do things like DoS the Gaia hub by writing lots of garbage data, or corrupt user files by overwriting them with garbage.

"Note: it should NOT be possible to read data, only to write data"

How do you intend to enforce this when Gaia is not on the read path? If you store your data into Amazon S3 or a directory on disk that gets served by a vanilla HTTP server (both modes are supported), then the only way to prevent someone from accessing your data is to encrypt it before you upload it (which is what apps do today).

The gaia-reader service is only meant to provide a unified read interface for back-end storage services that don't permit this (e.g. Google Drive or MS OneDrive).

jcnelson commented 1 year ago

"1. What message has to be signed?"

The bearer token needs to be signed by the writer, and needs to indicate the user address it's writing on behalf of. This is indicated today by the association token. On receiving the write, the node would need to verify that the user has authorized the writer.
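A rough sketch of that check on the hub side, under stated assumptions: the `DelegatedToken` fields, the `AuthorizationLookup` callback, and `mayWriteFor` are hypothetical names invented for this example, and the result of verifying the writer's signature is passed in as a boolean rather than computed here.

```typescript
// Hypothetical view of an already-decoded bearer token.
interface DelegatedToken {
  writerAddress: string; // address whose key signed the bearer token
  onBehalfOf: string;    // user address the write targets (association)
}

// Returns the set of writers a user has authorized. How this state is stored
// (driver blobs, a database, ...) is left open in this sketch.
type AuthorizationLookup = (userAddress: string) => ReadonlySet<string>;

function mayWriteFor(
  bucketAddress: string,
  token: DelegatedToken,
  writerSignatureValid: boolean,
  authorizedWriters: AuthorizationLookup
): boolean {
  // The token must really be signed by the writer it names.
  if (!writerSignatureValid) return false;
  // The token must name the bucket being written to.
  if (token.onBehalfOf !== bucketAddress) return false;
  // Owners can always write to their own bucket.
  if (token.writerAddress === bucketAddress) return true;
  // Otherwise the owner must have explicitly authorized this writer.
  return authorizedWriters(bucketAddress).has(token.writerAddress);
}
```

The last step is what prevents the problems raised earlier in the thread: a writer without the user's permission is rejected before any data is stored.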

"Where will we keep the database? In the same driver or in a specific Gaia database provider?"

If possible, I think it would be best to keep this database on the back-end storage provider, as we do with revoked auth timestamps. I'm not sure how feasible this will be, but if (1) the data schema is basically just a key/value store [1], and (2) the user is only authorizing a small number of writers (e.g. fewer than 100), then it might make sense to simply store the authorization state as a set of small JSON blobs in the driver.

[1] Looking at the database schema, it looks like the only relations you really need are:

If that's it, then a set of JSON blobs might be good enough, and you don't need to worry about running a separate DB process or storing authoritative state locally on the hub itself. If we can keep it so we don't need either of these things -- i.e. keep the Gaia hub stateless -- then it remains possible for users to deploy hubs in serverless environments (which are cheaper and easier to manage).
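To make the JSON-blob idea concrete, here is a minimal sketch. It assumes a single relation (user address to list of authorized writer addresses), which may be simpler than the actual schema. The `BlobDriver` interface, the in-memory `MemoryDriver`, and the `<address>-auth/writers.json` path are all hypothetical and are not Gaia's real driver API.

```typescript
// Hypothetical minimal driver interface; Gaia's real driver interface differs.
interface BlobDriver {
  read(path: string): Promise<string | null>;
  write(path: string, contents: string): Promise<void>;
}

// In-memory stand-in for a back-end such as S3 or local disk.
class MemoryDriver implements BlobDriver {
  private readonly blobs = new Map<string, string>();
  async read(path: string): Promise<string | null> {
    return this.blobs.get(path) ?? null;
  }
  async write(path: string, contents: string): Promise<void> {
    this.blobs.set(path, contents);
  }
}

// Assumed location of the authorization blob for a user (illustrative only).
const authPath = (user: string): string => `${user}-auth/writers.json`;

async function loadWriters(driver: BlobDriver, user: string): Promise<string[]> {
  const raw = await driver.read(authPath(user));
  return raw === null ? [] : (JSON.parse(raw) as string[]);
}

async function grantWrite(driver: BlobDriver, user: string, writer: string): Promise<void> {
  const writers = new Set(await loadWriters(driver, user));
  writers.add(writer);
  await driver.write(authPath(user), JSON.stringify(Array.from(writers)));
}

async function revokeWrite(driver: BlobDriver, user: string, writer: string): Promise<void> {
  const remaining = (await loadWriters(driver, user)).filter((w) => w !== writer);
  await driver.write(authPath(user), JSON.stringify(remaining));
}

async function isAuthorized(driver: BlobDriver, user: string, writer: string): Promise<boolean> {
  return (await loadWriters(driver, user)).includes(writer);
}
```

Because every call reads the blob from the driver, the hub holds no authoritative state between requests. The read-modify-write in `grantWrite` is not atomic, so concurrent updates could lose a grant; with a small number of writers per user this may be acceptable, but it is a trade-off of this sketch.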

Thoughts?

bjorger commented 1 year ago

@jcnelson regarding "writing data on behalf of another user": yes, the user has to give permission; otherwise it should not work.

As for the part about not reading the data: we are disregarding this point for now. It is a leftover from the initial planning, and I will edit the task accordingly.

The other suggestions seem well thought out, and we will consider them.

Thanks for helping us!

friedger commented 1 year ago

@bjorger There is the UCAN authentication method (https://ucan.xyz/), which could help implement permission delegation. It needs no database, only cryptographic verification.

There is already a library for that: https://github.com/web3-storage/ucanto
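To illustrate the general idea of database-free delegation, here is a simplified sketch using Node's built-in `crypto` module with Ed25519 keys. It is not the UCAN token format and does not use the ucanto API: the `Delegation` record, the capability string, and the function names are invented for the example.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical delegation record; NOT the UCAN wire format.
interface Delegation {
  issuer: string;     // PEM public key of the file owner
  audience: string;   // PEM public key of the delegated writer
  capability: string; // e.g. "write:notes/"
  expiresAt: number;  // Unix time in seconds
}

interface SignedDelegation {
  payload: Delegation;
  signature: string;  // base64 signature by the issuer over the payload
}

// Deterministic byte encoding of the payload (fixed field order).
const encode = (d: Delegation): Buffer =>
  Buffer.from(JSON.stringify([d.issuer, d.audience, d.capability, d.expiresAt]));

const toPem = (key: KeyObject): string =>
  key.export({ type: "spki", format: "pem" }).toString();

// The owner signs the delegation with their private key.
function delegate(ownerPrivateKey: KeyObject, payload: Delegation): SignedDelegation {
  const signature = sign(null, encode(payload), ownerPrivateKey).toString("base64");
  return { payload, signature };
}

// The hub checks the delegation using only the data presented with the request.
function verifyDelegation(
  d: SignedDelegation,
  owner: string,
  writer: string,
  capability: string,
  now: number
): boolean {
  const p = d.payload;
  if (p.issuer !== owner || p.audience !== writer) return false;
  if (p.capability !== capability || p.expiresAt <= now) return false;
  return verify(null, encode(p), createPublicKey(p.issuer), Buffer.from(d.signature, "base64"));
}

// Example: the owner delegates write access on "notes/" to a writer.
const ownerKeys = generateKeyPairSync("ed25519");
const writerKeys = generateKeyPairSync("ed25519");
const grant = delegate(ownerKeys.privateKey, {
  issuer: toPem(ownerKeys.publicKey),
  audience: toPem(writerKeys.publicKey),
  capability: "write:notes/",
  expiresAt: 2_000_000_000,
});
```

The writer would present `grant` alongside its own signed bearer token, and the hub would verify both without looking anything up. Revocation before expiry is not covered by this sketch and would still need some shared state, such as the revoked auth timestamps mentioned above.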