opencdms-dev / pyopencdms-old

⭐🐍 pyopencdms aims to build a common Python API on top of multiple Climate Data Management Systems (CDMS) that use different underlying database engines
MIT License

RBAC for SQLAlchemy models/ORM #81

Closed isedwards closed 1 year ago

isedwards commented 1 year ago

Trusted third-party Python plugins can import the pyopencdms package and use the SQLAlchemy models to interact with the database.

We would like to provide an abstraction layer on top of the SQLAlchemy models that enforces role-based access control (RBAC).

Ideally, the abstraction layer would expose all of the power of SQLAlchemy whilst controlling access. However, it would also be acceptable (and possibly desirable) to provide a simplified interface that does not allow all of the flexibility of using the SQLAlchemy ORM directly.

Tasks

isedwards commented 1 year ago

I've been looking at oso.

The quick start guide gives an example of implementing a non-database driven web application (using Flask)...

However, they also have sqlalchemy-oso (docs), which is available under the Apache License 2.0 and looks like it can be used independently from the rest of the oso stack.

Also see this blog post for an sqlalchemy-oso intro: https://www.osohq.com/post/sqlalchemy-role-rbac-basics (April 30, 2021)

isedwards commented 1 year ago

On our current roadmap (ZenHub), we're working on the Data Ingest epic before starting the Data Entry and the Data access and retrieval epics.

All three epics require us to implement requirement 3.2.1.1 Controlled access to data and systems.

So I've added 3.2.1.1 to the Data Ingest epic, since we're tackling that one first, and noted that it is blocked by this issue to implement 'RBAC for SQLAlchemy models/ORM'.

Here are some of the requirements from 3.2.1.1 that may be implemented in this issue:

This issue requires decisions on the documentation of business rules (https://github.com/opencdms-project/project-technical-team/issues/267).

Documentation should include:

An additional outcome is that an issue will be raised for a new OpenCDMS component for managing roles and access.

isedwards commented 1 year ago

@chinedu117 and I were discussing the reason for implementing access control at the ORM level. The following justification should be included in the developer docs:

For a simple CRUD web API it may be sufficient to control access to the end-points of the API depending on the user's role.

However, the OpenCDMS API can be dynamically extended by users adding their own custom Python processes. These back-end Python processes may choose to access the data layer directly through our pyopencdms Python library rather than slowly serialising all requests for data through the Web API. In this case, the access that a process is permitted to have to the data layer should depend on the role of the user who originally executed the process via the web API.

Multiple roles may be permitted to execute the process, but with different outcomes depending on their level of access. For example, writing the results of a process to the climate database's observation table may be limited to certain roles.

By implementing RBAC on the SQLAlchemy models, we can ensure that processes (including those developed by third parties) have only the access to the data layer that the executing user's role permits, provided the process developer uses our interfaces.

This will also dramatically decrease the complexity of dealing with role-based access control when developing processes, since the developer need not worry about which roles exist in the system. Instead, they can follow a Pythonic EAFP ("easier to ask forgiveness than permission") approach: attempt to proceed as desired and catch any exceptions that arise due to access control.
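To illustrate the EAFP idea, here is a minimal sketch in plain Python. It is not the pyopencdms API and does not use SQLAlchemy or oso: `AccessDenied`, `POLICY`, `GuardedStore`, `run_process` and the role names `viewer` and `operator` are all hypothetical, chosen only to show how one process can give different outcomes for different roles without knowing which roles exist.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Hypothetical exception raised when a role lacks a permission."""


# Hypothetical policy mapping role -> allowed (table, action) pairs.
# Real role names and rules are not defined in this issue.
POLICY = {
    "viewer": {("observation", "read")},
    "operator": {("observation", "read"), ("observation", "write")},
}


@dataclass
class GuardedStore:
    """Stand-in for an RBAC layer wrapping the data models."""

    role: str
    rows: dict = field(default_factory=lambda: {"observation": []})

    def _check(self, table, action):
        # Deny by default: unknown roles have no permissions.
        if (table, action) not in POLICY.get(self.role, set()):
            raise AccessDenied(f"role {self.role!r} may not {action} {table!r}")

    def read(self, table):
        self._check(table, "read")
        return list(self.rows[table])

    def write(self, table, row):
        self._check(table, "write")
        self.rows[table].append(row)


def run_process(store):
    """A process written without any knowledge of which roles exist."""
    result = {"mean_temp": 21.5}  # illustrative value only
    try:
        # EAFP: attempt the write and let the access layer object.
        store.write("observation", result)
        return "stored"
    except AccessDenied:
        # The caller's role cannot write, so only the computed result is returned.
        return "computed only"
```

With this sketch, `run_process(GuardedStore(role="operator"))` stores the result, while the same call with `role="viewer"` falls back to returning without writing. The process body contains no role checks of its own; all policy lives in the guarded layer.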