microsoft / AzureTRE

An accelerator to help organizations build Trusted Research Environments on Azure.
https://microsoft.github.io/AzureTRE
MIT License

Request Mechanism (initial purpose: Data Access Request) #3609

Open damoodamoo opened 1 year ago

damoodamoo commented 1 year ago

Context

It's common for a Researcher to need to request access to a specific data set on which to perform their research. The ways this happens across projects, organisations and industries are myriad, so a core TRE Data Access Request mechanism must provide:

User Stories

As a Researcher, I want to use the TRE Portal to create a new Data Access Request for the workspace I am working in.
As a Researcher, I want to view my request, see any updates to it, and understand where in the process it currently is.
As a Researcher, I want to see all existing and past Data Access Requests that pertain to the workspace I am in.

As a Data Controller / Data Manager, I want to view a list of Data Access Requests for the workspace I am looking at.
As a Data Controller / Data Manager, I want to use the TRE Portal to approve / reject the request, and supply comments.
As a Data Controller / Data Manager, I want to be able to send the request back to the initiator for re-work, supplying comments.

As a TRE Implementer, I want to be able to create a number of Data Access Request forms per workspace type.
As a TRE Implementer, I want to be able to trigger my own workflows via HTTP/webhook after the initial request has been submitted, so that I can use a workflow tool of my choice to build a complex business-related approval workflow.
As a TRE Implementer, I want my custom workflow to be able to log progress messages back to the initial request, so the initiator can see what is happening.
As a TRE Implementer, I want my custom workflow to be able to update the status of the initial request.
As a TRE Implementer, I want to trigger a custom background process via HTTP/webhook to provision the actual dataset in some way, after the request has been approved.

Differences to Airlock

The airlock process is a more rigid and custom process. It contains lots of logic around reviews: creating review VMs, wiring up a Review Workspace, providing mechanisms to actually move files, generating SAS links etc. Whilst the new DAR mechanism will be informed by Airlock, and will be integrated in the UX, it will not reuse the airlock API endpoints. Efforts will be made to reduce code duplication, such as refactoring the concept of a status so that it is no longer airlock specific and is shared by all request types.

Could this be used for any type of approval process, such as requesting a VM?

Potentially. However, the primary use case is Data Access Requests, and that will be the focus in the UI. There will be a single requests service, which handles the database interactions and CRUD operations, and then it will be up to the endpoints to build upon that service to enforce any custom model structures and decision making applicable to that request type.

Managing Form Templates

Each organisation will need to be able to build and manage a number of custom form templates. These will be heavily inspired by the resource templates. In the repo, a new directory named forms will be created under ./templates. This will house any number of JSON schema documents.

Each form template will contain the following fields:

To support the new forms concept, we'll need the underlying plumbing:

Requests API Design

request model

A model to contain a fixed and flexible structure to store data for all requests
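To make the "fixed and flexible" split concrete, here is a minimal sketch of what such a model might look like. Only request_id, workspace_id, requestor.user_id, request_data, the in_review status and the data_access_request type come from this issue; every other field name and default (request_type, created_at, messages, the "draft" status) is an assumption for illustration, not the agreed design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
import uuid


@dataclass
class Requestor:
    user_id: str
    name: str = ""  # hypothetical convenience field


@dataclass
class Request:
    # Fixed part: fields every request type would share (names are assumptions).
    workspace_id: str
    request_type: str  # e.g. "data_access_request"
    requestor: Requestor
    status: str = "draft"  # hypothetical initial status; the issue only names in_review
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    messages: list[str] = field(default_factory=list)
    # Flexible part: free-form payload whose shape is defined by the form template.
    request_data: dict[str, Any] = field(default_factory=dict)


example = Request(
    workspace_id="ws-1",
    request_type="data_access_request",
    requestor=Requestor(user_id="user-1"),
    request_data={"dataset": "example-dataset", "justification": "cohort study"},
)
```

The fixed fields are what a shared requests service could rely on for every request type, while request_data stays opaque to it.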

requests service

A single service to handle the CRUD operations for request models. This service will handle shared logic around creating, updating, and listing requests. The service will be intentionally 'dumb'. It will be up to the calling code to enforce any permission restrictions, data structure checks (for any custom request_data) etc.
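As a rough sketch of that layering, the calling code (for example an endpoint for one request type) could do the permission and structure checks and then delegate to the service. The role name "Researcher", the required "dataset" key and the dict-based request shape are all hypothetical, chosen only to illustrate where the checks would live.

```python
from typing import Any


class RequestsService:
    """Deliberately 'dumb' in-memory store: no permission or structure checks."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}

    def create_request(self, request: dict[str, Any]) -> dict[str, Any]:
        self._items[request["request_id"]] = request
        return request


def create_data_access_request(
    service: RequestsService, user_roles: set[str], request: dict[str, Any]
) -> dict[str, Any]:
    # Permission check lives in the caller, not in the service.
    if "Researcher" not in user_roles:  # hypothetical role name
        raise PermissionError("only researchers may raise a data access request")
    # Structure check for the custom request_data (required key is hypothetical).
    missing = {"dataset"} - set(request.get("request_data", {}))
    if missing:
        raise ValueError(f"request_data is missing: {sorted(missing)}")
    return service.create_request(request)


service = RequestsService()
created = create_data_access_request(
    service,
    {"Researcher"},
    {"request_id": "r1", "request_data": {"dataset": "example-dataset"}},
)
```

A different request type would get its own wrapper with its own checks, reusing the same service unchanged.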

create_request

Accept a request model.

get_request

Accept a request_id, return the object from the database.

list_requests_for_workspace

Accept a workspace_id. Return a date-ordered list of all request objects matching the workspace_id.

list_my_requests

Accept an optional workspace_id. Return a date-ordered list of all request objects where the requestor.user_id == current user ID.
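The two list methods could be sketched as simple filters over stored requests. This assumes requests are plain dicts with a created_at timestamp and that "date-ordered" means newest first; neither is specified above, and a real implementation would query the database rather than filter in memory.

```python
from datetime import datetime
from typing import Any, Optional


def list_requests_for_workspace(
    requests: list[dict[str, Any]], workspace_id: str
) -> list[dict[str, Any]]:
    matches = [r for r in requests if r["workspace_id"] == workspace_id]
    # Assumption: newest first.
    return sorted(matches, key=lambda r: r["created_at"], reverse=True)


def list_my_requests(
    requests: list[dict[str, Any]],
    current_user_id: str,
    workspace_id: Optional[str] = None,
) -> list[dict[str, Any]]:
    mine = [
        r
        for r in requests
        if r["requestor"]["user_id"] == current_user_id
        # The workspace filter only applies when a workspace_id is supplied.
        and (workspace_id is None or r["workspace_id"] == workspace_id)
    ]
    return sorted(mine, key=lambda r: r["created_at"], reverse=True)


sample = [
    {"request_id": "r1", "workspace_id": "ws-1", "requestor": {"user_id": "alice"},
     "created_at": datetime(2023, 1, 1)},
    {"request_id": "r2", "workspace_id": "ws-1", "requestor": {"user_id": "bob"},
     "created_at": datetime(2023, 2, 1)},
    {"request_id": "r3", "workspace_id": "ws-2", "requestor": {"user_id": "alice"},
     "created_at": datetime(2023, 3, 1)},
]
```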

update_request

Accept a request_id, user, and diff object (dict) containing only the changes to make (i.e. a partial update, PATCH-style rather than a full PUT).
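One way the diff could be applied is a recursive merge, so that nested request_data keys not mentioned in the diff are left alone. This is a sketch under assumptions: the dict-backed store, the updated_by / updated_at audit fields, and the rule that request_id is immutable are all hypothetical.

```python
import copy
from datetime import datetime, timezone
from typing import Any


def apply_diff(target: dict[str, Any], diff: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of target with diff merged in, recursing into nested dicts."""
    result = copy.deepcopy(target)
    for key, value in diff.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_diff(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def update_request(
    store: dict[str, dict[str, Any]], request_id: str, user_id: str, diff: dict[str, Any]
) -> dict[str, Any]:
    if "request_id" in diff:
        raise ValueError("request_id cannot be changed")  # hypothetical rule
    updated = apply_diff(store[request_id], diff)
    # Hypothetical audit fields recording who made the change and when.
    updated["updated_by"] = user_id
    updated["updated_at"] = datetime.now(timezone.utc).isoformat()
    store[request_id] = updated
    return updated


store = {
    "r1": {
        "request_id": "r1",
        "status": "draft",
        "request_data": {"dataset": "a", "purpose": "x"},
    }
}
updated = update_request(
    store, "r1", "user-1", {"status": "in_review", "request_data": {"purpose": "y"}}
)
```

Here only status and request_data.purpose change; request_data.dataset is untouched because it does not appear in the diff.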

check_and_fire_triggers

Internal to this service. Accept the request, status and the form_template.
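A possible shape for this, assuming the form template carries a list of triggers keyed by status. The "triggers", "on_status" and "url" keys, the payload, and the example URLs are invented for illustration; the form template fields are not defined in this issue. The sender is injectable so the logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable


def post_json(url: str, payload: dict[str, Any]) -> None:
    """Default sender: POST the payload as JSON using only the standard library."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def check_and_fire_triggers(
    request: dict[str, Any],
    status: str,
    form_template: dict[str, Any],
    send: Callable[[str, dict[str, Any]], None] = post_json,
) -> list[str]:
    """Call the webhook of every trigger registered for this status; return the URLs called."""
    fired = []
    for trigger in form_template.get("triggers", []):  # hypothetical template key
        if trigger.get("on_status") == status:
            send(trigger["url"], {"request_id": request["request_id"], "status": status})
            fired.append(trigger["url"])
    return fired


# Usage with a fake sender that just records the calls.
sent: list[tuple[str, dict[str, Any]]] = []
template = {
    "triggers": [
        {"on_status": "in_review", "url": "https://example.invalid/review"},
        {"on_status": "approved", "url": "https://example.invalid/provision"},
    ]
}
fired = check_and_fire_triggers(
    {"request_id": "r1"}, "approved", template, send=lambda u, p: sent.append((u, p))
)
```

This covers both webhook user stories: one trigger on in_review to start a custom approval workflow, another on approved to start dataset provisioning.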

add_message

Accept a request_id and message (string).

Note: The methods above are very likely to change as we implement. More methods will emerge, but this should give enough of an idea of the purpose of this service.

Indicative UI Mockups

Viewing all requests within a given workspace: image

Starting a new request. The UI offers all the forms available for this workspace, of type data_access_request: image

Selecting a particular type of form allows the requestor to complete the details and submit. Following submission, the request is marked as in_review, and triggers a background workflow to collect approvals as needed: image

marrobi commented 1 year ago

Looks good. One question, typically the DAR comes before a workspace has been requested. How does this tie in with this user story?

As a TRE Implementer, I want to be able to create a number of Data Access Request forms per workspace type.

Not sure we need per workspace forms? Is there a use case for other types of request?

damoodamoo commented 1 year ago

@marrobi - this is the main question I have too, and this design takes the opinion that a workspace needs to exist before a DAR is performed. The end of a DAR would trigger a provisioning process to move the data over into the workspace, so it would either need to exist already, or we'd need a wider process to create the workspace and then trigger the data provisioning.

This requests mechanism could support a "Project initiation" style workflow too, with requests being raised at the top level of the TRE and resulting in a new workspace. That, in my view, would be a next step after this is in place, as we need to support getting data into an existing workspace either way.

I would also expect that we would want per-workspace forms. Each form would define a trigger to run when the request is approved, which might well be different between workspace types.

damoodamoo commented 1 year ago

As discussed, closing this as building a Data Access Request mechanism into the TRE is not on the roadmap.

marrobi commented 1 year ago

As discussed, closing this as building a Data Access Request mechanism into the TRE is not on the roadmap.

I wouldn't say it isn't on the roadmap; it's being requested by many users. The issue is that we don't have the resource to support it beyond the initial implementation. This may change.

marrobi commented 9 months ago

Going to reopen, as it is a requested feature, so shows up on backlog.

marrobi commented 2 months ago

@damoodamoo I'm back in this space again. Looking at your (and team's) work here: https://github.com/SAFEHR-data/Data-Access-Request-Seedling

The ask is for this to happen before workspace creation. The concept of a project has been discussed, but I'm seeing this as a 1:1 match with "data access request". I'd welcome a discussion on how much of the seedling work could be reused. UI is out of scope, so the forms work would likely have to wait, but the request APIs are a good starting point.