Request Mechanism (initial purpose: Data Access Request)

damoodamoo commented 1 year ago

Context

It's common for a Researcher to need access to a specific dataset on which to perform their research. The ways this happens across projects, organisations and industries are myriad, so a core TRE Data Access Request mechanism must provide:

- A minimum, base experience to allow a researcher to log a request, and an authorised person to approve that request.
- The ability for an organisation to supply their own request form, in JSON Forms format.
- Extension points to enable a customer to build their own backend workflow and trigger their own processes during and following the request.

User Stories

As a Researcher, I want to use the TRE Portal to create a new Data Access Request for the workspace I am working in. As a Researcher, I want to view my request, see any updates to it, and understand where in the process it currently is. As a Researcher, I want to see all existing and past Data Access Requests that pertain to the workspace I am in.

As a Data Controller / Data Manager, I want to view a list of Data Access Requests for the workspace I am looking at. As a Data Controller / Data Manager, I want to use the TRE Portal to approve / reject the request, and supply comments. As a Data Controller / Data Manager, I want to be able to send the request back to the initiator for re-work, supplying comments.

As a TRE Implementer, I want to be able to create a number of Data Access Request forms per workspace type. As a TRE Implementer, I want to be able to trigger my own workflows via HTTP/webhook after the initial request has been submitted, so that I can use a workflow tool of my choice to build a complex business-related approval workflow. As a TRE Implementer, I want my custom workflow to be able to log progress messages back to the initial request, so the initiator can see what is happening. As a TRE Implementer, I want my custom workflow to be able to update the status of the initial request. As a TRE Implementer, I want to trigger a custom background process via HTTP/webhook to provision the actual dataset in some way, after the request has been approved.

Differences to Airlock

The airlock process is more rigid and custom. It contains lots of logic around reviews - creating review VMs, wiring up a Review Workspace, providing mechanisms to actually move files, generating SAS links etc. Whilst the new DAR mechanism will be informed by Airlock, and will be integrated in the UX, it will not reuse the airlock API endpoints. Efforts will be made to reduce code duplication, such as refactoring the concept of a status so that it is no longer airlock-specific and is shared by all request types.

Could this be used for any type of approval process, such as requesting a VM?

Potentially. However, the primary use case is Data Access Requests, and that will be the focus in the UI. There will be a single requests service, which handles the database interactions and CRUD operations, and then it will be up to the endpoints to build upon that service to enforce any custom model structures and decision making applicable to that request type.

Managing Form Templates

Each organisation will need to be able to build and manage a number of custom form templates. These will be heavily inspired by the resource templates. In the repo, a new directory named forms will be created under ./templates. This will house any number of JSON Schema documents.

Each form template will contain the following fields:

$schema, $id, title, description: Same as resource templates
form:
- required, properties, allOf/oneOf (etc) blocks: Same as resource templates, defining the fields
- uiSchema to order fields and provide extra UI hints
triggers: a new list block specifically for forms. These entries will be used by the API logic to trigger webhook URLs upon status changes.
- name: string - friendly name of the trigger
- status: string - if the request status equals this, fire the below URI
- URI: string - may contain sensitive data. Will not be surfaced in get requests.
formType: string - Used for lookups, for instance to find "all Data Access Request forms".
isGlobal: boolean - Indicates whether this form can show up anywhere in the TRE
workspaceTypes: Optional. A list of strings matching the workspace definition names (i.e. base). Used to recall forms only meant for a particular type of workspace.
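For illustration, a form template with the fields above might look like the Python dict below, together with a helper that strips trigger URIs before a template is returned from a get request. This is a minimal sketch: the field values, the webhook URL, the use of react-jsonschema-form style `ui:order` in the uiSchema, and the `public_view` helper name are all hypothetical.

```python
import copy

# Hypothetical Data Access Request form template mirroring the fields listed above.
EXAMPLE_FORM_TEMPLATE = {
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "example-dar-form",  # hypothetical id
    "title": "Data Access Request",
    "description": "Request access to a dataset for this workspace",
    "form": {
        "required": ["dataset_name", "justification"],
        "properties": {
            "dataset_name": {"type": "string", "title": "Dataset name"},
            "justification": {"type": "string", "title": "Justification"},
        },
        # Assumption: field ordering expressed with an rjsf-style "ui:order" hint.
        "uiSchema": {"ui:order": ["dataset_name", "justification"]},
    },
    "triggers": [
        {
            "name": "Start approval workflow",
            "status": "in_review",
            # May contain a secret (e.g. a function key), so it must never be returned.
            "URI": "https://example.com/hooks/start-approval?code=secret",
        },
    ],
    "formType": "data_access_request",
    "isGlobal": False,
    "workspaceTypes": ["base"],
}


def public_view(template: dict) -> dict:
    """Return a copy of the template that is safe to return from get/list calls.

    Trigger names and statuses are kept; the possibly sensitive URIs are removed.
    The stored template itself is left untouched.
    """
    safe = copy.deepcopy(template)
    safe["triggers"] = [
        {key: value for key, value in trigger.items() if key != "URI"}
        for trigger in template.get("triggers", [])
    ]
    return safe
```

Returning a redacted copy, rather than mutating the stored document, keeps the URIs available to the trigger-firing logic on the server side.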

To support the new forms concept, we'll need the underlying plumbing:

Forms Cosmos DB collection
forms API and service:
- create / update / delete operations accessible only by TRE Admin
- get/list operations accessible to TRE Admins. If a form is scoped to a particular type of workspace, the user requesting the form will need to be authenticated to a workspace of that type.
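The access rules above can be sketched as a pair of permission checks plus a list filter. This is a minimal sketch, assuming a hypothetical `Caller` object that exposes whether the user is a TRE Admin and which workspace types they are authenticated to; it also assumes, since the rules above don't say, that global or unscoped forms are readable by any authenticated user.

```python
from dataclasses import dataclass, field


@dataclass
class Caller:
    """Hypothetical view of the authenticated user making a forms API call."""
    is_tre_admin: bool
    # Workspace types (e.g. "base") of the workspaces the caller is authenticated to.
    workspace_types: set[str] = field(default_factory=set)


def can_modify_form(caller: Caller) -> bool:
    # create / update / delete are restricted to TRE Admins.
    return caller.is_tre_admin


def can_read_form(caller: Caller, form: dict) -> bool:
    if caller.is_tre_admin:
        return True
    scoped_types = form.get("workspaceTypes") or []
    if form.get("isGlobal") or not scoped_types:
        # Assumption: global / unscoped forms are readable by any authenticated user.
        return True
    # Scoped form: the caller must be authenticated to a workspace of a matching type.
    return any(t in caller.workspace_types for t in scoped_types)


def list_readable_forms(caller: Caller, forms: list[dict], form_type: str) -> list[dict]:
    """Return the forms of the given formType that this caller may see."""
    return [f for f in forms if f.get("formType") == form_type and can_read_form(caller, f)]
```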

Requests API Design

`request` model

A model to contain a fixed and flexible structure to store data for all requests

title (string)
description (string)
requestor (user object)
status (string / enum)
request_type (string / enum)
requested_when (datetime)
workspace_id (guid, optional)
messages (list)
- message (string)
- user (user object)
- message_when (datetime)
updates (list - each item capturing the diff made to the overall object)
- update (dict of fields submitted for update)
- user (user object)
- updated_when (datetime)
triggers (list - each item capturing details about a fired trigger)
- trigger_name
- status
- response
- triggered_when (datetime)
request_data (dict - acts as a flexible property bag to store any custom request data. Likely populated from form data defined in the form above)
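As a rough illustration, the request model above maps onto a set of Python dataclasses. This is a sketch only: the status values in `RequestStatus` are hypothetical placeholders (the real set would come from the shared status refactor), and the `User` fields are assumed from the `requestor.user_id` lookup described below.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class RequestStatus(str, Enum):
    # Hypothetical status values, for illustration only.
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    REWORK_REQUESTED = "rework_requested"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class User:
    user_id: str
    name: str


@dataclass
class Message:
    message: str
    user: User
    message_when: datetime


@dataclass
class Update:
    update: dict[str, Any]  # only the fields submitted for update
    user: User
    updated_when: datetime


@dataclass
class TriggerRecord:
    trigger_name: str
    status: str
    response: str
    triggered_when: datetime


@dataclass
class Request:
    title: str
    description: str
    requestor: User
    status: RequestStatus
    request_type: str  # e.g. "data_access_request"
    requested_when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    workspace_id: Optional[str] = None
    messages: list[Message] = field(default_factory=list)
    updates: list[Update] = field(default_factory=list)
    triggers: list[TriggerRecord] = field(default_factory=list)
    # Flexible property bag, typically populated from the submitted form data.
    request_data: dict[str, Any] = field(default_factory=dict)
```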

`requests` service

A single service to handle the CRUD operations for request models. This service will handle shared logic around creating, updating, and listing requests. The service will be intentionally 'dumb'. It will be up to the calling code to enforce any permission restrictions, data structure checks (for any custom request_data) etc.

`create_request`

Accept a request model.

check_and_fire_triggers
Store the model in the database
Return it along with a unique ID.
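A minimal sketch of `create_request`, assuming requests are stored as plain dicts in an in-memory store standing in for the Cosmos collection. The `fire_triggers` callable is a hypothetical hook standing in for `check_and_fire_triggers`, and the UUID-based id is an assumption.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, Optional


def create_request(
    store: dict[str, dict],
    request: dict,
    fire_triggers: Optional[Callable[[dict], None]] = None,
) -> dict:
    """Store a new request and return it along with a generated unique id."""
    doc = dict(request)
    doc["id"] = str(uuid.uuid4())
    doc.setdefault("requested_when", datetime.now(timezone.utc))
    for key in ("messages", "updates", "triggers"):
        doc.setdefault(key, [])
    if fire_triggers is not None:
        # Hypothetical hook: the initial status may have a webhook attached.
        fire_triggers(doc)
    store[doc["id"]] = doc
    return doc
```

Firing before storing follows the order listed above; one consequence is that a webhook may see a request whose save then fails, which is worth settling during implementation.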

`get_request`

Accept a request_id, return the object from the database.

`list_requests_for_workspace`

Accept a workspace_id. Return a date-ordered list of all request objects matching the workspace_id.

`list_my_requests`

Accept an optional workspace_id. Return a date-ordered list of all request objects where the requestor.user_id == current user ID.
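The read operations can be sketched against the same kind of in-memory store (request id mapped to a request dict). This is a minimal sketch; it assumes "date ordered" means newest first by `requested_when`, and that a missing id surfaces as a `KeyError` for the API layer to map to a 404.

```python
from typing import Optional


def get_request(store: dict[str, dict], request_id: str) -> dict:
    # Raises KeyError if the request does not exist.
    return store[request_id]


def list_requests_for_workspace(store: dict[str, dict], workspace_id: str) -> list[dict]:
    matching = [r for r in store.values() if r.get("workspace_id") == workspace_id]
    # Assumption: newest requests first.
    return sorted(matching, key=lambda r: r["requested_when"], reverse=True)


def list_my_requests(
    store: dict[str, dict], current_user_id: str, workspace_id: Optional[str] = None
) -> list[dict]:
    matching = [
        r
        for r in store.values()
        if r["requestor"]["user_id"] == current_user_id
        and (workspace_id is None or r.get("workspace_id") == workspace_id)
    ]
    return sorted(matching, key=lambda r: r["requested_when"], reverse=True)
```

In Cosmos DB these filters and the ordering would be pushed into the query rather than done in Python.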

`update_request`

Accept a request_id, user, and diff object (dict) containing only the changes to make (i.e. PATCH semantics rather than a full PUT).

Get the request object via get_request
Add the diff object to the updates list in the request
check_and_fire_triggers
Merge the diff object with the request object to make the changes
Save the request model back to the database
Return the updated model
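A minimal sketch of `update_request` following the steps above, assuming an in-memory dict store standing in for the database. The `fire_triggers` callable is a hypothetical hook standing in for `check_and_fire_triggers`; it receives the request before the merge, so it can compare the old status with the one in the diff.

```python
import copy
from datetime import datetime, timezone
from typing import Callable, Optional


def update_request(
    store: dict[str, dict],
    request_id: str,
    user: dict,
    diff: dict,
    fire_triggers: Optional[Callable[[dict, dict], None]] = None,
) -> dict:
    """Apply a partial update and record it in the request's updates list."""
    request = store[request_id]  # get_request
    # Audit entry: who changed what, and when.
    request.setdefault("updates", []).append(
        {"update": copy.deepcopy(diff), "user": user, "updated_when": datetime.now(timezone.utc)}
    )
    if fire_triggers is not None:
        fire_triggers(request, diff)  # e.g. react to diff.get("status")
    # Shallow merge: top-level keys in the diff replace those on the request.
    request.update(diff)
    store[request_id] = request
    return request
```

Because the merge is shallow, a diff containing `request_data` replaces the whole property bag; and, in keeping with the "dumb" service, nothing stops a caller from overwriting fields such as `updates`, so that check belongs in the endpoint.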

`check_and_fire_triggers`

Internal to this service. Accept the request, the new status and the form_template.

For each entry in form_template.triggers whose status matches the new status:
- Send the entire request object to the trigger's URI as a POST
- Return success if the POST succeeded
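A minimal sketch of `check_and_fire_triggers` using only the standard library. It assumes a 2xx response counts as success, records each attempt in the request's `triggers` list (without the possibly sensitive URI), and takes an injectable `post` function, a hypothetical parameter added so the HTTP call can be swapped out in tests or replaced with a proper client.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Callable


def _post_json(uri: str, body: dict) -> int:
    """POST body as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        uri,
        data=json.dumps(body, default=str).encode("utf-8"),  # default=str handles datetimes
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def check_and_fire_triggers(
    request: dict,
    status: str,
    form_template: dict,
    post: Callable[[str, dict], int] = _post_json,
) -> bool:
    """Fire every trigger matching status; return True if all POSTs succeeded."""
    all_ok = True
    for trigger in form_template.get("triggers", []):
        if trigger["status"] != status:
            continue
        try:
            code = post(trigger["URI"], request)
            ok, response = 200 <= code < 300, str(code)
        except Exception as exc:  # URLError, HTTPError (4xx/5xx), timeouts
            ok, response = False, repr(exc)
        all_ok = all_ok and ok
        # Record the attempt; the URI is deliberately not stored.
        request.setdefault("triggers", []).append({
            "trigger_name": trigger["name"],
            "status": status,
            "response": response,
            "triggered_when": datetime.now(timezone.utc),
        })
    return all_ok
```

When no trigger matches, nothing is sent and the function reports success.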

`add_message`

Accept a request_id, user and message (string).

Get the request object via get_request
Add the message to the messages list
Save the request model back to the database
Return the updated model
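A minimal sketch of `add_message`, assuming an in-memory dict store standing in for the database. The sketch takes the posting user as well as the message text, since each entry in the model's messages list records a user; a custom workflow logging progress would pass its own identity here.

```python
from datetime import datetime, timezone


def add_message(store: dict[str, dict], request_id: str, user: dict, message: str) -> dict:
    """Append a progress or comment message to a request and return the request."""
    request = store[request_id]  # get_request
    request.setdefault("messages", []).append(
        {"message": message, "user": user, "message_when": datetime.now(timezone.utc)}
    )
    store[request_id] = request  # save back
    return request
```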

Note: The methods above are very likely to change as we implement. More methods will emerge, but this should give enough of an idea of the purpose of this service.

Indicative UI Mockups

Viewing all requests within a given workspace:

Starting a new request. The UI offers all the forms available for this workspace, of type data_access_request:

Selecting a particular type of form allows the requestor to complete the details and submit. Following submission, the request is marked as in_review, and triggers a background workflow to collect approvals as needed:

marrobi commented 1 year ago

Looks good. One question: typically the DAR comes before a workspace has been requested. How does this tie in with this user story?

> As a TRE Implementer, I want to be able to create a number of Data Access Request forms per workspace type.

Not sure we need per-workspace forms? Is there a use case for other types of request?

damoodamoo commented 1 year ago

@marrobi - this is the main question I have too, and this design takes the opinion that a workspace needs to exist before a DAR is performed. The end of a DAR would trigger a provisioning process to move the data into the workspace, so the workspace would either need to exist already, or we'd need a wider process to create the workspace and then trigger the data provisioning.

This requests mechanism could support a "Project initiation" style workflow too, with requests being raised at the top level of the TRE and resulting in a new workspace. That, in my view, would be a next step after this is in place, as we need to support getting data into an existing workspace either way.

I would also expect that we would want per-workspace forms. Each form would define a trigger to run when the request is approved, which might well be different between workspace types.

damoodamoo commented 1 year ago

As discussed, closing this as building a Data Access Request mechanism into the TRE is not on the roadmap.

marrobi commented 1 year ago

> As discussed, closing this as building a Data Access Request mechanism into the TRE is not on the roadmap.

I wouldn't say it isn't on the roadmap; it's being requested by many users. The issue is that we don't have the resources to support it beyond the initial implementation. This may change.

marrobi commented 9 months ago

Going to reopen, as it is a requested feature, so shows up on backlog.

marrobi commented 2 months ago

@damoodamoo I'm back in this space again. Looking at your (and team's) work here - https://github.com/SAFEHR-data/Data-Access-Request-Seedling .

The ask is for this to happen before workspace creation. The concept of a project has been discussed, but I'm seeing this as a 1:1 match with a "data access request". I welcome a discussion on how much of the seedling work could be reused. UI is out of scope, so the forms work would likely have to wait, but the request APIs are a good starting point.

microsoft / AzureTRE