ministryofjustice / analytical-platform

Analytical Platform β€’ This repository is defined and managed in Terraform
https://docs.analytical-platform.service.justice.gov.uk
MIT License
8 stars 4 forks source link

✨ Access to Bedrock - Prison Data Science projects #4222

Closed jtattersall09403 closed 1 month ago

jtattersall09403 commented 2 months ago

Describe the feature request.

Headline: Request for the Prison Data Science team to have access to use the Claude generative AI models on the AP. Ideally Claude Sonnet and Claude Haiki; if that's not possible then the older Claude 2.1 model would be OK.

Describe the context.

Four current projects, and more in the pipeline. All use sensitive data; CDDO have agreed the use of Bedrock in the EU (Frankfurt & Paris currently, London to come) for these purposes, subject to the completion of a DPIA. For all projects, we need to be able to:

  1. Test the capabilities of the different models. We need to be able to answer questions like, 'how good each Claude model at summarising intelligence reports' (or classifying casenotes, or classifying text messages, or allocating a 'reason' tag to incident reports). This would be via jupyter notebooks on the AP, using AP data (e.g. casenotes from the nomis pipeline, or Seized Media Database data from s3).
  2. Run the models for inference in production. If the models meet our evaluation thresholds, we then want to be able to use them in production. In practice, this means things like the below (which are in order of priority):
    • Running a script on schedule using Airflow to send data to the models, receive outputs, and save those to an s3 bucket or database on AWS.
    • Calling the models via API from a deployed R Shiny application, hosted on the cloud platform
    • Calling the models via API from a Digital service, hosted on the cloud platform

Project details below:

Intelligence report summarisation: Creating summaries about individuals from across multiple intelligence reports, for use by prison intelligence officers. Intelligence message flagging: Flagging which messages in our seized media database are likely to be of most relevance to intelligence officers. Assault Reasons Tagging: Text classification - categorising incident reports according to 'reason' for assaults Keyworker performance scoring: Classifying prison keyworker casenotes against a mark scheme and providing explanations for scores

Value / Purpose

We have extensively tested methods that are possible using current infrastructure (e.g. huggingface open source models). All methods fall short of what is needed for these projects. Testing on dummy/synthetic data suggests that the Claude models will substantially outperform these simpler alternatives, unlocking a huge amount of value for our frontline users. All four projects (and others like them that we will do in future) will save a huge amount of time for frontline users, meaning they will spend less time on paperwork and more time running our prison system safely and effectively.

User Types

Data scientists in Prison Data Science

jtattersall09403 commented 2 months ago

10 users in the Prison Data Science team

jtattersall09403 commented 2 months ago

Forgot to add timelines: the Intelligence Report and Keyworker projects have deadlines in mid-May for presenting a comparison of options. If we can get access to bedrock by 8th May then we would be able to squeeze in Claude comparisons for those projects.

jtattersall09403 commented 2 months ago

Just adding that any of the EU environments are OK in terms of what we've got agreement to use. It looks like Calude 3 Sonnet and Haiku are available in Paris but not Frankfurt, so Paris would be our preference!

julialawrence commented 2 months ago

@jtattersall09403 I've raised a ticket with Modernisation Platform but as far as I am aware, the Paris region is not currently configured for use so it's a bit bigger ask than simply enabling Bedrock in Paris and a new region always needs additional architectural review. I know you're working to tight timelines and hopefully we'll know soon if provisioning Paris is realistic or not and however this develops you will still have access to the Claude models available in Frankfurt as a fallback.

jtattersall09403 commented 2 months ago

Just to add after our call just now. In priority order, we need to be able to:

  1. Run analysis manually on the AP (e.g. in Jupyterlab) using LLM models hosted in Bedrock. 8th May deadline to enable us to meet our 'mid May' deadline for analysis outputs with customers.
  2. Schedule scripts that call Bedrock LLMs, using Airflow. Would need this by mid June to start implementation.
  3. Call Bedrock LLMs in real time from AP apps deployed on the cloud platform. More flexible on timelines - end of June perhaps? Initially these apps would be:
jacobwoffenden commented 2 months ago

Amazon Bedrock is now accessible in eu-west-3 from Analytical Platform

analyticalplatform@vscode-jacobwoffenden-vs-79b6f75576-62kp8:~/workspace$ AWS_DEFAULT_REGION=eu-west-3 aws bedrock list-foundation-models | jq -r '.modelSummaries[] | .modelName'
Titan Text G1 - Lite
Titan Text G1 - Lite
Titan Text G1 - Express
Titan Text G1 - Express
Titan Multimodal Embeddings G1
Titan Multimodal Embeddings G1
Claude 3 Sonnet
Claude 3 Sonnet
Claude 3 Sonnet
Claude 3 Haiku
Claude 3 Haiku
Claude 3 Haiku
Embed English
Embed English
Embed Multilingual
Embed Multilingual
Mistral 7B Instruct
Mixtral 8x7B Instruct
Mistral Large
julialawrence commented 2 months ago

Hiya Apologies, we need to hold off rollout for a little bit. Because this is a new region, we need to bootstrap some logging and monitoring controls into it first. We should have it ready to go tomorrow though!

jtattersall09403 commented 2 months ago

Alriiiiiight! Thank you so much guys - Julia just give me an @ when it's ready :)

julialawrence commented 2 months ago

We have encountered an issue enabling SecurityHub and CloudTrail aggregation in the πŸ‡«πŸ‡· region in our data production account. The above-linked PR is for that work and we're working to troubleshoot the issue. However, we were able to successfully enable GuardDuty and we still have access to CloudTrail for the next 90 days. After a conversation with Hosting senior TA and @jtattersall09403 about potential risks, we are going to enable bedrock in Paris at risk but will keep the FR open until the environment is fully bootstrapped.

julialawrence commented 2 months ago

Amazon Bedrock is now accessible in eu-west-3 from Analytical Platform

analyticalplatform@vscode-jacobwoffenden-vs-79b6f75576-62kp8:~/workspace$ AWS_DEFAULT_REGION=eu-west-3 aws bedrock list-foundation-models | jq -r '.modelSummaries[] | .modelName'
Titan Text G1 - Lite
Titan Text G1 - Lite
Titan Text G1 - Express
Titan Text G1 - Express
Titan Multimodal Embeddings G1
Titan Multimodal Embeddings G1
Claude 3 Sonnet
Claude 3 Sonnet
Claude 3 Sonnet
Claude 3 Haiku
Claude 3 Haiku
Claude 3 Haiku
Embed English
Embed English
Embed Multilingual
Embed Multilingual
Mistral 7B Instruct
Mixtral 8x7B Instruct
Mistral Large

πŸŽ‰

julialawrence commented 2 months ago

@jtattersall09403 I've enabled you for Bedrock access via the Control Panel. As only admins can enable users, please let us know who else needs these permissions.

jtattersall09403 commented 2 months ago

This is fantastic, thank you Julia! Please can you add:

julialawrence commented 2 months ago

Done.

julialawrence commented 2 months ago

Amazon, Mistral, Cohere and Anthropic models have been enabled.

julialawrence commented 1 month ago

@jtattersall09403 In order to ease tracking for requests with different timelines, requirements for Airflow and Cloud Platform apps have been split off into separate feature requests. This Feature request now covers the initial provision of Bedrock functionality in Paris which has been fully delivered since the outstanding changes in Modernisation Platform have been done and the service caveats have been removed.