ministryofjustice / analytical-platform

Analytical Platform • This repository is defined and managed in Terraform
https://docs.analytical-platform.service.justice.gov.uk
MIT License

✨ JupyterLab Image with GLIBC v2.32 - To Enable Running Large Language Models Locally #3041

Closed. foufoulides closed this issue 8 months ago

foufoulides commented 9 months ago

Describe the feature request.

I am currently in the Alpha Phase (a 6-week phase, with 4 weeks remaining) with Digital and HMPPS, where I am using LLMs to make prisoner case notes more discoverable for prison staff. I am currently working with dummy data that I created using the OpenAI API, but we need to test the effectiveness of my work on a small number of real case notes, which are much messier than the dummy data. To eliminate any privacy concerns, I need to run LLMs locally, with the model files stored in an S3 bucket. These models are implemented in C and need GNU C Library (GLIBC) v2.32 to load. Could I please have a JupyterLab image that has GLIBC 2.32 and, if possible, 16GB of RAM, as some of the larger LLMs require that to run?
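For reference, here is a minimal way to confirm which GLIBC version the current JupyterLab environment ships with (an illustrative sketch, not part of the original request):

```python
import ctypes
import platform

# GLIBC version as reported by Python's platform module
print(platform.libc_ver())  # e.g. ('glibc', '2.31')

# Ask libc directly for its version string
libc = ctypes.CDLL("libc.so.6")
libc.gnu_get_libc_version.restype = ctypes.c_char_p
print(libc.gnu_get_libc_version().decode())  # e.g. '2.31'
```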

Describe the context.

I am currently in the Alpha Phase (a 6-week phase, with 4 weeks remaining) with Digital and HMPPS, where I am using LLMs to make prisoner case notes more discoverable for prison staff.

Value / Purpose

To assess the effectiveness of my work on real case notes (which are much messier) and the feasibility of the various approaches, which will inform our decisions for the Beta Phase of this project.

User Types

There are 16 staff members working on this project from Prisons Data Science, HMPPS, and Digital.

jacobwoffenden commented 8 months ago

Removing from sprint 9 as this cannot be progressed until we explore Analytical Platform tooling images again

foufoulides commented 8 months ago

To add some more specific information on the local LLMs I am trying to use on the AP for small-scale testing: I am using LangChain, which has options to run LLMs locally using C implementations of popular open-source LLMs. The most applicable choice from the documentation in the above link seemed to be GPT4All (which has nothing to do with OpenAI), which offers several such C-implemented open-source LLMs. If you scroll down to the Model Explorer section of the GPT4All page, you will see the options available. They are all in .gguf format (I also looked online for .bin versions of the same models, but GPT4All has recently switched to .gguf only). Both formats gave me the same GNU C Library (GLIBC) error asking for version 2.32.

Some of the larger models require the machine to have 16GB of RAM. Ideally we want to use the smallest model that does our work satisfactorily, but we are not sure yet which one that would be, so if possible it would be good to have the option to run the largest models as well.
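For illustration, here is a rough sketch of the kind of workflow this involves, using LangChain's GPT4All wrapper; the bucket name, object key, and model file below are placeholders rather than the project's real names, and the exact call style depends on the LangChain version installed:

```python
import os

import boto3
from langchain.llms import GPT4All  # langchain_community.llms.GPT4All in newer releases

# Placeholder bucket, key, and local path -- not the project's real names
bucket = "my-llm-models-bucket"
key = "models/mistral-7b-instruct.Q4_0.gguf"
local_path = "/home/jovyan/models/mistral-7b-instruct.Q4_0.gguf"

# Pull the .gguf model file down from S3 so it can be loaded locally
os.makedirs(os.path.dirname(local_path), exist_ok=True)
boto3.client("s3").download_file(bucket, key, local_path)

# Loading the C-implemented model is the step that fails when GLIBC is too old
llm = GPT4All(model=local_path, n_threads=2)
print(llm.invoke("Summarise this case note: ..."))  # llm("...") on older LangChain releases
```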

jacobwoffenden commented 8 months ago

Hi @foufoulides!

We've just released a new version of JupyterLab to you and @lucypitches.

You can see it on your Control Panel as v3.6.3, and it now has 2 CPUs and 16GB of RAM

Can you deploy it and let us know how you get on with testing?
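If it helps while testing, one way to sanity-check the resized container from a notebook is sketched below (a rough illustration only; the cgroup paths depend on how the cluster is configured, and node-level figures can differ from the pod's actual limit):

```python
import os

# Rough sanity check of the resources the new image exposes. Note that
# os.cpu_count() reports node-level CPUs, so the cgroup memory limit is
# the more reliable number for what the notebook can actually use.
print("CPUs visible:", os.cpu_count())

try:
    with open("/sys/fs/cgroup/memory.max") as f:  # cgroup v2
        limit = f.read().strip()
except FileNotFoundError:
    with open("/sys/fs/cgroup/memory/memory.limit_in_bytes") as f:  # cgroup v1
        limit = f.read().strip()

print("Memory limit:", limit if limit == "max" else f"{int(limit) / 1024**3:.1f} GiB")
```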

Thanks,

Jacob, @julialawrence, @michaeljcollinsuk

foufoulides commented 8 months ago

Hi Jacob,

Thank you so much for this! I will check it this morning and get back to you.

Best, Chris

jacobwoffenden commented 8 months ago

Feedback from @foufoulides is that the LLM is working 🎉 Closing as complete!

lucypitches commented 8 months ago

Hello, I've just deployed the new version. However, when I try to install packages into my venv (python3 -m pip install -r requirements.txt) I get the following error:

/home/jovyan/prison-case-note-detectability/case-notes-detect-venv/bin/python3: No module named pip

This had previously been working fine on my previous version of JupyterLab.
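A common cause of this after an image upgrade (offered here as a hedged guess, not a confirmed diagnosis) is that the venv still points at the previous image's Python interpreter, which then surfaces as "No module named pip". Recreating the venv against the new interpreter, roughly as below, and then re-running the pip install usually resolves it:

```python
import venv

# Rebuild the existing venv in place against the image's current Python.
# The path is taken from the error message above; clear=True discards the
# stale environment and with_pip=True bootstraps pip into the fresh one.
venv.create(
    "/home/jovyan/prison-case-note-detectability/case-notes-detect-venv",
    clear=True,
    with_pip=True,
)
```

After that, python3 -m pip install -r requirements.txt should work again from inside the recreated venv (packages will need reinstalling, since the old site-packages are discarded).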