carcruz opened 4 weeks ago
For the LLM querying part, you might find this exercise useful; I did it recently to benchmark different models (in Python).
Now that the model's context window is much bigger (128k tokens), most full texts will fit in a single query, so I suggest we get rid of LangChain's magic for combining queries and use the OpenAI client directly. It will be easier to maintain and cheaper.
https://colab.research.google.com/drive/1X6NoawfdWpHog658NXmSaZPhqVX9ZEif?usp=sharing
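To illustrate the direct-client approach, here is a minimal sketch: one prompt, one call, no chunking or chain composition. The helper names and the `gpt-4o-mini` model id are illustrative assumptions, not part of the current API; the prompt wording follows the epic below.

```python
def build_messages(target: str, disease: str, full_text: str) -> list:
    """Build the chat messages for a single-query summary.

    With a 128k-token context window, the full text of most
    publications fits in one request, so no splitting/combining
    (the LangChain "magic") is needed.
    """
    question = (
        f"Can you provide a concise summary about the relationship "
        f"between {target} and {disease} according to this study?"
    )
    return [
        {"role": "system",
         "content": "You summarise target-disease evidence from publications."},
        {"role": "user", "content": f"{question}\n\n{full_text}"},
    ]


def summarise(target: str, disease: str, full_text: str,
              model: str = "gpt-4o-mini") -> str:
    # Lazy import so the prompt helper above is usable without the SDK.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=build_messages(target, disease, full_text),
    )
    return resp.choices[0].message.content
```

Keeping `build_messages` pure (no network) also makes the prompt easy to unit-test and to swap between models during benchmarking.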
Epic: ot-ai-api Refactor to Python (24.12)
Description:
This epic aims to refactor the ot-ai-api from a NodeJS-based implementation to a Python-based solution using FastAPI. The project began as a proof-of-concept (POC) collaboration between the data and front-end teams to explore adding AI-driven features to the Open Targets UI. We aim to build on the POC's success by improving the architecture, leveraging Python's mature ecosystem, and using FastAPI's web-framework advantages to enhance performance, maintainability, and scalability. The deployment will be containerized using Docker to ensure consistency across development, testing, and production environments.
The current API has one main endpoint that provides users with a natural language summary of the target-disease evidence linked to publications. This is achieved using LangChain and OpenAI's GPT-4 mini model, which generates a summary from the prompt: “Can you provide a concise summary about the relationship between [target] and [disease] according to this study?”. The resulting summary helps users better understand the available bibliographic evidence.
Acceptance Criteria:
Features:
Path to public