opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0
12 stars 2 forks source link

Epic: ot-ai-api Refactor to Python #3598

Open carcruz opened 3 days ago

carcruz commented 3 days ago

Epic: ot-ai-api Refactor to Python (24.12)

Description:

This epic aims to refactor the ot-ai-api from a NodeJS-based implementation to a Python-based solution using FAST API. The project initially began as a proof-of-concept (POC) collaboration between the data and front-end teams to explore adding AI-driven features to the Open Targets UI. We aim to build upon the POC's success by improving the architecture, leveraging Python's mature ecosystem, and utilizing FASTAPI's web framework advantages to enhance performance, maintainability, and scalability. The deployment will be containerized using Docker to ensure consistency across development, testing, and production environments.

The current API has one main endpoint that provides users with a natural language summary of the target-disease evidence linked to publications. This is achieved by using LangChain and OpenAI's GPT-4 mini model, which generates a summary with the prompt: “Can you provide a concise summary about the relationship between [target] and [disease] according to this study?”. The resulting summary helps users better understand the available bibliography evidence.

Acceptance Criteria:

Features:

  1. Migrate core functionality from NodeJS to Python.
  2. Integrate interactive API documentation using FASTAPI’s built-in capabilities.
  3. Containerize the application with Docker.
  4. Maintain and test the natural language summary endpoint.

Path to public

ireneisdoomed commented 3 days ago

For the LLM querying part, you might find useful this exercise I had to do recently to benchmark different models (in Python).

Now that the context window of the model is much bigger (128k tokens) so most full texts will fit in a single query, I suggest that we get rid of Langchain's magic to combine the queries and we use the OpenAI client directly. It will be easier to maintain and cheaper.

https://colab.research.google.com/drive/1X6NoawfdWpHog658NXmSaZPhqVX9ZEif?usp=sharing