i-dot-ai / redbox

Bringing Generative AI to the way the Civil Service works
https://i-dot-ai.github.io/redbox/
MIT License

Bugfix/test ai #756

Closed (gecBurton closed 1 month ago)

gecBurton commented 1 month ago

Context

Changes proposed in this pull request

A new file, text-embedding-3-large.jsonl, has been generated by re-embedding the text of every chunk in all-mpnet-base-v2.jsonl, using the following code:

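import json
import os

import jsonlines
from openai import AzureOpenAI

# Credentials and endpoint are read from the environment.
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-01",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

fp = "notebooks/evaluation/data/0.2.3/embeddings"

# Re-embed every chunk from the all-mpnet-base-v2 file with text-embedding-3-large.
with (
    jsonlines.open(f"{fp}/all-mpnet-base-v2.jsonl", "r") as reader,
    jsonlines.open(f"{fp}/text-embedding-3-large.jsonl", "w") as writer,
):
    for line in reader:
        # json.loads needs a str, so each line of the source file is
        # assumed to hold a JSON-encoded string rather than a bare object.
        chunk = json.loads(line)
        response = client.embeddings.create(
            input=chunk["text"],
            model="text-embedding-3-large",
        )
        chunk["embedding"] = response.data[0].embedding
        # json.dumps keeps the output in the same string-per-line form as the input.
        writer.write(json.dumps(chunk))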
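
A reviewer may want to sanity-check the generated file before merging. The sketch below is not part of this pull request: `load_chunk` and `check_embeddings` are hypothetical helpers, and the expected length of 3072 assumes the model's default output size (no `dimensions` override was passed in the script above). It uses only the standard library and accepts lines that hold either a bare JSON object or a JSON-encoded string, since the script above writes the latter.

```python
import json
from pathlib import Path

# Assumption: text-embedding-3-large was called without a `dimensions`
# override, so each vector should have the default length of 3072.
EXPECTED_DIM = 3072


def load_chunk(raw_line: str) -> dict:
    """Decode one JSONL line, unwrapping a second JSON layer if present."""
    value = json.loads(raw_line)
    if isinstance(value, str):  # line was written as a JSON-encoded string
        value = json.loads(value)
    return value


def check_embeddings(path, expected_dim: int = EXPECTED_DIM) -> int:
    """Return the number of valid records; raise ValueError on the first bad one."""
    count = 0
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, raw_line in enumerate(fh, start=1):
            if not raw_line.strip():
                continue  # tolerate blank lines
            chunk = load_chunk(raw_line)
            if not isinstance(chunk, dict) or "text" not in chunk:
                raise ValueError(f"line {line_no}: missing 'text' field")
            embedding = chunk.get("embedding")
            if not isinstance(embedding, list) or len(embedding) != expected_dim:
                raise ValueError(
                    f"line {line_no}: expected an embedding of length {expected_dim}"
                )
            count += 1
    return count
```

For example, `check_embeddings("notebooks/evaluation/data/0.2.3/embeddings/text-embedding-3-large.jsonl")` would return the record count, which should match the number of lines in all-mpnet-base-v2.jsonl.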

Guidance to review

Relevant links

Things to check