billmetangmo closed this 4 months ago
Use backtesting with pairwise evaluation: https://www.youtube.com/watch?v=3cDtDI2W-xA
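Pairwise backtesting replays the same production inputs through two versions of the app (for example the current prompt and a candidate prompt) and asks a judge which answer is better for each input, instead of scoring each answer on its own. A minimal sketch of the tallying step, assuming a hypothetical `judge` callable (in practice an LLM-as-judge or a human reviewer) that returns `"A"`, `"B"` or `"tie"`. The apps, judge and questions below are illustrative stubs, not LangSmith APIs:

```python
from collections import Counter
from typing import Callable, Iterable, Literal

Verdict = Literal["A", "B", "tie"]


def pairwise_backtest(
    examples: Iterable[dict],
    answer_a: Callable[[str], str],
    answer_b: Callable[[str], str],
    judge: Callable[[str, str, str], Verdict],
) -> dict:
    """Run both app versions on every input and count which one the judge prefers."""
    tally = Counter({"A": 0, "B": 0, "tie": 0})
    for example in examples:
        question = example["question"]  # hypothetical input key
        verdict = judge(question, answer_a(question), answer_b(question))
        tally[verdict] += 1
    decided = tally["A"] + tally["B"]
    # Win rate of B over A, ignoring ties; None when no comparison was decided.
    b_win_rate = tally["B"] / decided if decided else None
    return {"counts": dict(tally), "b_win_rate": b_win_rate}


# Hypothetical stand-ins for "old prompt" and "new prompt" versions of the app.
def old_app(question: str) -> str:
    return "I don't know."


def new_app(question: str) -> str:
    return f"Here is what I found about {question}."


# Toy judge: prefers the answer that actually mentions the question topic.
def toy_judge(question: str, a: str, b: str) -> Verdict:
    a_ok, b_ok = question in a, question in b
    if a_ok == b_ok:
        return "tie"
    return "A" if a_ok else "B"


result = pairwise_backtest(
    [{"question": "association list"}, {"question": "membership fees"}],
    old_app,
    new_app,
    toy_judge,
)
print(result)  # {'counts': {'A': 0, 'B': 2, 'tie': 0}, 'b_win_rate': 1.0}
```

Comparing answers side by side avoids having to define an absolute score for a chatbot reply, which is often harder than deciding which of two replies is better.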
The code used to build the dataset:
from datetime import datetime, timedelta, timezone
from langsmith import Client
from langsmith.beta import convert_runs_to_test
# How we are sampling runs to include in our test: every run from the last 14 days
end_time = datetime.now(tz=timezone.utc)
start_time = end_time - timedelta(days=14)
project_name = "tchoung-te"
run_filter = f'and(gt(start_time, "{start_time.isoformat()}"), lt(end_time, "{end_time.isoformat()}"))'
# Fetch the runs we want to convert to a test
client = Client()
prod_runs = list(
    client.list_runs(
        project_name=project_name,
        # Root runs only (newer SDK versions expose is_root=True for this)
        execution_order=1,
        filter=run_filter,
    )
)
# Name of the dataset we want to create
dataset_name = f'{project_name}-backtesting {start_time.strftime("%Y-%m-%d")}-{end_time.strftime("%Y-%m-%d")}'
# This converts the runs to a dataset + test
# It does not actually invoke your model
convert_runs_to_test(
    prod_runs,
    # Name of the resulting dataset
    dataset_name=dataset_name,
    # Whether to include the run outputs as reference/ground truth
    include_outputs=True,
    # Whether to include the full traces in the resulting test project
    # (default is to just include the root run)
    load_child_runs=True,
)
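The run filter and the dataset name in the script above are plain string formatting over a time window, so they can be checked without calling LangSmith. A sketch that factors them into two hypothetical helpers, `build_run_filter` and `backtest_dataset_name` (they are not part of the LangSmith SDK), using a fixed `end_time` instead of `datetime.now()` so the output is reproducible:

```python
from datetime import datetime, timedelta, timezone


def build_run_filter(start_time: datetime, end_time: datetime) -> str:
    """Filter keeping runs that started after start_time and ended before end_time."""
    return (
        f'and(gt(start_time, "{start_time.isoformat()}"), '
        f'lt(end_time, "{end_time.isoformat()}"))'
    )


def backtest_dataset_name(project_name: str, start_time: datetime, end_time: datetime) -> str:
    """Dataset name of the form '<project>-backtesting YYYY-MM-DD-YYYY-MM-DD'."""
    return (
        f"{project_name}-backtesting "
        f"{start_time.strftime('%Y-%m-%d')}-{end_time.strftime('%Y-%m-%d')}"
    )


# Fixed 14-day window so the printed values are deterministic.
end_time = datetime(2024, 3, 15, tzinfo=timezone.utc)
start_time = end_time - timedelta(days=14)

run_filter = build_run_filter(start_time, end_time)
dataset_name = backtest_dataset_name("tchoung-te", start_time, end_time)
print(run_filter)
# and(gt(start_time, "2024-03-01T00:00:00+00:00"), lt(end_time, "2024-03-15T00:00:00+00:00"))
print(dataset_name)
# tchoung-te-backtesting 2024-03-01-2024-03-15
```

Because the filter uses `lt(end_time, ...)`, a run that started inside the window but was still running at `end_time` is left out of the dataset.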
The goal of this task is to update the program's prompt based on the chatbot's bad answers and on the attacks recorded in LangSmith: https://www.loom.com/share/cd1dfee495814db09c313357ad2282ae
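Before rewriting the prompt, it can help to sort the collected runs into the two kinds of failure this task targets. A minimal sketch over plain dictionaries, assuming each exported run carries a hypothetical `user_score` feedback value (0 meaning a bad answer) and a hypothetical `is_attack` flag set during manual review. Neither field name comes from LangSmith, so adapt them to however feedback is actually recorded:

```python
from typing import Dict, Iterable, List


def triage_runs(runs: Iterable[dict]) -> Dict[str, List[dict]]:
    """Split runs into 'attack', 'bad_answer' and 'ok' buckets for prompt review."""
    buckets: Dict[str, List[dict]] = {"attack": [], "bad_answer": [], "ok": []}
    for run in runs:
        if run.get("is_attack"):  # hypothetical manual-review flag
            buckets["attack"].append(run)
        elif run.get("user_score") == 0:  # hypothetical thumbs-down feedback
            buckets["bad_answer"].append(run)
        else:
            buckets["ok"].append(run)
    return buckets


# Illustrative runs, not real data.
runs = [
    {"id": "r1", "user_score": 1},
    {"id": "r2", "user_score": 0},
    {"id": "r3", "user_score": 0, "is_attack": True},
    {"id": "r4"},  # no feedback recorded
]
buckets = triage_runs(runs)
print({name: [run["id"] for run in rs] for name, rs in buckets.items()})
# {'attack': ['r3'], 'bad_answer': ['r2'], 'ok': ['r1', 'r4']}
```

Attacks are checked first, so an injection attempt that also received a thumbs-down is counted only once, as an attack. The `attack` and `bad_answer` buckets are then the cases the updated prompt should be backtested against.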