homebrewltd / ichigo

Llama3.1 learns to Listen

ci: Set up baseline evals using cascaded system #38

Open 0xSage opened 1 month ago

0xSage commented 1 month ago

Problem

Systematically evaluate the performance of our multimodal model by comparing it against a baseline. The baseline is a cascaded system of WhisperSpeech TTS + Llama3.1.

Suggestions

LM performance metrics, e.g. MMLU
Latency metrics
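
As a rough illustration of how both suggested metrics could be collected in one pass, here is a minimal sketch of an eval harness. It is not the project's actual eval code: `run_cascade`, `evaluate`, and the two stub stages are hypothetical names, and the stubs stand in for the real speech and LLM components. It assumes each example is a multiple-choice item with a single gold letter (MMLU-style) and that per-stage wall-clock time is an acceptable latency measure.

```python
import time
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple

# A stage is (name, callable). Names must be unique and must not be "total".
Stage = Tuple[str, Callable[[Any], Any]]


def run_cascade(stages: Sequence[Stage], x: Any) -> Tuple[Any, Dict[str, float]]:
    """Feed x through each stage in order and record per-stage seconds."""
    timings: Dict[str, float] = {}
    for name, fn in stages:
        start = time.perf_counter()
        x = fn(x)
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return x, timings


def evaluate(stages: Sequence[Stage], examples: Sequence[Tuple[Any, str]]) -> Dict[str, Any]:
    """Return exact-match accuracy and mean latency per stage."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = 0
    all_timings: List[Dict[str, float]] = []
    for inp, gold in examples:
        pred, timings = run_cascade(stages, inp)
        correct += int(pred == gold)
        all_timings.append(timings)
    keys = all_timings[0].keys()
    return {
        "accuracy": correct / len(examples),
        "mean_latency_s": {k: mean(t[k] for t in all_timings) for k in keys},
    }


# Hypothetical stubs: replace with the real speech and LLM calls.
def fake_speech_stage(audio: Dict[str, str]) -> str:
    return audio["transcript"]


def fake_llm_stage(prompt: str) -> str:
    return "B" if "2 + 2" in prompt else "A"


STAGES: List[Stage] = [("speech", fake_speech_stage), ("llm", fake_llm_stage)]

if __name__ == "__main__":
    demo = [({"transcript": "What is 2 + 2? A) 3 B) 4"}, "B")]
    print(evaluate(STAGES, demo))
```

Running the same `evaluate` call against the multimodal model (as a single-stage pipeline) and against the cascaded baseline would give directly comparable accuracy and latency numbers.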

tikikun commented 1 month ago

Maybe Bach can have a look? Otherwise, for latency we already have the results in #40.