Open 0xSage opened 1 month ago
Maybe Bach could have a look? Otherwise, for latency we already have results in #40.
Problem
Systematically evaluate the performance of our multimodal model by comparing it against a baseline. The baseline is a cascaded system of WhisperSpeech TTS + LLaMA 3.1.
Suggestions