Very nice write-up! Looking forward to the next posts in the series. I'm very much interested in learning how others approach evaluating the outputs of LLMs, especially in use cases like classifying texts or extracting structured data.
Thanks @saeedesmaili! I'll be posting about the actual finetuning next.
Alex Strick van Linschoten - Evaluating the Baseline Performance of GPT-4-Turbo for Structured Data Extraction
I evaluated the baseline performance of OpenAI’s GPT-4-Turbo on the ISAF Press Release dataset.
https://mlops.systems/posts/2024-06-03-isafpr-evaluating-baseline.html