adamboazbecker opened 1 month ago
While reading model cards from different LLM providers, I noticed that these models were evaluated with different frameworks and methods, so it is hard to compare them apples-to-apples.
To compare the models fairly, I think we need to evaluate them all against the same quality framework. I would first define a standard quality framework designed around my use case, and then evaluate each model on it.
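To make the idea concrete, the two steps above (define one framework, then score every model against it) might look roughly like this minimal sketch. The rubric criteria, the stubbed model outputs, and all names here are hypothetical placeholders, not a real eval harness:

```python
from typing import Callable

# Step 1: define one "quality framework" as a set of named checks,
# chosen around the use case. These three criteria are made up for
# illustration only.
RUBRIC: dict[str, Callable[[str], bool]] = {
    "non_empty": lambda answer: len(answer.strip()) > 0,
    "cites_source": lambda answer: "source:" in answer.lower(),
    "under_100_words": lambda answer: len(answer.split()) <= 100,
}

def score(answer: str) -> float:
    """Fraction of rubric criteria the answer satisfies."""
    return sum(check(answer) for check in RUBRIC.values()) / len(RUBRIC)

# Step 2: run every model on the same prompt and apply the same rubric.
# Real model calls are stubbed out with fixed strings here.
answers = {
    "model_a": "Paris is the capital of France. Source: encyclopedia.",
    "model_b": "",
}

scores = {name: score(ans) for name, ans in answers.items()}
print(scores)
```

Because every model passes through the identical rubric, the resulting scores are directly comparable, which is exactly what differing provider-reported benchmarks don't give us.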
What do we do when different teams follow different quality frameworks?