qcri / LLMeBench

Benchmarking Large Language Models
update post-processing GPT4-o AR #354

Closed by AridHasan 1 month ago