Fix exec_result evaluation issue

xlang-ai / Spider2

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

https://spider2-sql.github.io

Apache License 2.0

157 stars 14 forks

Closed sfc-gh-caxu closed 1 week ago

sfc-gh-caxu commented 1 week ago

This PR fixes two problems:

1. The score is calculated against an incorrect number of examples. Whether or not an example has an output, the denominator should be the total number of test examples (260), not len(output_results).
2. CSV parsing errors can crash the evaluation script. This PR catches them so that evaluation continues.
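The two fixes can be illustrated with a minimal sketch. This is not the code from the PR: the names `parse_csv`, `score`, and `TOTAL_EXAMPLES` are hypothetical, the standard-library `csv` module stands in for whatever CSV reader the evaluation script uses, and exact row equality stands in for the real result comparison. It also assumes that an output which fails to parse is scored as incorrect.

```python
import csv
import io

# Number of test examples, as stated in the PR description.
TOTAL_EXAMPLES = 260


def parse_csv(text):
    """Return the parsed rows, or None if the CSV text is malformed."""
    try:
        # strict=True makes the csv module raise csv.Error on bad input
        # instead of silently accepting it.
        return list(csv.reader(io.StringIO(text), strict=True))
    except csv.Error:
        return None


def score(predictions, gold, total=TOTAL_EXAMPLES):
    """Fraction of correct examples out of a fixed total.

    predictions, gold: dicts mapping an example id to CSV text.
    Missing or unparseable predictions count as incorrect.
    """
    correct = 0
    for example_id, gold_text in gold.items():
        pred_text = predictions.get(example_id)
        if pred_text is None:
            continue  # no output for this example: counted as wrong
        pred_rows = parse_csv(pred_text)
        if pred_rows is None:
            continue  # parsing failed: counted as wrong, no crash
        if pred_rows == parse_csv(gold_text):
            correct += 1
    # Divide by the fixed total, not by len(predictions).
    return correct / total
```

With three gold examples, one correct output, one malformed output, and one missing output, `score(..., total=3)` gives 1/3, whereas dividing by the number of outputs would report 1/2.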

This PR also adds examples to .gitignore.