Atlamtiz opened 1 year ago
I have the same confusion. I classified the samples in dev.json according to the criteria defined in the README, but when I fed my results into evaluation.py, I found that evaluation.py classifies gold.txt into the different hardness levels itself, and its classification was slightly different from mine.
So I went into evaluation.py and used the counting functions there to classify the samples instead.
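For anyone else hitting this: the bucketing in evaluation.py is driven by three component counts (produced by its count_component1 / count_component2 / count_others functions on the parsed gold SQL). Below is a minimal sketch of how those counts map to hardness labels; the thresholds paraphrase my reading of Evaluator.eval_hardness, so treat the script itself as the authoritative version, and note that the counts are passed in directly here just to make the logic visible.

```python
# Sketch of Spider's hardness bucketing (based on Evaluator.eval_hardness
# in evaluation.py). In the real script the three counts come from
# count_component1 / count_component2 / count_others applied to the
# parsed gold SQL; here they are plain ints for illustration.
def eval_hardness(comp1: int, comp2: int, others: int) -> str:
    if comp1 <= 1 and comp2 == 0 and others == 0:
        return "easy"
    if (others <= 2 and comp1 <= 1 and comp2 == 0) or \
       (comp1 <= 2 and others < 2 and comp2 == 0):
        return "medium"
    if (others > 2 and comp1 <= 2 and comp2 == 0) or \
       (2 < comp1 <= 3 and others <= 2 and comp2 == 0) or \
       (comp1 <= 1 and others == 0 and comp2 <= 1):
        return "hard"
    return "extra"
```

This is why hand-classifying from the README can disagree slightly with evaluation.py: the script decides hardness from these parsed-SQL component counts, not from the README prose.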
+1
👀
"I didn't find any difficulty metric in Spider, but in the latest paper, Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing, I saw that they distinguished different difficulty levels. However, there seems to be no difficulty measurement in the dataset. Why is that?"