taoyds / spider

scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
https://yale-lily.github.io/spider
Apache License 2.0

Getting an error when running evaluation.py #46

Open sythello opened 4 years ago

sythello commented 4 years ago

Hello! When I tried to run evaluation.py, it terminated half-way, throwing the following error:

Traceback (most recent call last):
  File "evaluation.py", line 866, in <module>
    evaluate(gold, pred, db_dir, etype, kmaps)
  File "evaluation.py", line 546, in evaluate
    exec_score = eval_exec_match(db, p_str, g_str, p_sql, g_sql)
  File "evaluation.py", line 626, in eval_exec_match
    q_res = cursor.fetchall()
sqlite3.OperationalError: Could not decode to UTF-8 column 'last_name' with text 'Treyes Albarrac?N'

What could be the problem? Thank you in advance! Related info: python==2.7.17 (installed via pyenv)

hclent commented 3 years ago

Hi @sythello, I'm not sure if you still have this issue, but the problem is due to the text encoding in the SQLite database: some names contain bytes that are not valid UTF-8. I couldn't find a way to solve it in Python, so I worked around it by directly modifying the database to use only ASCII characters:

user$ sqlite3 wta_1.sqlite
> UPDATE players SET last_name = 'Albarracin' WHERE first_name='Joselyn Margarita';
> UPDATE players SET first_name = 'Selin Gulseren' WHERE player_id=215238;

So this fixes the entries directly:

212305|Joselyn Margarita|Treyes Albarrac?N||19970629|ECU
->
212305|Joselyn Margarita|Albarracin||19970629|ECU

215238|Selin G?Lseren|Simsek|U|19990509|TUR
->
215238|Selin Gulseren|Simsek|U|19990509|TUR

Fixing the entries directly this way does not break the Spider dev set (where this table is used), because queries that touch the first_name and last_name columns never ask for specific names; they usually count or order them in some way.
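To see which rows would need this kind of manual fix, one option is to read the text columns as raw bytes and report the values that fail to decode. The following is a minimal sketch assuming Python 3; the helper name `find_undecodable`, the in-memory demo table, and the invalid byte `\xed` are illustrative assumptions, not taken from wta_1.sqlite or evaluation.py.

```python
import sqlite3


def find_undecodable(conn, table, columns):
    """Return (rowid, column, raw_bytes) for TEXT values that are not valid UTF-8.

    `table` and `columns` are interpolated into the SQL, so they must be
    trusted identifiers (hypothetical helper, for illustration only).
    """
    previous_factory = conn.text_factory
    conn.text_factory = bytes  # hand back TEXT values as raw bytes
    try:
        query = "SELECT rowid, {} FROM {}".format(", ".join(columns), table)
        rows = conn.execute(query).fetchall()
    finally:
        conn.text_factory = previous_factory  # restore the caller's setting

    bad = []
    for row in rows:
        rowid = row[0]
        for col, value in zip(columns, row[1:]):
            if isinstance(value, bytes):
                try:
                    value.decode("utf-8")
                except UnicodeDecodeError:
                    bad.append((rowid, col, value))
    return bad


# Demo on an in-memory database. CAST from a BLOB stores the bytes as TEXT
# without validating the encoding; the byte \xed is an assumption for the demo.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE players (player_id INTEGER, first_name TEXT, last_name TEXT)")
demo.execute(
    "INSERT INTO players VALUES (?, ?, CAST(? AS TEXT))",
    (212305, "Joselyn Margarita", b"Treyes Albarrac\xedN"),
)
demo.execute("INSERT INTO players VALUES (?, ?, ?)", (1, "Ada", "Lovelace"))

found = find_undecodable(demo, "players", ["first_name", "last_name"])
print(found)
```

Each reported rowid can then be fixed with an UPDATE like the ones above.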

If you found a way to fix this Pythonically, I would love to hear it though. :) Thanks!

sythello commented 3 years ago

@hclent Thank you for the input! I haven't fixed the problem (and was only using the partial match scores); this looks very helpful!
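
For reference, a Python-side workaround that may avoid editing the database is the `text_factory` attribute of a `sqlite3` connection, which controls how TEXT values are turned into Python objects. The following is a minimal sketch assuming Python 3; the helper name `connect_lenient`, the demo table, and the invalid byte `\xed` are illustrative assumptions, and evaluation.py would need its own connection set up the same way.

```python
import sqlite3


def connect_lenient(db_path):
    """Open a SQLite database whose TEXT values may not be valid UTF-8.

    Hypothetical helper: undecodable bytes are replaced with U+FFFD
    instead of raising sqlite3.OperationalError on fetch.
    """
    conn = sqlite3.connect(db_path)
    # sqlite3 passes the raw bytes of each TEXT value to text_factory.
    conn.text_factory = lambda raw: raw.decode("utf-8", errors="replace")
    return conn


# Demo on an in-memory database. CAST from a BLOB stores the bytes as TEXT
# without validating the encoding; the byte \xed is an assumption for the demo.
conn = connect_lenient(":memory:")
conn.execute("CREATE TABLE players (player_id INTEGER, last_name TEXT)")
conn.execute(
    "INSERT INTO players VALUES (?, CAST(? AS TEXT))",
    (212305, b"Treyes Albarrac\xedN"),
)
rows = conn.execute("SELECT player_id, last_name FROM players").fetchall()
print(rows)
```

Using `errors="ignore"` instead would drop the bad bytes rather than mark them. Either choice changes the stored names slightly, which should matter only for queries that compare against those exact strings.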