CodeS baseline - Githubissues

xlang-ai / Spider2

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Apache License 2.0

Thanks for your issue and sorry for the late reply!

Actually, CodeS already handles large schemas: it splits them into small batches of no more than 500 tokens each (see here) before feeding them to the schema filter.
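To make the batching idea concrete, here is a minimal sketch of greedy packing under a token budget. It is not the actual CodeS code: `batch_schema_items` and `whitespace_token_count` are hypothetical names, and the whitespace counter only stands in for a real tokenizer.

```python
from typing import Callable, List


def batch_schema_items(
    items: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = 500,
) -> List[List[str]]:
    """Greedily pack schema items into batches of at most max_tokens tokens.

    Hypothetical helper for illustration; not taken from the CodeS code base.
    An item that alone exceeds max_tokens still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for item in items:
        n = count_tokens(item)
        # Close the current batch if adding this item would exceed the budget.
        if current and current_len + n > max_tokens:
            batches.append(current)
            current, current_len = [], 0
        current.append(item)
        current_len += n
    if current:
        batches.append(current)
    return batches


def whitespace_token_count(text: str) -> int:
    # Assumption: whitespace splitting stands in for the model's tokenizer.
    return len(text.split())


# 300 made-up column descriptions, 4 whitespace tokens each.
columns = [f"orders.col_{i} ( integer )" for i in range(300)]
batches = batch_schema_items(columns, whitespace_token_count)
print([len(b) for b in batches])
```

Each batch can then be passed to the schema filter on its own, so no single input grows with the size of the database schema.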

You may see a misleading warning: Token indices sequence length is longer than the specified maximum sequence length for this model... It is only emitted while the length of the current batch is being checked, not when the batch is fed to the model, so it is safe to ignore. This warning is now muted.
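If you are on an older checkout and still see the warning, one way to silence it yourself is sketched below. This is not necessarily how the repository mutes it. It assumes the message comes from the Hugging Face transformers tokenizer, whose warnings are logged under the `transformers.tokenization_utils_base` logger; `mute_tokenizer_length_warning` is a hypothetical helper name.

```python
import logging

# Assumption: the warning is logged by Hugging Face transformers under this
# logger name; adjust it if your version logs the message elsewhere.
TOKENIZER_LOGGER = "transformers.tokenization_utils_base"


def mute_tokenizer_length_warning() -> None:
    """Raise the tokenizer logger's level so WARNING messages are dropped."""
    logging.getLogger(TOKENIZER_LOGGER).setLevel(logging.ERROR)


mute_tokenizer_length_warning()
```

Errors from the same logger are still shown; only messages at warning level and below are suppressed.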

CodeS baseline #7