taoyds / spider

scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
https://yale-lily.github.io/spider
Apache License 2.0

Expectation for data format in seq2seq_attention_copy #98

Open shwetha-97 opened 11 months ago

shwetha-97 commented 11 months ago

In the README for the seq2seq_attention_copy method, I could not work out the difference between the data in the folders data/datasets/data and data/datasets/data_radn_split.

The README says that we have to put the original data in these folders.

It seems to me that the folders data and data_radn_split must contain different data; otherwise the experiments in attn_copying_tune_data_radn_split.yaml and attn_copying_tune_data.yaml would be equivalent. But how do they differ? Is the original Spider data split randomly between these two folders? If so, in what ratio should it be split: 50:50 or something else?

As I understand from here, should the folders data and data_radn_split each have their own train, dev, and test JSON files? What is the reason for having these two folders, or two different kinds of data?