microsoft / IRNet

An algorithm for cross-domain NL2SQL
MIT License

Preprocessing the Spider dataset leads to different file sizes #37

Open: anshudaur opened this issue 4 years ago

anshudaur commented 4 years ago

Hi all,

I am currently trying to preprocess the basic Spider data provided. After running data_preprocess.py and sql2SemQL.py, the resulting file is 20,819 KB, only about half the size of the processed file provided by the authors (39,030 KB). I also get the same error message ("column * table error") on the column for 72 queries.
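A raw byte count can differ just because of indentation or separators in the JSON output, so a fairer comparison is the number of examples plus the size after re-serializing both files the same way. Below is a minimal Python sketch of that check. The script name and file paths are placeholders, and it assumes each file is a JSON array of example objects (not verified against the IRNet scripts):

```python
import json
import sys
from typing import Any


def load(path: str) -> list[dict[str, Any]]:
    """Load a preprocessed split, assumed to be a JSON array of example objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(records: list[dict[str, Any]]) -> dict[str, int]:
    """Summarize a split independently of its on-disk formatting.

    The records are re-serialized compactly with sorted keys, so whitespace
    and key order in the original file do not affect the byte count.
    json.dumps escapes non-ASCII by default; that is fine here because
    both files are measured the same way.
    """
    compact = json.dumps(records, separators=(",", ":"), sort_keys=True)
    return {
        "num_examples": len(records),
        "compact_bytes": len(compact.encode("utf-8")),
    }


if __name__ == "__main__":
    # Hypothetical usage: python compare_sizes.py authors/train.json mine/train.json
    for path in sys.argv[1:]:
        print(path, summarize(load(path)))
```

If the example counts match but the compact sizes still differ, the gap comes from the content of each example rather than from formatting.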

Do you have any pointers on why we get this error for these 72 queries?

Thanks, Anshu

saxenakanishk commented 4 years ago

Hi Anshu, did you find a solution to the problem? I am facing a similar issue. The preprocessed train.json file provided by the authors contains more data (labels/keys) than the train.json file I produce by running the preprocessing myself. I also get the same error message as you for the 72 queries.
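One way to see exactly which labels/keys are missing is to diff the two train.json files record by record. Below is a minimal Python sketch. It assumes both files are JSON arrays of example objects in the same order (an assumption, not checked against the IRNet scripts), and the script name and paths are placeholders:

```python
import json
import sys
from collections import Counter
from typing import Any


def load(path: str) -> list[dict[str, Any]]:
    """Load a preprocessed split, assumed to be a JSON array of example objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def missing_keys(reference: list[dict[str, Any]],
                 local: list[dict[str, Any]]) -> Counter:
    """Count how often each key appears in a reference record but not in the
    local record at the same index.

    Assumes both lists hold the same examples in the same order. If the
    lengths differ, zip() silently stops at the shorter list, so the caller
    should compare lengths first.
    """
    counts: Counter = Counter()
    for ref, mine in zip(reference, local):
        # dict key views support set difference
        for key in ref.keys() - mine.keys():
            counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_keys.py authors/train.json mine/train.json
    reference, local = load(sys.argv[1]), load(sys.argv[2])
    if len(reference) != len(local):
        print(f"example count differs: {len(reference)} vs {len(local)}")
    for key, n in missing_keys(reference, local).most_common():
        print(f"{key!r} missing from {n} local examples")
```

Keys that are missing from every local example point to a preprocessing step that was never run. Keys missing from only a few examples (for instance, around 72) may line up with the queries that raise the error.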

Please let me know if anyone can help.

Thanks, Kanishk