This project is an implementation of the Seq2SQL model described in https://arxiv.org/pdf/1709.00103.pdf
We have also implemented the baseline sequence-to-sequence model.
pip install -r requirements.txt
python preprocess.py
This creates the tokenized version of the dataset.
python main.py
This runs the baseline model followed by the target model. A full run of main.py takes approximately 10 hours, so make sure to use a machine with a good GPU.

The data and glove directories hold the dataset and the embeddings.
The library folder contains code provided by WikiSQL to perform basic data conversions and run queries.
The util directory contains common functionality such as plotting graphs, loading datasets, preparing parallel datasets in memory for fast access, creating batch sequences for the models, and checking model accuracy.
The baseline directory contains all code needed to run the baseline.
The seq2sql directory contains all code for the target model.
The saved_model directory is where the target model saves its best model after training.

The entry point to the project is main.py; from there you can control which model(s) to run. preprocess.py is another essential file, since it generates the tokenized dataset; altering its tokenizing logic could significantly change the results. constants.py contains the parameters used by the target model, such as batch size, learning rate, and number of epochs.
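The batch size set in constants.py determines how the helpers in util group sequences into batches. As a rough illustration only, the sketch below splits tokenized sequences (as lists of token ids) into fixed-size batches and right-pads each batch to its longest sequence. The names BATCH_SIZE, PAD_ID, and make_batches are hypothetical and are not taken from this project's code.

```python
from typing import List

# Hypothetical stand-ins for entries in constants.py; not the project's actual settings.
BATCH_SIZE = 2
PAD_ID = 0


def make_batches(sequences: List[List[int]], batch_size: int = BATCH_SIZE,
                 pad_id: int = PAD_ID) -> List[List[List[int]]]:
    """Split token-id sequences into batches and right-pad each batch to its longest sequence."""
    batches = []
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        max_len = max(len(seq) for seq in chunk)
        # Pad every sequence in this batch to the same length so it can be stacked into a tensor.
        batches.append([seq + [pad_id] * (max_len - len(seq)) for seq in chunk])
    return batches


if __name__ == "__main__":
    data = [[5, 8, 2], [7], [3, 4], [9, 1, 6, 2]]
    for batch in make_batches(data):
        print(batch)
```

Padding per batch, rather than to the longest sequence in the whole dataset, keeps batches of short questions small; a real implementation would usually also return the original lengths so the model can ignore padded positions.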
When the run finishes, the code generates loss graphs and writes the results of the target model to a text file in the root directory of the project.
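The util directory is responsible for checking model accuracy. One metric the Seq2SQL paper reports is logical-form accuracy: the fraction of generated queries that exactly match the ground-truth query string. The sketch below computes that metric and writes it to a text file. Collapsing whitespace before comparing, the function names, and the output line format are all assumptions made for illustration, not necessarily what this project does.

```python
from pathlib import Path
from typing import Sequence


def normalize(query: str) -> str:
    # Assumption: collapse runs of whitespace so formatting differences do not count as mismatches.
    return " ".join(query.split())


def logical_form_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted queries whose normalized string equals the normalized gold query."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    matches = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return matches / len(gold)


def write_results(path: Path, accuracy: float) -> None:
    # Hypothetical output format: a single line with the accuracy as a percentage.
    path.write_text(f"logical form accuracy: {accuracy:.2%}\n")


if __name__ == "__main__":
    preds = ["SELECT col0 FROM table WHERE col1 = x", "SELECT  col2 FROM table"]
    golds = ["SELECT col0 FROM table WHERE col1 = x", "SELECT col3 FROM table"]
    print(logical_form_accuracy(preds, golds))  # 0.5
```

Exact string match is strict: two queries that return the same rows but list their conditions in a different order count as a miss, which is why the paper also reports execution accuracy.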