jaurment opened 5 years ago
Not sure what you mean. We didn't crowdsource utterances for ATIS.
In Section 6.4 you mentioned that you did interactive experiments on ATIS as well. I'm assuming Stage 1 for ATIS was bootstrapped using template-generated queries, but Stages 2 and 3 were based on real user utterances that led to generated SQL queries that were then labeled. Correct or incorrect?
"6.4 Simulated Interactive Experiments: We conducted additional simulated interactive learning experiments using GEO880 and ATIS to better understand the behavior of our train-deploy feedback loop, the effects of our data augmentation approaches, and the annotation effort required. We randomly divide each training set into K batches and present these batches sequentially to our interactive learning algorithm. Correctness feedback is provided by comparing the result of the predicted query to the gold query, i.e., we assume that users are able to perfectly distinguish correct results from incorrect ones. Figure 3 shows accuracies on GEO880 and ATIS respectively of each batch when the model is trained on all previous batches. As in the live experiment, accuracy improves with successive…"
Those are simulated experiments. So we used randomly chosen examples from the ATIS training set to simulate user questions.
Oh, ok. So just to clarify -- you started with a portion of the ATIS training set with some questions held out, then added the held-out questions back in batches of varying size and retrained the model with the additional ATIS data points (which simulate user questions)?
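If I've understood the protocol, it could be sketched roughly like this (purely my reading of Section 6.4 -- `train_model`, `predict_sql`, and `execute` are placeholder names, not your actual code):

```python
# Hypothetical sketch of the simulated train-deploy feedback loop:
# split the training set into K batches, present them sequentially,
# retrain on all previously seen batches, and simulate user feedback
# by comparing execution results of predicted vs. gold SQL.
import random


def simulated_interactive_loop(train_set, num_batches,
                               execute, train_model, predict_sql):
    """Return per-batch accuracy when the model is trained on all
    previous batches (mirroring Figure 3's setup, as I understand it)."""
    random.shuffle(train_set)
    batch_size = len(train_set) // num_batches
    seen, accuracies = [], []
    for k in range(num_batches):
        batch = train_set[k * batch_size:(k + 1) * batch_size]
        model = train_model(seen)  # retrain on all previous batches
        correct = 0
        for utterance, gold_sql in batch:
            pred_sql = predict_sql(model, utterance)
            # Simulated "perfect user" feedback: compare execution results.
            if execute(pred_sql) == execute(gold_sql):
                correct += 1
        accuracies.append(correct / max(len(batch), 1))
        seen.extend(batch)  # gold labels join the training pool
    return accuracies
```

Is that the right picture, or did labeled examples enter the pool only when the prediction was marked incorrect?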
I ask because I've done something similar with our model and am not seeing the improvements you report. Do you have any intuition for why even small batches of 50 can increase the model's SQL prediction performance so much, or suggestions on how to redo my experiment to better match your setup?
Not sure I fully understand. You're saying that your accuracies are not improving by adding more training data?
Would you be willing to share the additional labeled utterances that you crowdsourced for the ATIS database?