microsoft / Table-Pretraining

ICLR 2022 Paper, SOTA Table Pre-training Model, TAPEX: Table Pre-training via Learning a Neural SQL Executor
https://table-pretraining.github.io
MIT License
290 stars 38 forks source link

tapex.base model performance poorly with "run_model.py predict" #16

Closed xienian87 closed 2 years ago

xienian87 commented 2 years ago

Hello,

I follow the instruction and trained a tapex.base model on wikisql dataset. The "run_model.py eval" reports Denotation Accuracy : 0.879. However when I use ''run_model.py predict" command to perform online prediction, the result is very poor. Is there any thing I can try to sovle this problem?

Thanks

Nian

xienian87 commented 2 years ago

I tried the "run_model.py predict" command with the downloaded tapex.large model in the instruction. The result is fine.

SivilTaram commented 2 years ago

@xienian87 Thanks for your interest on our work, could you confirm that the checkpoint is successfully loaded when predicting? You may find the log and paste it here, and I will help check that.

xienian87 commented 2 years ago

@SivilTaram Hello, I figure out the problem but don't know how to solve it. I test all the saved checkpoints and find out that some of them give very close answer and some of them give completely wrong answer. None of them gives correct answer ("2004" in the demo). It seems that models are overfitted. Can you provide some suggestion on how to train the model, please. Thanks very much.

xienian87 commented 2 years ago

One thing I don't understand is that the model works fine in "run_model.py eval" command, but not in "run_model.py predict" command. If the model is overfitted, how can it works fine in eval?

xienian87 commented 2 years ago

below is the log message on terminal: 2022-08-16 11:20:24 | INFO | fairseq.file_utils | loading archive file tapex.base 2022-08-16 11:20:26 | INFO | fairseq.tasks.translation | [src] dictionary: 51200 types 2022-08-16 11:20:26 | INFO | fairseq.tasks.translation | [tgt] dictionary: 51200 types 2022-08-16 11:20:58 | INFO | main | Receive question as : Greece held its last Summer Olympics in which year? 2022-08-16 11:20:58 | INFO | main | The answer should be : 9

SivilTaram commented 2 years ago

@xienian87 If I understand it correctly, you directly use the pre-trained tapex.base on downstream questions, instead of the fine-tuned models? If the evaluating result seems promising, how about trying to predict all answers of evaluated examples, and then see if the model satisfies your expectation (but not for the demo question only). Thanks!

xienian87 commented 2 years ago

@SivilTaram I use the fine-tuned models on wikisql dataset. I will predict all answers of evaluated examples and also train a large model to check its predict performance.

SivilTaram commented 2 years ago

@xienian87 Hi, how about the quality of predictions? Is the prediction results expected?

SivilTaram commented 2 years ago

Closed since there is no more activity. Feel free to re-open when having more questions.