superctj opened this issue 1 year ago
Sorry for the late reply. We used the VizNet tables to pre-train the column encoder; they can be found on this page: https://github.com/megagonlabs/sato/tree/master/table_data
Thanks a lot for open-sourcing the code and for your answer @jw-megagon. I was wondering: did you train the model on all of the VizNet tables (80,000), or only on the multi-column sets? Also, could you please share the hyperparameters (--batch_size, --lr, --lm, --n_epochs, --max_len, --size, --projector, --augment_op, --sample_meth, --table_order) you used during pre-training?
Thanks in advance!
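For concreteness, an invocation passing all of the flags listed above might look like the following. Every value shown here is a placeholder for illustration only, not the configuration the authors actually used:

```shell
python pretrain.py \
  --batch_size 32 \
  --lr 5e-5 \
  --lm roberta \
  --n_epochs 3 \
  --max_len 128 \
  --size 10000 \
  --projector 768 \
  --augment_op drop_col \
  --sample_meth head \
  --table_order column
```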
To obtain the training data for VizNet, I saved all the tables from the folders under the viznet_tables/webtableX/KX_multi-col directory at https://github.com/megagonlabs/sato/tree/master/table_data. I then stored these tables in the project's /data/viznet/tables folder and simplified their file names to make them more concise. Could you please confirm whether these steps are correct?
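The collection-and-renaming step described above could be sketched as follows. The directory layout and the `table_{i}.csv` naming scheme are my own assumptions, not something prescribed by the Starmie repo:

```python
import shutil
from pathlib import Path


def collect_tables(src_root: str, dest_dir: str) -> list:
    """Copy every CSV found under src_root into dest_dir under short names.

    NOTE: the "table_{i}.csv" naming is a hypothetical simplification
    scheme; sorting first keeps the mapping deterministic across runs.
    Returns the list of new file names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    new_names = []
    for i, table in enumerate(sorted(Path(src_root).rglob("*.csv"))):
        new_name = f"table_{i}.csv"
        shutil.copy(table, dest / new_name)
        new_names.append(new_name)
    return new_names
```

One caveat with any renaming scheme: keep (or log) the mapping from original to simplified names, in case downstream metadata files still reference the originals.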
I obtained data/viznet/tables through the steps mentioned above. However, the pre-training process requires the file data/viznet/test.csv (line 284 in pretrain.py). Could you please tell me where I can obtain this file? Thanks in advance! :)
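While waiting for an answer, one workaround is to generate a held-out split yourself. The single-column format below is purely a guess at what test.csv contains; check what pretrain.py (around line 284) actually reads before relying on it:

```python
import csv
import random
from pathlib import Path


def write_test_split(tables_dir: str, out_csv: str,
                     frac: float = 0.1, seed: int = 42) -> int:
    """Sample a fraction of table file names into a held-out CSV list.

    ASSUMPTION: test.csv is a one-column list of table file names with
    a "table" header; the real format expected by pretrain.py may differ.
    Returns the number of held-out tables written.
    """
    names = sorted(p.name for p in Path(tables_dir).glob("*.csv"))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    held_out = sorted(rng.sample(names, max(1, int(len(names) * frac))))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["table"])
        writer.writerows([n] for n in held_out)
    return len(held_out)
```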
Thank you for open-sourcing the code! I didn't find a description of the pre-training datasets in the paper. Was Starmie pre-trained on the benchmark datasets?