salesforce / TabularSemanticParsing

Translating natural language questions to a structured query language
https://arxiv.org/abs/2012.12627
BSD 3-Clause "New" or "Revised" License

Plan to share the trained model weights? #2

Closed. whuFSN closed this issue 3 years ago.

whuFSN commented 3 years ago

Thank you for open-sourcing the related code.

I noticed that BRIDGE uses BERT-large to encode both the question and the tables. As mentioned in the paper, "The training time of our model on an NVIDIA A100 GPU is approximately 51.5h (including intermediate results verification time)." That is a substantial cost in both training time and hardware.

Are you considering sharing the trained model weights? Looking forward to your reply.

Fizmath commented 3 years ago

Hello

If it is not possible to share a pretrained model, could you please explain how to modify the code to train on a weaker GPU? Mine is a GeForce GTX 1050 with 4 GB of memory.

Thank you for sharing the code.

todpole3 commented 3 years ago

I was able to run exactly the same code on a GeForce GTX Titan X, so I think it is possible to make it work on a GTX 1050 as well.

The key points are:

  1. The code requires PyTorch 1.7.x, which is compatible with CUDA 9.2, 10.1, 10.2 and 11.0 (https://pytorch.org/).
  2. The compatible NVIDIA driver for each CUDA version is listed in Table 1 on this page: https://docs.nvidia.com/deploy/cuda-compatibility/index.html. For example, CUDA 9.2 requires NVIDIA driver version >= 396.26.

So if you have CUDA >= 9.2 and an NVIDIA driver >= 396.26, you should be able to install PyTorch 1.7.x and run our code. If you have to train with a lower torch version, you will need to change a few functions in the code for compatibility.
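
As a quick sanity check before training, you can confirm the PyTorch build, its CUDA version, and GPU visibility from Python. This is a generic environment-check sketch, not part of the repository's tooling:

    import torch

    # Verify the installed PyTorch build and the CUDA runtime it was compiled against.
    print("PyTorch version:", torch.__version__)        # expect 1.7.x
    print("CUDA build:", torch.version.cuda)            # e.g. 9.2, 10.1, 10.2, or 11.0
    print("GPU available:", torch.cuda.is_available())  # False usually indicates a driver/CUDA mismatch
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))

Running nvidia-smi separately reports the installed driver version, which you can compare against Table 1 linked above.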

There are a couple of ways to make the code train with a smaller memory footprint:

  1. Use a smaller training batch size. The code uses gradient accumulation, and you can vary the training batch size by tuning these parameters in the config files (https://github.com/salesforce/TabularSemanticParsing/blob/main/configs/bridge/spider-bridge-bert-large.sh#L38):
    num_accumulation_steps=2
    ...
    train_batch_size=16

    The effective training batch size is num_accumulation_steps * train_batch_size, and train_batch_size alone determines the GPU memory consumption. Try lowering train_batch_size to make the model fit in 4 GB of memory (see the sketch after this list).

  2. If it does not fit even with train_batch_size=1, you can switch to the BERT-base model by setting the following two parameters (BERT-base uses a hidden size of 768 instead of BERT-large's 1024, so it needs considerably less memory):
    encoder_input_dim=768
    ...
    pretrained_transformer="bert-base-uncased"
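
To make the trade-off in item 1 concrete, here is a minimal, self-contained PyTorch sketch of gradient accumulation on toy data. It is not the repository's training loop; the model, data, and hyperparameter values are placeholders chosen only to illustrate why peak GPU memory depends on train_batch_size rather than on the effective batch size.

    import torch
    import torch.nn as nn

    # Toy model and data; the real BRIDGE training loop lives in the repository.
    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    train_batch_size = 4        # controls peak memory of each forward/backward pass
    num_accumulation_steps = 2  # effective batch size = 2 * 4 = 8

    optimizer.zero_grad()
    for step in range(100):
        x = torch.randn(train_batch_size, 10)
        y = torch.randn(train_batch_size, 1)
        loss = loss_fn(model(x), y)
        # Scale the loss so the accumulated gradient matches a single large-batch update.
        (loss / num_accumulation_steps).backward()
        if (step + 1) % num_accumulation_steps == 0:
            optimizer.step()        # one parameter update per effective batch
            optimizer.zero_grad()

Halving train_batch_size while doubling num_accumulation_steps keeps the effective batch size, and hence the optimization behavior, roughly unchanged while cutting activation memory roughly in half.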

Fizmath commented 3 years ago

None of those minimal configurations worked; 4 GiB of GPU memory is simply not enough.

Thank you for your hints.

todpole3 commented 3 years ago

Please download the checkpoints following these instructions.

Then run git pull and test the checkpoints following the newly updated README.