Hello,
If it is not possible to share a pretrained model, could you please show how to modify the code for training on a weaker GPU? Mine is a GeForce GTX 1050 with 4 GB of memory.
Thank you for sharing the code.
I was able to run exactly the same code using a GeForce GTX Titan X, and I think it is possible to make it work on a GTX 1050 as well.
The key points are the CUDA, driver, and PyTorch versions: if you have CUDA >= 9.2 and an NVIDIA driver >= 396.26, you should be able to install PyTorch 1.7.x and run our code. If you have to train with a lower torch version, you will need to change a few functions in the code for compatibility.
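Before touching the training configuration, it may help to confirm that the environment matches those requirements. The following is a minimal sketch of my own (not part of the repository) that prints the installed PyTorch and CUDA versions and the available GPU memory:

# Sanity-check the local PyTorch install and GPU before training.
import torch

print("torch:", torch.__version__)                # expect 1.7.x
print("built with CUDA:", torch.version.cuda)     # toolkit the wheel was built against
print("GPU visible:", torch.cuda.is_available())  # False usually means a driver problem
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("device:", props.name)
    print("memory (GiB):", round(props.total_memory / 2**30, 1))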
There are a couple of ways to make the code train within a smaller memory budget (an illustrative combination follows the list):

1. Reduce the per-step batch size. The relevant parameters are

num_accumulation_steps=2
...
train_batch_size=16

The effective train batch size is num_accumulation_steps * train_batch_size, and train_batch_size alone determines the GPU memory consumption. Try lowering the train_batch_size value until the model fits into 4 GB of memory.

2. If the model still runs out of memory even with

train_batch_size=1

then you can switch to the BERT-base model by setting the following two parameters:

encoder_input_dim=768
...
pretrained_transformer="bert-base-uncased"
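For example (illustrative values only, not a configuration I have tested), if you want to keep the effective batch size at 2 * 16 = 32 while cutting the per-step memory use, you could raise num_accumulation_steps as you lower train_batch_size:

num_accumulation_steps=8   # 8 * 4 = 32, same effective batch size as before
train_batch_size=4         # smaller per-step footprint; lower further if 4 GB is still exceeded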
None of those minimal configurations worked. 4 GiB of GPU memory is simply not enough.
Thank you for your hints.
Please download the checkpoints following these instructions.
Then run git pull and test the checkpoints following the newly updated README.
Thank you for open-sourcing the code.
I noticed that BRIDGE uses BERT-large to encode both the question and the tables. As mentioned in the paper, "The training time of our model on an NVIDIA A100 GPU is approximately 51.5h (including intermediate results verification time)." That is a substantial amount of training time and hardware cost.
Are you considering sharing the trained model weights? Looking forward to your reply.