liuxin99 opened this issue 1 year ago
I'm not the author of the paper, but I can offer some information on the last question.
In general, GECToR trains only the classifier layers at first, so memory usage is small during those epochs. After a few epochs, the BERT-based encoder is also trained (unfrozen), so memory usage grows substantially.
I think the solution is to use a smaller batch size. To find a batch size that does not cause out-of-memory errors, you can try several values while setting the `--cold_steps_count` option to zero. This option controls how many epochs train only the classifier layers, so setting it to zero means the BERT-based encoder is trained from the very first epoch and you see the peak memory usage right away.
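To make the memory jump concrete, here is a rough PyTorch sketch of the cold-steps idea (this is not GECToR's actual code; the model name, tag-vocabulary size, and epoch counts are just placeholders):

```python
import torch
from transformers import AutoModel

# Placeholder encoder + tagging head, only to illustrate cold steps.
encoder = AutoModel.from_pretrained("bert-base-cased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 5000)  # tag vocab size is illustrative

cold_epochs = 2   # plays the role of --cold_steps_count
n_epochs = 10

for epoch in range(n_epochs):
    freeze_encoder = epoch < cold_epochs
    for p in encoder.parameters():
        p.requires_grad = not freeze_encoder
    # ... usual training loop goes here ...
    # While the encoder is frozen, no gradients or optimizer state are kept
    # for its weights, so GPU memory stays low. Once it is unfrozen,
    # gradients, optimizer state, and the activations saved for backward
    # through the whole encoder are allocated, and memory jumps sharply.
```

Setting `--cold_steps_count` to zero therefore exposes the worst-case memory footprint from the first epoch, which makes it easy to search for a batch size that fits.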
@gotutiyan I'm aware of the issue you mentioned, but what puzzles me is this: with a batch size of 64 and an accumulation size of 4, GPU memory usage is around 9 GB right after the encoder's parameters are unfrozen at epoch two. However, as training continues, GPU memory usage gradually grows until it hits OOM (my GPU is a V100 32 GB). I'm not sure whether there is a GPU memory leak.
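For reference, this is roughly how I'm checking whether allocated memory really grows between epochs (a minimal PyTorch sketch, not the GECToR training code):

```python
import torch

def log_gpu_memory(tag: str) -> None:
    """Print currently allocated and peak allocated GPU memory in GiB."""
    allocated = torch.cuda.memory_allocated() / 1024 ** 3
    peak = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"[{tag}] allocated={allocated:.2f} GiB, peak={peak:.2f} GiB")

# Called at the end of every epoch. If `allocated` keeps climbing even though
# the model and batch size stay the same, something is holding on to tensors
# (i.e. a leak); if only `peak` is high, it is just the normal cost of
# training the unfrozen encoder.
# torch.cuda.reset_peak_memory_stats()  # optionally reset the peak each epoch
```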
@MaksTarnavskyi I am interested in your paper “Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction” and would like to reproduce your results, but I have some questions about the experimental details.
Neither the paper nor the GitHub repository specifies the GPU configuration or the hyperparameters used for each training stage. Could you please share this information?
Also, I ran into a strange problem while training the model. In the first training stage, GPU memory usage was very low at first but then gradually increased, and even with a V100 32 GB GPU I got out-of-memory errors. Do you know what might cause this and how to solve it?
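One thing I plan to check on my side, in case it is relevant, is the common PyTorch pitfall of accumulating the loss tensor (instead of a plain float) for logging, which keeps autograd history alive and makes memory creep up over a run. This is purely a guess on my part, illustrated with a toy loop rather than the GECToR code:

```python
import torch

# Toy model and loop, only to show the pattern.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
running_loss = 0.0

for step in range(1000):
    x = torch.randn(32, 10)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # BAD: `running_loss += loss` would keep the autograd history of every
    # iteration alive, so memory slowly grows for the whole run.
    # OK: convert to a plain Python number before accumulating.
    running_loss += loss.item()
```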