mim-solutions / bert_for_longer_texts

BERT classification model for processing texts longer than 512 tokens. The text is first divided into smaller chunks; after these are fed to BERT, the intermediate results are pooled. The implementation allows fine-tuning.

Managing GPU memory for token length more than 4000 #7

Closed: sibinbh closed this issue 1 year ago

sibinbh commented 2 years ago

Hi

Your code helped a lot in understanding the chunking process. When I try to fine-tune with token lengths of 4000+, the model breaks with an out-of-memory exception. I have tried a batch size of 2, and a larger 48 GB GPU as well. I can see that we continuously push chunks onto the GPU, which causes memory exhaustion. Is there a way to better manage memory for samples represented by 4000+ tokens?
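
For reference, here is a minimal sketch (not this repo's code) of one way to bound GPU memory at inference time: move only a few chunks to the GPU at once and pull the pooled results back to the CPU immediately. It assumes a plain Hugging Face `bert-base-uncased`; the function `embed_long_text` and the parameters `chunk_size` and `chunk_batch` are hypothetical names. During fine-tuning the savings are smaller, since backprop must retain the activations of every chunk that contributes to the loss.

```python
import torch
from transformers import AutoTokenizer, AutoModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").to(device).eval()

def embed_long_text(text: str, chunk_size: int = 510, chunk_batch: int = 4) -> torch.Tensor:
    """Hypothetical helper: chunk a long text, then embed a few chunks at a time."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]
    pooled = []
    for start in range(0, len(chunks), chunk_batch):
        batch = chunks[start:start + chunk_batch]
        # Re-add [CLS]/[SEP] per chunk and pad the minibatch.
        enc = tokenizer.pad(
            {"input_ids": [[tokenizer.cls_token_id] + c + [tokenizer.sep_token_id] for c in batch]},
            return_tensors="pt",
        ).to(device)  # only chunk_batch chunks live on the GPU at any moment
        with torch.no_grad():  # inference only; fine-tuning must keep the graph
            out = model(**enc).last_hidden_state[:, 0]  # [CLS] vector per chunk
        pooled.append(out.cpu())  # pull results off the GPU right away
    return torch.cat(pooled).mean(dim=0)  # mean-pool over all chunks
```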

MichalBrzozowski91 commented 1 year ago

Hi, we made some major changes in this repo. One added feature is the parameter `maximal_text_length`, which truncates the text before the chunking process. As you mentioned, processing longer texts requires a lot of GPU memory. Setting the parameter to something like 4096 or 2048 may be a good compromise between memory constraints and using the longer context.
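
To illustrate the idea behind `maximal_text_length` (a sketch, not the repo's actual implementation): the token sequence is capped before it is split into chunks, which bounds the number of chunks, and therefore the GPU memory, per sample. The tokenizer choice and the function name `truncate_then_chunk` below are assumptions for illustration.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def truncate_then_chunk(text: str, maximal_text_length: int = 2048,
                        chunk_size: int = 510) -> list[list[int]]:
    """Hypothetical illustration: truncate the token sequence, then chunk it."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    ids = ids[:maximal_text_length]  # the truncation step the parameter controls
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]
```

With a chunk size of around 510 tokens, `maximal_text_length=2048` yields at most 5 chunks per sample, versus 9 or more at 4096, so the cap directly bounds per-sample GPU memory.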