Closed: sibinbh closed this issue 1 year ago
Hi, we made some major changes in this repo. One added feature is the parameter `maximal_text_length`. It allows you to apply truncation before the chunking process. As you mentioned, the process for longer texts requires a lot of GPU memory. Setting the parameter to something like `4096` or `2048` could be a good compromise between memory constraints and making use of the longer context.
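For illustration, here is a minimal sketch of what truncation before chunking can look like, assuming a Hugging Face tokenizer, a 512-token chunk size, and a `chunk_text` helper of my own naming; the repo's actual implementation of `maximal_text_length` may differ in its details:

```python
# Sketch only: truncate the token sequence to maximal_text_length *before*
# splitting it into fixed-size chunks, so the number of chunks per sample
# (and therefore GPU memory per forward pass) stays bounded.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def chunk_text(text, maximal_text_length=4096, chunk_size=512):
    # Tokenize once and truncate to the maximal length.
    ids = tokenizer(
        text,
        add_special_tokens=False,
        truncation=True,
        max_length=maximal_text_length,
    )["input_ids"]
    # Split the truncated sequence into fixed-size chunks.
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]

# With maximal_text_length=4096 and chunk_size=512, a document yields at
# most 8 chunks, no matter how long the original text is.
```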
Hi,

Your code helped a lot in understanding the chunking process. When I try to fine-tune with a token length of 4000+, the model breaks with an out-of-memory exception. I have tried a batch size of 2 and a larger 48 GB GPU as well. I can see we are continuously pushing tensors onto the GPU, which causes memory exhaustion. Is there a way to better manage memory for samples represented by 4000+ tokens?
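For reference, here is a rough sketch of standard PyTorch/transformers memory-saving techniques for chunked long inputs: gradient checkpointing, mixed precision, and moving one chunk at a time onto the GPU. The model, classifier head, and mean-over-CLS aggregation below are illustrative assumptions, not this repo's actual training loop:

```python
# Not this repo's code: a generic sketch of reducing peak GPU memory when
# a long document is processed as a list of chunks.
import torch
from transformers import AutoModel

device = torch.device("cuda")
model = AutoModel.from_pretrained("bert-base-uncased").to(device)
model.train()
model.gradient_checkpointing_enable()      # trade extra compute for activation memory
classifier = torch.nn.Linear(model.config.hidden_size, 2).to(device)
optimizer = torch.optim.AdamW(
    list(model.parameters()) + list(classifier.parameters()), lr=2e-5)
scaler = torch.cuda.amp.GradScaler()       # mixed-precision loss scaling

def train_step(chunks, label):
    """chunks: list of (input_ids, attention_mask) CPU tensors of shape (1, chunk_len);
    label: LongTensor of shape (1,)."""
    optimizer.zero_grad(set_to_none=True)
    pooled = []
    for input_ids, attention_mask in chunks:
        # Move only the current chunk to the GPU; the rest stays on the CPU.
        with torch.autocast("cuda", dtype=torch.float16):
            out = model(input_ids=input_ids.to(device),
                        attention_mask=attention_mask.to(device))
        pooled.append(out.last_hidden_state[:, 0])          # CLS embedding per chunk
    doc_repr = torch.cat(pooled, dim=0).mean(dim=0, keepdim=True)  # aggregate chunks
    loss = torch.nn.functional.cross_entropy(classifier(doc_repr), label.to(device))
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Combined with truncation via `maximal_text_length`, this keeps the number of chunk forward passes per sample fixed, which is usually what prevents the OOM.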