fajri91 closed this issue 9 months ago
This could be normal, since the MLM scoring method masks one word at a time and then computes the logits for all 350 tokens in one go. That amounts to a batch size of 350, which is very large. I might have to go in and find a way to create sub-batches for when this happens, but I'm currently out of bandwidth. If you'd like to take a look, please feel free to make a PR!
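For anyone picking this up, here is a rough sketch of the sub-batching idea, not the library's actual code. It assumes the scorer can be wrapped as a callable that takes a list of masked token sequences and returns one score per sequence. The names `masked_variants`, `chunked`, `score_in_sub_batches`, `score_batch`, and `max_batch_size` are all hypothetical, and the `[MASK]` string is a placeholder for whatever the tokenizer uses.

```python
from typing import Callable, Iterator, List, Sequence

MASK = "[MASK]"  # placeholder; the real mask token depends on the tokenizer


def masked_variants(tokens: Sequence[str]) -> Iterator[List[str]]:
    """Yield one copy of `tokens` per position, with that position masked."""
    for i in range(len(tokens)):
        variant = list(tokens)
        variant[i] = MASK
        yield variant


def chunked(items: Iterator[List[str]], size: int) -> Iterator[List[List[str]]]:
    """Group an iterator into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[List[str]] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, sub-batch
        yield batch


def score_in_sub_batches(
    tokens: Sequence[str],
    score_batch: Callable[[List[List[str]]], List[float]],
    max_batch_size: int = 32,
) -> List[float]:
    """Score every masked variant, never sending more than
    `max_batch_size` sequences to `score_batch` at once."""
    scores: List[float] = []
    for batch in chunked(masked_variants(tokens), max_batch_size):
        # `score_batch` stands in for the model forward pass (hypothetical).
        scores.extend(score_batch(batch))
    return scores
```

With a cap of 32, a 350-token input would be processed as ten sub-batches of 32 plus one of 30 instead of a single batch of 350, so peak memory should scale with the cap rather than with the sentence length. The right cap depends on the model and the GPU.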
Hi, I tried to compute this with a 350-word sentence and got a GPU out-of-memory (OOM) error.
Is this expected?