Closed aryehgigi closed 1 year ago
Hello @aryehgigi, did you try lowering the batch size? The error you're facing may scale with #threads * batch_size in each thread.
Do you mean in each process? I can decrease the batch size, but:
wdyt?
Usually this is how batches behave: for instance, when you train a new model you set a batch size, sometimes you hit an OOM, and then you optimize the code to handle it.
For your use case the batch size should be a few paragraphs, but you probably open several threads against the same resource, so we can't control such a use case from the package's perspective.
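To make the "a few paragraphs per batch" advice concrete, here is a minimal sketch of capping the input seen by each prediction call. This is a hypothetical helper, not part of the fastcoref API; `max_chars` and the character-based size metric are assumptions you would tune for your data:

```python
def chunk_texts(texts, max_chars=2000):
    """Split a list of documents into batches whose total size stays
    under max_chars, so each predict() call sees a bounded input.

    With N worker threads, peak memory is roughly
    N * (cost of one batch), so this cap bounds the whole process.
    """
    batches, batch, size = [], [], 0
    for text in texts:
        # flush the current batch before it would exceed the cap
        if batch and size + len(text) > max_chars:
            batches.append(batch)
            batch, size = [], 0
        batch.append(text)
        size += len(text)
    if batch:
        batches.append(batch)
    return batches

docs = ["short paragraph"] * 5 + ["x" * 3000]
for b in chunk_texts(docs, max_chars=2000):
    # each `b` would be passed to a single prediction call
    print(len(b), sum(len(t) for t in b))
```

Each thread would then iterate over its own batches instead of handing the model one large list at once.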
got it, thanks :)
I got the following exception:
RuntimeError: [enforce fail at alloc_cpu.cpp:75] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 368596058460 bytes. Error code 12 (Cannot allocate memory)
It happens in this file:
File "fastcoref/coref_models/modeling_fcoref.py", line 190, in _calc_mention_logits
    mention_logits = joint_mention_logits + start_mention_logits.unsqueeze(-1) + end_mention_logits.unsqueeze(-2)
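That line is the source of the huge allocation: `unsqueeze(-1)` and `unsqueeze(-2)` broadcast the start/end score vectors into a full (n, n) matrix over all n candidate token positions, so memory grows quadratically with document length. A minimal sketch of the same broadcasting pattern in NumPy (`[:, None]` / `[None, :]` standing in for torch's `unsqueeze`):

```python
import numpy as np

def mention_logits_bytes(n, dtype=np.float32):
    """Measure the memory of the broadcast sum
    start[:, None] + end[None, :]  ->  an (n, n) matrix,
    the same pattern as unsqueeze(-1) + unsqueeze(-2) in torch."""
    start = np.zeros(n, dtype=dtype)
    end = np.zeros(n, dtype=dtype)
    logits = start[:, None] + end[None, :]  # materializes n * n elements
    return logits.nbytes

for n in (1_000, 2_000, 4_000):
    print(n, mention_logits_bytes(n))  # grows as n**2
```

At 4 bytes per float, the reported 368,596,058,460-byte request corresponds to an n on the order of hundreds of thousands of positions, which is why a single very long input (or several concurrent ones) exhausts memory.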
Unfortunately it seems that it consumes the memory and doesn't release it, so my thread hangs (even if I wrap the prediction in a try/except block).
Maybe we can add a validation before computing the scores that checks the input lengths aren't too long, and return an error if they are?
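Such a guard could be a small pre-check that estimates the (n, n) mention-matrix size and fails fast with a clear message instead of letting the allocator die with errno 12. A sketch, with the function name, default limit, and 4-bytes-per-element assumption all hypothetical:

```python
def check_mention_matrix_size(num_tokens, bytes_per_elem=4,
                              limit_bytes=8 * 1024**3):
    """Hypothetical pre-check: refuse to build the (n, n) mention-score
    matrix if it would exceed `limit_bytes` (default 8 GiB), raising a
    clear error instead of a DefaultCPUAllocator failure."""
    needed = num_tokens * num_tokens * bytes_per_elem
    if needed > limit_bytes:
        raise ValueError(
            f"document too long: mention matrix needs {needed:,} bytes "
            f"(> {limit_bytes:,}); split the input into shorter chunks"
        )
    return needed

# usage: a catchable Python exception instead of a hard allocation failure
try:
    check_mention_matrix_size(1_000_000)
except ValueError as e:
    print(e)
```

Because the check runs before any tensor is allocated, a caller's try/except actually works and the thread doesn't hang holding unreleased memory.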
Thanks @shon-otmazgin , @ariecattan