lxucs / coref-hoi

PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.
Apache License 2.0

CUDA out of memory error #3

Closed: AradAshrafi closed this issue 3 years ago

AradAshrafi commented 4 years ago

Hi,

First, I want to thank you so much for your valuable efforts and for this comprehensible, clean code.

I am not sure whether I should ask this here, but I ran into a CUDA out of memory error in the evaluation phase, something like this: RuntimeError: CUDA out of memory. Tried to allocate 1.02 GiB (GPU 0; 7.93 GiB total capacity; 4.76 GiB already allocated; 948.81 MiB free; 6.23 GiB reserved in total by PyTorch).

I first ran into this error during training. I reduced some parameters in the experiments.conf file that I expected would lower GPU usage, and that worked: I can now get through the training phase. However, the error still appears in the evaluation phase no matter how much I decrease parameters such as the span width, max_sentence_len, or the ffnn size. I wonder whether you had the same problem, or whether you have any suggestions for me.

I am currently using a GeForce GTX 1080 with 8 GB of memory.

Many thanks, Arad

lxucs commented 4 years ago

Hi Arad,

I haven't had this issue before. It could be caused by long input segments in evaluation, since in evaluation the segments are not truncated (in training, long documents are truncated according to max_training_sentences). You could try truncating the segments in evaluation as well and see if that works.
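Something like the following might work as a starting point; this is a rough sketch rather than code from this repo, and it assumes the usual OntoNotes-style jsonlines fields (sentences, speakers, clusters), so any other aligned fields in your data would need the same slicing:

```python
# Rough sketch: split a long jsonlines document into chunks of at most
# `max_segments` subtoken segments before evaluation, mirroring what
# max_training_sentences does during training. Field names are assumptions.
def split_long_doc(doc, max_segments=5):
    sents = doc["sentences"]
    chunks = []
    word_offset = 0
    for start in range(0, len(sents), max_segments):
        piece_sents = sents[start:start + max_segments]
        piece_len = sum(len(s) for s in piece_sents)
        chunk = dict(doc)  # shallow copy of metadata; aligned fields replaced below
        chunk["sentences"] = piece_sents
        chunk["speakers"] = doc["speakers"][start:start + max_segments]
        # Keep only gold mentions that fall inside this chunk, re-indexed to
        # chunk-local subtoken offsets; other aligned fields (sentence_map,
        # subtoken_map, ...) would need the same treatment.
        chunk["clusters"] = [
            [[s - word_offset, e - word_offset] for s, e in cluster
             if word_offset <= s and e < word_offset + piece_len]
            for cluster in doc.get("clusters", [])
        ]
        chunk["clusters"] = [c for c in chunk["clusters"] if c]
        chunks.append(chunk)
        word_offset += piece_len
    return chunks
```

Keep in mind that splitting a document like this drops any coreference links that cross chunk boundaries, so scores computed on the chunks are only approximate.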

Liyan

AradAshrafi commented 4 years ago

Hi Liyan,

Thank you so much for your response. Could you please guide me on how I could do that? I changed some parameters, and this is the line that currently raises the same error in the evaluate function: similarity_emb = target_emb * top_antecedent_emb
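For context, a rough back-of-the-envelope estimate (with assumed values, not the actual config) shows why this single element-wise product can request around a gibibyte:

```python
# Illustrative only: approximate size of the tensor materialized by
#   similarity_emb = target_emb * top_antecedent_emb
# It has shape [num_top_spans, max_top_antecedents, emb_size] in fp32.
# All numbers below are assumptions, not values read from the config.
num_top_spans = 1800       # grows with document length
max_top_antecedents = 50
emb_size = 3072            # span embedding size, roughly 3 x the BERT hidden size

gib = num_top_spans * max_top_antecedents * emb_size * 4 / 2**30
print(f"~{gib:.2f} GiB for this one tensor")  # ~1.03 GiB
```

Because the number of top spans scales with document length, a long evaluation document can push this single allocation past what is left on an 8 GB card.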

AradAshrafi commented 4 years ago

Hi again,

I wanted to mention that my issue is now resolved by using Google Colaboratory, so I will close this issue. I just have one brief question before closing it: in your opinion, what is the most efficient way to use the trained model to predict coreference links in a completely new sentence given as input?

Many thanks, Arad

lxucs commented 4 years ago

By the most efficient way, do you mean batching the input?

AradAshrafi commented 4 years ago

Yes, something like the batched prediction in Mandar Joshi's repo or the original e2e-coref repo by Kenton Lee. I want to add a UI to my project later, use the model to predict coreference links, and show them to the user.

lxucs commented 4 years ago

I see what you mean. Even in Mandar Joshi's repo or Kenton Lee's repo, the batch prediction is just a for-loop over the examples (see https://github.com/mandarjoshi90/coref/blob/master/predict.py#L29). It is still sequential prediction; it just avoids the overhead of loading the model for every example.
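In other words, the pattern in those repos is roughly the following; this is a sketch, not code from this repo, and tensorize_doc / predict_clusters are placeholders for the actual preprocessing and inference steps:

```python
import json

def predict_all(model, tensorize_doc, predict_clusters, input_path):
    """Load the model once (outside this function), then predict documents
    sequentially. `tensorize_doc` and `predict_clusters` are placeholders
    for this repo's preprocessing and inference code, not its real API."""
    predictions = []
    with open(input_path) as f:
        for line in f:
            doc = json.loads(line)
            tensors = tensorize_doc(doc)                 # per-document preprocessing
            clusters = predict_clusters(model, tensors)  # one forward pass per document
            predictions.append({"doc_key": doc.get("doc_key"), "clusters": clusters})
    return predictions
```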

If you need true batch prediction, I would suggest multi-threading the preprocessing and tensorization to build the batch input. The more challenging part is making the model itself accept batched input (multiple documents); to my knowledge, no recent coreference model has been made to accept batch input. I might be able to work on this in December, depending on my schedule. Right now, I think the model can already handle many documents per second in sequence, and of course you can run multiple model instances in parallel, depending on your GPU memory usage.
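As a rough sketch of the first part (placeholder helper names again, and note that if the preprocessing is pure Python, a process pool may parallelize better than threads):

```python
from concurrent.futures import ThreadPoolExecutor

def predict_with_parallel_preprocessing(model, tensorize_doc, predict_clusters,
                                        docs, num_workers=4):
    """Tensorize documents in parallel worker threads, then feed them to the
    model one at a time. `tensorize_doc` and `predict_clusters` stand in for
    the repo's preprocessing and inference code; the model still sees one
    document per forward pass."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        tensorized = list(pool.map(tensorize_doc, docs))     # CPU-bound prep in parallel
    return [predict_clusters(model, t) for t in tensorized]  # GPU inference in sequence
```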

Let me know if you have any more questions or comments. Thanks!