ohmeow / blurr

A library that integrates Hugging Face Transformers with the world of fastai, giving fastai devs everything they need to train, evaluate, and deploy transformer-specific models.
https://ohmeow.github.io/blurr
Apache License 2.0

Out of memory errors with Keras models #82

Open dimagalat opened 2 years ago

dimagalat commented 2 years ago

Hi, I'm trying to load a Keras model saved as .h5 by passing model_kwargs={"from_tf": True} to get_hf_objects. This consistently causes CUDA out-of-memory errors. I've tried reducing batch_size to 1 and downgrading PyTorch to 1.9.0. Any ideas what could be going wrong? Thanks in advance.

Traceback (most recent call last):
  File "/scratch/gh47/dg5608/pretraining-benefits/src/biobart_cnn_tf.py", line 52, in <module>
    model = BaseModelWrapper(hf_model)
  File "/scratch/gh47/dg5608/pt109/lib/python3.9/site-packages/fastcore/meta.py", line 39, in __call__
    res.__init__(*args,**kwargs)
  File "/scratch/gh47/dg5608/pt109/lib/python3.9/site-packages/blurr/text/modeling/core.py", line 58, in __init__
    self.hf_model = hf_model.cuda() if torch.cuda.is_available() else hf_model
  File "/apps/pytorch/1.9.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 637, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/apps/pytorch/1.9.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/apps/pytorch/1.9.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/apps/pytorch/1.9.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/apps/pytorch/1.9.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 552, in _apply
    param_applied = fn(param)
  File "/apps/pytorch/1.9.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 637, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 31.75 GiB total capacity; 909.13 MiB already allocated; 1.94 MiB free; 920.00 MiB reserved in total by PyTorch)
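One detail worth pulling out of that error message: the GPU reports 31.75 GiB of total capacity but only 1.94 MiB free, and PyTorch itself has reserved under 1 GiB, so roughly 30 GiB appears to be held outside this process's allocator (for example by other jobs on a shared GPU). A small stdlib sketch of that arithmetic, with the figures copied from the message above:

```python
import re

# Message string copied verbatim from the RuntimeError in the traceback above.
msg = ("CUDA out of memory. Tried to allocate 20.00 MiB "
       "(GPU 0; 31.75 GiB total capacity; 909.13 MiB already allocated; "
       "1.94 MiB free; 920.00 MiB reserved in total by PyTorch)")

UNITS_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

def to_mib(value: str, unit: str) -> float:
    """Convert a '<number> <MiB|GiB>' pair from the message to MiB."""
    return float(value) * UNITS_TO_MIB[unit]

# Pull out each "<number> <GiB|MiB>" figure in order of appearance.
figures = re.findall(r"([\d.]+) (GiB|MiB)", msg)
requested, total, allocated, free, reserved = (to_mib(v, u) for v, u in figures)

# Memory unaccounted for by this process: total - reserved by PyTorch - free.
unaccounted_gib = (total - reserved - free) / 1024.0
print(f"requested={requested:.2f} MiB, free={free:.2f} MiB, "
      f"held outside PyTorch ~ {unaccounted_gib:.1f} GiB")
```

If that reading is right, shrinking batch_size would not help, since the allocation fails while moving the model's parameters to the GPU, before any batch is processed.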