Closed edloginova closed 1 year ago
Hi @edloginova, and thanks for reporting this. As a first step, I tried to reproduce it on my side under the following environment, in the hope that you may spot something in it that addresses your issue. Configuration: Ubuntu 18.04 (Linux Mint), Python 3.6.9, pip-freeze-list, NVidia-GTX-1060 (6GB)
2022-11-21 14:58:32.53 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /media/nicolay/96ed6537-b931-4f7e-8ac4-8407527ddbf9/proj/REmarker/data/models/ra-20-srubert-large-neut-nli-pretrained-3l-finetuned/ra-20-srubert-large-neut-nli-pretrained-3l]
INFO:deeppavlov.core.models.tf_model:[loading model from /media/nicolay/96ed6537-b931-4f7e-8ac4-8407527ddbf9/proj/REmarker/data/models/ra-20-srubert-large-neut-nli-pretrained-3l-finetuned/ra-20-srubert-large-neut-nli-pretrained-3l]
INFO:tensorflow:Restoring parameters from /media/nicolay/96ed6537-b931-4f7e-8ac4-8407527ddbf9/proj/REmarker/data/models/ra-20-srubert-large-neut-nli-pretrained-3l-finetuned/ra-20-srubert-large-neut-nli-pretrained-3l
100%|██████████████████████████████████████████████████████████████████████████| 1253/1253 [00:01<00:00, 1004.98opins/s]
Calculating rows count (sample [DataType.Test]): 0rows [00:00, ?rows/s]2022-11-21 14:58:38.559124: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
Calculating rows count (sample [DataType.Test]): 44rows [00:02, 16.33rows/s]
INFO:arekit.common.data.storages.base:Filling with blank rows: 44
INFO:arekit.common.data.storages.base:Completed!
sample [DataType.Test]: 100%|███████████████████████████████████████████████████████████| 44/44 [00:01<00:00, 43.48it/s]
INFO:arekit.common.data.input.writers.tsv:Saving... (44, 10): /media/nicolay/96ed6537-b931-4f7e-8ac4-8407527ddbf9/proj/REmarker/examples/args/../../_output/sample-test-0.tsv.gz
INFO:arekit.common.data.input.writers.tsv:Saving completed!
Writing output: 44rows [00:01, 35.72rows/s]
1it [00:00, 119.02it/s]
Got this result.zip
REmarker is just an old title of the project.
It seems to be a TensorFlow issue, related to deeppavlov's attempt to allocate memory.
Am I right that it attempts to allocate the memory on the GPU device, and that the amount of memory is sufficient (6GB+)?
My assumption here is that deeppavlov tries to restore the model on CPU, which may take a while, if that is possible at all.
It's running on Tesla T4 with 15 109 MiB, I am afraid. I reinstalled tensorflow to match your version, but it doesn't seem to fix things. Shall I ask deeppavlov community whether it is on their side?
No, I think you should not, since this is not caused by an issue in their code but by something more low-level, i.e. tensorflow in combination with colab. You're not the only one who has encountered problems related to it... I will take a detailed look and, once I find something, will leave my advice here.
You may also check GPU availability from tensorflow and with nvidia-smi to make sure everything is OK with the GPU on the notebook side.
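A minimal sketch of such a check, runnable from a notebook cell. It assumes TensorFlow 1.x, where `tf.test.is_gpu_available()` exists, and that `nvidia-smi` is on the PATH; the helper name `check_gpu` is only illustrative and is not part of deeppavlov or AREkit.

```python
import shutil
import subprocess


def check_gpu():
    """Report GPU visibility as seen by nvidia-smi and by TensorFlow.

    Each value is True/False, or None when the check could not be run.
    """
    report = {"nvidia_smi": None, "tensorflow_gpu": None}

    # nvidia-smi: stays None when the tool is not installed.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            completed = subprocess.run(
                [smi], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
            report["nvidia_smi"] = completed.returncode == 0
        except OSError:
            report["nvidia_smi"] = None

    # TensorFlow: stays None when it is not installed or the call is unavailable.
    try:
        import tensorflow as tf

        report["tensorflow_gpu"] = bool(tf.test.is_gpu_available())
    except Exception:
        report["tensorflow_gpu"] = None

    return report


print(check_gpu())
```

If both entries come back as True, the GPU is visible from the notebook, and the freeze is more likely caused by something else.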
Yes, I checked it with nvidia-smi, it was there and free, and the tf command returns True. Thank you for quick responses! <3
Well, I would love to assist you more, but I am out of other solutions for now.
That might also come down to newer cudnn and cuda drivers, I think. Mine are relatively old:
NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1
cudnn 7.6.5
@edloginova, as an alternative solution I have reproduced the same model as a torch/transformer-based pretrained state, created with the OpenNRE framework. Once you have time, or if you are already familiar with it, you may go for OpenNRE.
Consider the following label conversion rel2id required by OpenNRE:
{"0": 0, "1": 1, "2": 2}
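For illustration, a short sketch of how this mapping could be stored in a file and used to translate labels. The file name `rel2id.json`, its location in the temp directory, and the helper `to_ids` are assumptions made for the example; they are not taken from OpenNRE or AREkit.

```python
import json
import os
import tempfile

# The mapping from the comment above: label name (string) -> numeric id.
rel2id = {"0": 0, "1": 1, "2": 2}


def to_ids(labels, mapping):
    """Translate labels (ints or strings) to ids using a rel2id mapping."""
    return [mapping[str(label)] for label in labels]


# Write the mapping to a JSON file and read it back.
path = os.path.join(tempfile.gettempdir(), "rel2id.json")
with open(path, "w") as f:
    json.dump(rel2id, f)
with open(path) as f:
    loaded = json.load(f)

print(to_ids([0, 2, 1], loaded))  # prints [0, 2, 1]
```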
@edloginova, may I kindly ask whether you finally sorted it out, or is it still challenging?
I'm afraid I haven't figured it out yet :( Your help would be greatly appreciated, if you have time!
Okay, thanks for letting me know! I will have a look in my spare time. By the way, I've noticed you use python 3.7, which in my personal experience might be incompatible with tensorflow (the backend for deeppavlov). That was the reason I went with 3.6.9.
Here is the routine I am using in colab to switch to 3.6.9 among the other alternatives:
!sudo update-alternatives --config python3
!wget https://bootstrap.pypa.io/pip/3.6/get-pip.py
!python get-pip.py
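After switching, it may be worth confirming which interpreter is actually in use. A minimal sketch; the helper `matches_python` is hypothetical, and the 3.6 target is taken from the comment above.

```python
import sys


def matches_python(version_info, expected=(3, 6)):
    """Return True when the major.minor part of version_info equals expected."""
    return tuple(version_info[:2]) == tuple(expected)


# Print the running interpreter version and whether it is the expected 3.6.
print(sys.version.split()[0], matches_python(sys.version_info))
```

Commands launched with `!python` may use a different interpreter than the notebook kernel itself, so the check is most informative when saved to a file and run the same way as the script.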
I will have a look in my spare time, and will keep you updated once I get to test it.
I switched to 3.6, but I'm afraid it's still the same error :(
@edloginova, please try prefixing the command with sudo; this will very likely help you out:
!sudo python infer_bert.py ...
(DeepPavlov and AREkit keep their data at /root/.deeppavlov and /root/.arekit; I suppose the problem was in reading the AREkit resources, which is on my side)
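To see whether directory permissions are involved, something like the following could be run as the same user as the script. The two paths come from the comment above; `check_readable` is a hypothetical helper, not part of DeepPavlov or AREkit.

```python
import os


def check_readable(paths):
    """Classify each path by whether the current user can read and enter it."""
    result = {}
    for path in paths:
        if not os.path.exists(path):
            # Also reported when a parent directory (e.g. /root) cannot be traversed.
            result[path] = "missing or unreachable"
        elif os.access(path, os.R_OK | os.X_OK):
            result[path] = "readable"
        else:
            result[path] = "no access"
    return result


for path, status in check_readable(["/root/.deeppavlov", "/root/.arekit"]).items():
    print(path, "->", status)
```

Anything other than "readable" for either directory would be consistent with the script only working under sudo.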
PS: I wish you all the best and even greater advances in 2023 🎉🎄
@nicolay-r IT WORKS! Thank you so much :))) I should have thought of that myself... Thank you for your patience! Best wishes to you, too! You're doing amazing work :)
@edloginova, thanks for your interest, your feedback, and the kind wishes! Don't hesitate to contact me in case of other questions ✨
When running
I get
and the process freezes.
Google colab, Python 3.7, tensorflow 1.15.0, numpy 1.21.6, deeppavlov 0.11.0, arekit installed from git. Tried restarting the runtime, doesn't help.