huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Sentence Transformers Gets Stuck loading #30990

Closed Jaswir closed 2 months ago

Jaswir commented 4 months ago

System Info

Ubuntu 20.04
Python 3.8.10
Updating the NVIDIA driver is not possible, so I have to make do with CUDA 11.6 (Torch 1.13.0).

torch 1.13.0
transformers 4.38.1
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
sentence-transformers 2.7.0

Who can help?

SentenceTransformer sometimes gets stuck loading forever (it runs on a server). Only after rebooting the server does it behave normally again for a while.

Model: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)


Information

Tasks

Reproduction

Not sure how to reproduce it reliably.

Use the system specifications above and run this multiple times?

from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer("/data/MiniLM")
embeddings = model.encode(sentences)
print(embeddings)

Expected behavior

It shouldn't get stuck; it should either finish loading or raise an error.

amyeroberts commented 4 months ago

cc @tomaarsen

tomaarsen commented 4 months ago

Hello!

In theory, there is not really anything in Sentence Transformers that could halt it completely. Perhaps a good idea is to add:

import faulthandler

# Dump the stack trace of every thread to stderr after 5 minutes,
# and again every 5 minutes thereafter, even if the main thread hangs.
faulthandler.dump_traceback_later(60 * 5, repeat=True)

This will print out the current stack trace every 5 minutes, even if the process is otherwise hung. That way, we can narrow down the cause.

My only theory of what it could be is that it cannot connect to Hugging Face when you are loading https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2, which may cause it to wait. That said, it should only wait for a while (perhaps 30 seconds?) before it fails and reports the failure. In that case, you might want to clone the model to your server, load it as a local model instead (by pointing to the local path), and then set TRANSFORMERS_OFFLINE=1. This should prevent any outgoing connections to HF. (Related: Offline mode docs)
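To illustrate, here is a minimal sketch of forcing offline mode, assuming the model has been cloned to a local path such as /data/MiniLM. Note that the environment variables must be set before the library is imported (the HF_HUB_OFFLINE variable and the cloned path are assumptions for this sketch; the actual model load is commented out since it requires the model files to exist):

```python
import os

# Offline mode must be set BEFORE importing transformers /
# sentence_transformers, since the libraries read it at import time.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"  # also blocks huggingface_hub downloads

# Local clone of sentence-transformers/all-MiniLM-L6-v2, e.g. created with:
#   git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 /data/MiniLM
local_model_path = "/data/MiniLM"

# With the variables above set, this load performs no network I/O:
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer(local_model_path)
print("offline:", os.environ["TRANSFORMERS_OFFLINE"])
```

If the hang was caused by a stalled connection to the Hub, this setup should either load instantly from disk or raise a clear error about missing local files.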

Jaswir commented 4 months ago

@tomaarsen [screenshot of the traceback attached]

Also tried TRANSFORMERS_OFFLINE=1; same result.

For more context, the model is cloned on the server and used locally. I think something goes wrong with caching after using the model a couple of times, because after rebooting the server it works again.

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Jaswir commented 3 months ago

Still curious about how to fix this.

tomaarsen commented 3 months ago

I'm not sure what could be causing this. It looks like the `to()` call is failing, i.e. it's failing to move the model to (probably) the CUDA/GPU device. I've never seen that before, though, and I can't find much info about that kind of bug either. This is the closest: https://forums.developer.nvidia.com/t/how-do-i-debug-this-pytorch-stalls-when-moving-tensor-to-gpu/169850
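As a debugging sketch (not a fix), the suspect `.to()` call can be bracketed with a faulthandler watchdog: if the move to CUDA stalls, the stack of every thread is dumped to stderr without killing the process. The 120-second timeout is an arbitrary choice, and the actual `model.to(...)` call is commented out here since it needs a GPU:

```python
import faulthandler

# Arm a watchdog: if the code between arm and disarm takes longer than
# 120 seconds, faulthandler prints the stack of every thread to stderr,
# showing exactly where PyTorch is stuck (exit=False keeps the process alive).
faulthandler.dump_traceback_later(120, exit=False)

# model.to("cuda")  # the call suspected of hanging (requires a GPU)

# Disarm once the call returns so no spurious dump is printed later.
faulthandler.cancel_dump_traceback_later()
print("to() returned, watchdog disarmed")
```

If the dump shows the hang inside a CUDA runtime call rather than in Python code, that would point at the driver/CUDA 11.6 setup rather than at Sentence Transformers.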

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.