huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Need to reset the signal handler for ALARM after we call `resolve_trust_remote_code` #29690

Closed: coldnight closed this issue 7 months ago

coldnight commented 7 months ago

System Info

transformers==4.38.1 python==3.9

Who can help?

No response

Reproduction

Hi, we ran into a confusing error while using version 4.38.1. Here is part of the traceback:

  File "/root/online_third_party/env/venv.2062-helm/lib/python3.9/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/root/online_third_party/env/venv.2062-helm/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/root/online_third_party/env/venv.2062-helm/lib/python3.9/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.9/http/client.py", line 1349, in getresponse
    response.begin()
  File "/usr/lib/python3.9/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.9/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/root/online_third_party/env/venv.2062-helm/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 580, in _raise_timeout_error
    raise ValueError(
ValueError: Loading this model requires you to execute custom code contained in the model repository on your local machine. Please set the option `trust_remote_code=True` to permit loading of this model.

Looking at the code introduced in this commit, `resolve_trust_remote_code` registers a `SIGALRM` handler that raises an exception when the prompt times out (see https://github.com/huggingface/transformers/blob/v4.38.1/src/transformers/dynamic_module_utils.py#L595-L596). The handler is never reset to the default after the function returns.

Expected behavior

The signal handler should not affect other parts of the program.

coldnight commented 7 months ago

The alarm should be cleared in a `finally` block, because of this exception:

  File "/tmp/venv-helm/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 598, in resolve_trust_remote_code
    answer = input(
EOFError: EOF when reading a line

During handling of the above exception, another exception occurred:

Because this exception skips the cleanup, the alarm is never cleared. It fires later, in unrelated code, and produces a very confusing error.

ArthurZucker commented 7 months ago

Will have a look, thanks for reporting. Could you share a reproducer as well? 🤗

coldnight commented 7 months ago

This happened after I loaded a tokenizer without `trust_remote_code` (the tokenizer requires it). If STDIN has been closed, loading fails with an exception; if we catch that exception and let the program keep running for a while, the problem reproduces. The following script should reproduce it:

import time

from transformers import AutoTokenizer
from transformers.dynamic_module_utils import TIME_OUT_REMOTE_CODE

hf_tokenizer_name = 'THUDM/chatglm2-6b'
try:
    AutoTokenizer.from_pretrained(hf_tokenizer_name, use_fast=True)
except ValueError:
    print("STDIN has closed")

print("Now the program continues")
time.sleep(TIME_OUT_REMOTE_CODE + 5)

Save this script as `test.py`, then run it with STDIN closed:

python test.py 0<&-

The output:

STDIN has closed
Now the program continues
Traceback (most recent call last):
  File "/Users/wh/codes/flageval/helm/test.py", line 14, in <module>
    time.sleep(TIME_OUT_REMOTE_CODE + 5)
  File "/usr/local/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 580, in _raise_timeout_error
    raise ValueError(
ValueError: Loading this model requires you to execute custom code contained in the model repository on your local machine. Please set the option `trust_remote_code=True` to permit loading of this model.
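The reproducer above needs network access to download the tokenizer. The same mechanism can be exercised without transformers. This is a hypothetical sketch, assuming a POSIX platform: `ask_trust_remote_code` mimics the pattern described in this issue rather than copying the real code, `io.StringIO("")` simulates an STDIN at EOF (not a truly closed file descriptor), and `signal.setitimer` replaces `signal.alarm` so the leaked timer fires after 0.2 seconds instead of after `TIME_OUT_REMOTE_CODE`:

```python
import io
import signal
import sys
import time


def _raise_timeout(signum, frame):
    # Hypothetical stand-in for transformers' _raise_timeout_error
    raise ValueError("timed out waiting for an answer")


def ask_trust_remote_code(timeout):
    # Mimics the pattern described in this issue (not the real code):
    # the timer is only cancelled if input() returns normally
    signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    answer = input()  # raises EOFError when STDIN is at EOF
    signal.setitimer(signal.ITIMER_REAL, 0)
    return answer


sys.stdin = io.StringIO("")  # simulate an empty STDIN

try:
    ask_trust_remote_code(timeout=0.2)
except EOFError:
    print("STDIN has closed")

leaked = False
try:
    print("Now the program continues")
    time.sleep(1)  # unrelated code; the leaked timer interrupts it
except ValueError as err:
    leaked = True
    print("Leaked alarm:", err)
```

As in the traceback above, the error surfaces inside `time.sleep()`, far from the code that armed the timer.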