oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

LoRA training fails because of a connection error #6044

Open nbet opened 3 months ago

nbet commented 3 months ago

Describe the bug

When training a LoRA, a connection error occurred just as training was about to complete, which caused the run to fail. The full output is under Logs below.

Is there an existing issue for this?

Reproduction

Load llama-2-7b and add raw data, then start training. At the end, training fails because of a connection error.
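The traceback under Logs points at `lora_model.save_pretrained(lora_file_path)` in `modules/training.py`, where PEFT calls `file_exists(model_id, "config.json")` against the Hugging Face Hub. If the connection reset is transient, retrying the save may be enough. The following is a minimal sketch of that idea, not the project's fix: `retry_on_connection_error` and `flaky_save` are hypothetical names, and the stand-in function only simulates the failing call.

```python
import time


def retry_on_connection_error(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() and retry with exponential backoff on network-style errors.

    OSError covers the built-in ConnectionResetError and TimeoutError. As far
    as I know, requests.exceptions.ConnectionError also derives from OSError.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            # Wait 1x, 2x, 4x ... base_delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_save():
        # Hypothetical stand-in for lora_model.save_pretrained(lora_file_path):
        # fails twice with the error from the log, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionResetError(104, "Connection reset by peer")
        return "saved"

    # The no-op sleep keeps the demo instant.
    print(retry_on_connection_error(flaky_save, sleep=lambda seconds: None))
```

In practice the wrapped callable would be something like `lambda: lora_model.save_pretrained(lora_file_path)`. Whether retrying helps depends on why the peer resets the connection; if the Hub is unreachable from the machine, the retries will simply run out.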

Screenshot

No response

Logs

[root@localhost text-generation-webui]# tail -f output.log 
12:43:42-063105 INFO     Training path directory raw                            
12:43:42-066891 INFO     Loaded training file: raw-1.txt                        
12:43:46-303921 INFO     Getting model ready                                    
12:43:46-317392 INFO     Preparing for training                                 
12:43:46-320383 INFO     Creating LoRA model                                    
12:43:46-657499 INFO     Starting training                                      
Training 'llama' model using (q, v) projections
Trainable params: 16,777,216 (0.4770 %), All params: 3,517,190,144 (Model: 3,500,412,928)
12:43:46-687427 INFO     Log file 'train_dataset_sample.json' created in the    
                         'logs' directory.                                      

Exception in thread Thread-8 (threaded_run):
Traceback (most recent call last):
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connection.py", line 653, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connection.py", line 806, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
               ^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 465, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 509, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/ssl.py", line 1104, in _create
    self.do_handshake()
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/ssl.py", line 1382, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/requests/adapters.py", line 589, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/util/util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connection.py", line 653, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connection.py", line 806, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
               ^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 465, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 509, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/ssl.py", line 1104, in _create
    self.do_handshake()
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/ssl.py", line 1382, in do_handshake
    self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/root/text-generation-webui/modules/training.py", line 707, in threaded_run
    lora_model.save_pretrained(lora_file_path)
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/peft/peft_model.py", line 216, in save_pretrained
    output_state_dict = get_peft_model_state_dict(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/peft/utils/save_and_load.py", line 146, in get_peft_model_state_dict
    has_remote_config = file_exists(model_id, "config.json")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 2641, in file_exists
    get_hf_file_metadata(url, token=token)
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
        ^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
               ^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 395, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 66, in send
    return super().send(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/requests/adapters.py", line 604, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: (ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), '(Request ID: 347e60cc-7993-4980-88f7-a874636b5d1c)')
Step: 159 {'loss': 2.3787, 'grad_norm': 0.5803480744361877, 'learning_rate': 0.0002926829268292683, 'epoch': 0.321285140562249}
Step: 319 {'loss': 1.8832, 'grad_norm': 1.2659270763397217, 'learning_rate': 0.0002560975609756097, 'epoch': 0.642570281124498}
Step: 479 {'loss': 1.3876, 'grad_norm': 0.6671197414398193, 'learning_rate': 0.0002195121951219512, 'epoch': 0.963855421686747}
Step: 621 {'loss': 0.956, 'grad_norm': 0.5905065536499023, 'learning_rate': 0.00018292682926829266, 'epoch': 1.285140562248996}
Step: 781 {'loss': 0.6913, 'grad_norm': 0.6068465709686279, 'learning_rate': 0.00014634146341463414, 'epoch': 1.606425702811245}
Step: 941 {'loss': 0.5488, 'grad_norm': 0.4779954254627228, 'learning_rate': 0.0001097560975609756, 'epoch': 1.927710843373494}
Step: 1115 {'loss': 0.441, 'grad_norm': 0.38670554757118225, 'learning_rate': 7.317073170731707e-05, 'epoch': 2.248995983935743}
Step: 1275 {'loss': 0.3673, 'grad_norm': 0.40382111072540283, 'learning_rate': 3.6585365853658535e-05, 'epoch': 2.570281124497992}
Step: 1435 {'loss': 0.3442, 'grad_norm': 0.37195587158203125, 'learning_rate': 0.0, 'epoch': 2.891566265060241}
Step: 1435 {'train_runtime': 1404.4968, 'train_samples_per_second': 4.248, 'train_steps_per_second': 0.032, 'train_loss': 0.9997953918245104, 'epoch': 2.891566265060241}
13:07:29-198633 INFO     Training complete, saving                              
Traceback (most recent call last):
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/root/text-generation-webui/installer_files/env/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: timed out

The above exception was the direct cause of the following exception:
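
The `Step:` lines in the log above show that training itself ran to step 1435 before the save failed. A small sketch for pulling those metrics out of `output.log`, assuming each line keeps the `Step: N {...}` shape seen here (`parse_step_lines` is a hypothetical helper):

```python
import ast
import re

# Matches lines such as: Step: 159 {'loss': 2.3787, ...}
STEP_LINE = re.compile(r"^Step: (\d+) (\{.*\})$")


def parse_step_lines(log_text):
    """Return (step, metrics) pairs for every 'Step: N {...}' line in a log."""
    records = []
    for line in log_text.splitlines():
        match = STEP_LINE.match(line.strip())
        if match:
            # The metrics are printed as a Python dict literal, so
            # ast.literal_eval can read them without executing any code.
            records.append((int(match.group(1)), ast.literal_eval(match.group(2))))
    return records


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = """\
Step: 159 {'loss': 2.3787, 'grad_norm': 0.5803480744361877, 'learning_rate': 0.0002926829268292683, 'epoch': 0.321285140562249}
Step: 1435 {'loss': 0.3442, 'grad_norm': 0.37195587158203125, 'learning_rate': 0.0, 'epoch': 2.891566265060241}
Step: 1435 {'train_runtime': 1404.4968, 'train_samples_per_second': 4.248, 'train_steps_per_second': 0.032, 'train_loss': 0.9997953918245104, 'epoch': 2.891566265060241}
"""
    for step, metrics in parse_step_lines(sample):
        # The final summary line has 'train_loss' rather than 'loss'.
        if "loss" in metrics:
            print(step, metrics["loss"])
```

Lines that do not start with `Step:` (timestamps, traceback frames) are skipped, so the whole log file can be passed in unchanged.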

System Info

python: 3.11.9
urllib3: 2.2.1
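
The traceback also passes through `requests`, `huggingface_hub`, and `peft`, so their versions may be relevant too. A sketch for collecting them from inside the webui's Python environment, using only the standard library (`collect_versions` is a hypothetical helper, and the distribution names are assumed to match what is installed):

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    print("python:", platform.python_version())
    # Packages that appear in the traceback above.
    wanted = ["urllib3", "requests", "huggingface-hub", "peft"]
    for name, version in collect_versions(wanted).items():
        print(f"{name}: {version}")
```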

nbet commented 3 months ago

Hey, please help me. Good luck.