modal-labs / llm-finetuning

Guide for fine-tuning Llama/Mistral/CodeLlama models and more
MIT License

Following the quickstart example results in an error #64

Closed holma91 closed 5 months ago

holma91 commented 5 months ago

Steps to reproduce:

  1. git clone https://github.com/modal-labs/llm-finetuning.git
  2. python3 -m venv env && . env/bin/activate && pip install modal
  3. Set up the Modal token, Hugging Face token, and wandb token as specified in the repo.
  4. Accept the T&C for the chosen models on Hugging Face.
  5. Run modal run --detach src.train --config=config/mistral-memorize.yml --data=data/sqlqa.subsample.jsonl

Results in:

ExecutionError: Could not deserialize remote exception due to local error:
Deserialization failed because the 'huggingface_hub' module is not available in the local environment.
This can happen if your local environment does not have the remote exception definitions.
Here is the remote traceback:
Traceback (most recent call last):
  File "/root/src/train.py", line 90, in launch
    snapshot_download(model_name, local_files_only=True)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 235, in snapshot_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find an appropriate cached snapshot folder for the specified revision on the local disk and outgoing traffic has been disabled. To enable repo look-ups and downloads online, pass 'local_files_only=False' as input.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    response.raise_for_status()
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/mistralai/Mistral-7B-v0.1/revision/main

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 179, in snapshot_download
    repo_info = api.repo_info(repo_id=repo_id, repo_type=repo_type, revision=revision, token=token)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 2275, in repo_info
    return method(
           ^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 2085, in model_info
    hf_raise_for_status(r)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 333, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/mistralai/Mistral-7B-v0.1/revision/main (Request ID: Root=1-66643bd7-5731280b4891ad705891dd33;ed0c24cd-c1f7-4fdc-a093-c0f2075c8d39)

Authorization error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/pkg/modal/_container_io_manager.py", line 492, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 378, in run_input_sync
    res = finalized_function.callable(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/src/train.py", line 94, in launch
    snapshot_download(model_name)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 251, in snapshot_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the files on the Hub and we cannot find the appropriate snapshot folder for the specified revision on the local disk. Please check your internet connection and try again.

With a different config, such as config/llama-3.yml instead of config/mistral-memorize.yml, everything works fine.

I don't think it matters, but I'm using Python 3.11 on an M1 MacBook.
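
A side note on the first lines of the error ("Could not deserialize remote exception..."): the message says the local environment lacks the 'huggingface_hub' module that defines the remote exception class. The sketch below reproduces that kind of failure with the standard-library pickle module. It assumes the remote exception travels as a pickle-style payload that stores the class by module path; that is an assumption for illustration, not a description of Modal's internals. The names fake_remote_lib, RemoteOnlyError, and roundtrip_without_module are hypothetical.

```python
import pickle
import sys
import types


def roundtrip_without_module(module_name: str = "fake_remote_lib") -> str:
    """Pickle an exception defined in a temporary module, then try to
    unpickle it after the module is gone.

    Returns the name of the module that could not be imported, or an
    empty string if unpickling unexpectedly succeeded.
    """
    # "Remote" side: the library that defines the exception is importable.
    module = types.ModuleType(module_name)
    exc_cls = type("RemoteOnlyError", (Exception,), {"__module__": module_name})
    module.RemoteOnlyError = exc_cls
    sys.modules[module_name] = module

    # The payload records the class by module path, not by value.
    payload = pickle.dumps(exc_cls("no cached snapshot"))

    # "Local" side: the library is not installed.
    del sys.modules[module_name]
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as err:
        return err.name or ""
    return ""
```

Calling roundtrip_without_module() returns the name of the missing module. Under the same assumption, installing huggingface_hub in the local venv would likely let the real exception surface directly, although the underlying 403 would remain.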

mwaskom commented 5 months ago

Hi, did you try this more than once? I just tried it and it ran fine, and the error message does suggest it might be a transient issue, perhaps on the Hugging Face side. FWIW that config just uses

base_model: mistralai/Mistral-7B-v0.1

Which AFAIK hasn't gone anywhere :)

holma91 commented 5 months ago

Tried it again now and I'm still getting it. I'm pretty sure my secrets are set up correctly as well:

[Image: Screenshot 2024-06-10 at 14 19 09]

Here's the complete trace:

CUDA Version 12.1.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
    https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

Downloading mistralai/Mistral-7B-v0.1 ...
Traceback (most recent call last):
  File "/root/src/train.py", line 90, in launch
    snapshot_download(model_name, local_files_only=True)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 235, in snapshot_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find an appropriate cached snapshot folder for the specified revision on the local disk and outgoing traffic has been disabled. To enable repo look-ups and downloads online, pass 'local_files_only=False' as input.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    response.raise_for_status()
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/mistralai/Mistral-7B-v0.1/revision/main

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 179, in snapshot_download
    repo_info = api.repo_info(repo_id=repo_id, repo_type=repo_type, revision=revision, token=token)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 2275, in repo_info
    return method(
           ^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 2085, in model_info
    hf_raise_for_status(r)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 333, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/mistralai/Mistral-7B-v0.1/revision/main (Request ID: Root=1-6666ee5f-399cf03367d0ba5c0195c2f8;ade0d03f-5211-4dee-b4ae-fc1a9942aaf6)

Authorization error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/pkg/modal/_container_io_manager.py", line 492, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 378, in run_input_sync
    res = finalized_function.callable(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/src/train.py", line 94, in launch
    snapshot_download(model_name)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 251, in snapshot_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the files on the Hub and we cannot find the appropriate snapshot folder for the specified revision on the local disk. Please check your internet connection and try again.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/lapuerta/dev/llm-finetuning/src/train.py:152 in main                                      │
│                                                                                                  │
│   151 │   with open(config, "r") as cfg, open(data, "r") as dat:                                 │
│ ❱ 152 │   │   run_name, launch_handle = launch.remote(                                           │
│   153 │   │   │   cfg.read(), dat.read(), run_to_resume, preproc_only                            │
│                                                                                                  │
│ /Users/lapuerta/dev/llm-finetuning/env/lib/python3.11/site-packages/modal/object.py:230 in       │
│ wrapped                                                                                          │
│                                                                                                  │
│   229 │   │   await self.resolve()                                                               │
│ ❱ 230 │   │   return await method(self, *args, **kwargs)                                         │
│   231                                                                                            │
│                                                                                                  │
│ /Users/lapuerta/dev/llm-finetuning/env/lib/python3.11/site-packages/modal/functions.py:987 in    │
│ remote                                                                                           │
│                                                                                                  │
│    986 │   │                                                                                     │
│ ❱  987 │   │   return await self._call_function(args, kwargs)                                    │
│    988                                                                                           │
│                                                                                                  │
│ /Users/lapuerta/dev/llm-finetuning/env/lib/python3.11/site-packages/modal/functions.py:949 in    │
│ _call_function                                                                                   │
│                                                                                                  │
│    948 │   │   try:                                                                              │
│ ❱  949 │   │   │   return await invocation.run_function()                                        │
│    950 │   │   except asyncio.CancelledError:                                                    │
│                                                                                                  │
│ /Users/lapuerta/dev/llm-finetuning/env/lib/python3.11/site-packages/modal/functions.py:170 in    │
│ run_function                                                                                     │
│                                                                                                  │
│    169 │   │   assert not item.result.gen_status                                                 │
│ ❱  170 │   │   return await _process_result(item.result, item.data_format, self.stub, self.clie  │
│    171                                                                                           │
│                                                                                                  │
│ /Users/lapuerta/dev/llm-finetuning/env/lib/python3.11/site-packages/modal/_utils/function_utils. │
│ py:375 in _process_result                                                                        │
│                                                                                                  │
│   374 │   │   │   except Exception as deser_exc:                                                 │
│ ❱ 375 │   │   │   │   raise ExecutionError(                                                      │
│   376 │   │   │   │   │   "Could not deserialize remote exception due to local error:\n"         │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ExecutionError: Could not deserialize remote exception due to local error:
Deserialization failed because the 'huggingface_hub' module is not available in the local environment.
This can happen if your local environment does not have the remote exception definitions.
Here is the remote traceback:
Traceback (most recent call last):
  File "/root/src/train.py", line 90, in launch
    snapshot_download(model_name, local_files_only=True)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 235, in snapshot_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find an appropriate cached snapshot folder for the specified revision on the local disk and outgoing traffic has been disabled. To enable repo look-ups and downloads online, pass 'local_files_only=False' as input.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    response.raise_for_status()
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/mistralai/Mistral-7B-v0.1/revision/main

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 179, in snapshot_download
    repo_info = api.repo_info(repo_id=repo_id, repo_type=repo_type, revision=revision, token=token)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 2275, in repo_info
    return method(
           ^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 2085, in model_info
    hf_raise_for_status(r)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 333, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/mistralai/Mistral-7B-v0.1/revision/main (Request ID: Root=1-6666ee5f-399cf03367d0ba5c0195c2f8;ade0d03f-5211-4dee-b4ae-fc1a9942aaf6)

Authorization error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/pkg/modal/_container_io_manager.py", line 492, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 378, in run_input_sync
    res = finalized_function.callable(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/src/train.py", line 94, in launch
    snapshot_download(model_name)
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py3.11/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 251, in snapshot_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the files on the Hub and we cannot find the appropriate snapshot folder for the specified revision on the local disk. Please check your internet connection and try again.

holma91 commented 5 months ago

My bad, the problem was the token permissions on Hugging Face. I had to enable "Read access to contents of all public gated repos you can access" for the token.
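
For anyone hitting the same 403, here is a minimal sketch for checking a token locally before launching a run. It queries the same endpoint that appears in the traceback (https://huggingface.co/api/models/<repo>/revision/main) using only the standard library. The function names explain_status and check_repo_access are hypothetical, and the hints attached to each status code are a rough reading, not an authoritative list of what the Hub returns in every case.

```python
import urllib.error
import urllib.request

# Rough interpretation of common status codes (an assumption for illustration).
HINTS = {
    200: "ok: the token can read this repo's metadata",
    401: "unauthorized: the token is missing or invalid",
    403: (
        "forbidden: the token was recognized but lacks permission; "
        "check the token's gated-repo read access and that the model's "
        "terms were accepted"
    ),
    404: "not found: check the repo id",
}


def explain_status(status: int) -> str:
    """Map an HTTP status code to a short troubleshooting hint."""
    return HINTS.get(status, f"unexpected status {status}")


def check_repo_access(repo_id: str, token: str = "") -> str:
    """Request the repo's metadata and return a hint for the resulting status.

    Network failures (urllib.error.URLError) are not handled here and
    will propagate to the caller.
    """
    url = f"https://huggingface.co/api/models/{repo_id}/revision/main"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return explain_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on err.
        return explain_status(err.code)
```

A typical call would be check_repo_access("mistralai/Mistral-7B-v0.1", token), with token being the same value stored in the Hugging Face secret.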

mwaskom commented 5 months ago

Thanks for following up!