[Closed] rasbt closed this issue 1 year ago
Do git submodule update --init --recursive to get the lit-gpt content. The COPY /lit-gpt/ /submission/ line copies the lit-gpt content into /submission, and WORKDIR /submission sets the working directory, so the build should be able to access the scripts.
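For context, here is the relevant portion of the sample Dockerfile as reconstructed from the build log below; the actual file lives at sample-submissions/lit-gpt/Dockerfile and may differ slightly:

```dockerfile
# Sketch reconstructed from the build log; not the verbatim Dockerfile.
FROM ghcr.io/pytorch/pytorch-nightly:c69b6e5-cu11.8.0

WORKDIR /submission
COPY /lit-gpt/ /submission/
COPY ./fast_api_requirements.txt fast_api_requirements.txt
RUN pip install --no-cache-dir --upgrade -r fast_api_requirements.txt
RUN apt-get update && apt-get install -y git
RUN pip install -r requirements.txt huggingface_hub sentencepiece

# get open-llama weights: https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/download_openllama.md
RUN python scripts/download.py --repo_id openlm-research/open_llama_3b
RUN python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/openlm-research/open_llama_3b
```

Because the lit-gpt submodule content is copied into /submission and the working directory is /submission, paths like scripts/download.py resolve relative to the copied content.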
Thanks, but it's still the same issue:
$ ~ git clone https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge.git
Cloning into 'neurips_llm_efficiency_challenge'...
remote: Enumerating objects: 330, done.
remote: Counting objects: 100% (98/98), done.
remote: Compressing objects: 100% (68/68), done.
remote: Total 330 (delta 43), reused 43 (delta 30), pack-reused 232
Receiving objects: 100% (330/330), 193.58 KiB | 21.51 MiB/s, done.
Resolving deltas: 100% (120/120), done.
$ ~ ls
main.py neurips_llm_efficiency_challenge
$ ~ cd neurips_llm_efficiency_challenge
$ master ~/neurips_llm_efficiency_challenge ls
README.md helm.md leaderboard.md open_api_spec.json run_specs.conf sample-submissions
$ master ~/neurips_llm_efficiency_challenge git submodule update --init --recursive
Submodule 'sample-submissions/lit-gpt/lit-gpt' (https://github.com/Lightning-AI/lit-gpt) registered for path 'sample-submissions/lit-gpt/lit-gpt'
Cloning into '/teamspace/studios/this_studio/neurips_llm_efficiency_challenge/sample-submissions/lit-gpt/lit-gpt'...
Submodule path 'sample-submissions/lit-gpt/lit-gpt': checked out '1985cd8166801e9af639ba5a67fddaf4d8f3523e'
$ master ~/neurips_llm_efficiency_challenge ls
README.md helm.md leaderboard.md open_api_spec.json run_specs.conf sample-submissions
$ master ~/neurips_llm_efficiency_challenge cd sample-submissions
$ master ~/neurips_llm_efficiency_challenge/sample-submissions ls
lit-gpt llama_recipes
$ master ~/neurips_llm_efficiency_challenge/sample-submissions cd lit-gpt
$ master ~/neurips_llm_efficiency_challenge/sample-submissions/lit-gpt ls
Dockerfile README.md api.py fast_api_requirements.txt helper.py lit-gpt main.py
$ master ~/neurips_llm_efficiency_challenge/sample-submissions/lit-gpt docker build -t toy_submission .
[+] Building 131.2s (12/16) docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.52kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for ghcr.io/pytorch/pytorch-nightly:c69b6e5-cu11.8.0 0.6s
=> [ 1/12] FROM ghcr.io/pytorch/pytorch-nightly:c69b6e5-cu11.8.0@sha256:748628fda7661 89.8s
=> => resolve ghcr.io/pytorch/pytorch-nightly:c69b6e5-cu11.8.0@sha256:748628fda7661f7e 0.0s
=> => sha256:748628fda7661f7e0612299b2012ca3a9407ac920ea791398f9d553de 1.37kB / 1.37kB 0.0s
=> => sha256:0dae61445ae902e60ece5407ab4e4b1fc25567c1e32d8d5b32d6e1552 4.93kB / 4.93kB 0.0s
=> => sha256:01085d60b3a624c06a7132ff0749efc6e6565d9f2531d7685ff559f 27.51MB / 27.51MB 0.2s
=> => sha256:b562d7b85a6579a81124c487f14e9cb7660dfe9ddb963a85373987e 10.01MB / 10.01MB 0.3s
=> => sha256:0d4d46bbff85996e4aad70cc1bb7ca40394a39e15593eae54f8f2963 3.85GB / 3.85GB 29.5s
=> => sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc 32B / 32B 0.4s
=> => sha256:2528d6dedc341a579aaa30dc5ad0efc1e645a7c407b608666e5191f9dffff61 99B / 99B 0.4s
=> => extracting sha256:01085d60b3a624c06a7132ff0749efc6e6565d9f2531d7685ff559fb5d0f66 0.9s
=> => extracting sha256:b562d7b85a6579a81124c487f14e9cb7660dfe9ddb963a85373987e0f98d99 0.6s
=> => extracting sha256:0d4d46bbff85996e4aad70cc1bb7ca40394a39e15593eae54f8f2963b5ca1 59.3s
=> => extracting sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8ac 0.0s
=> => extracting sha256:2528d6dedc341a579aaa30dc5ad0efc1e645a7c407b608666e5191f9dffff6 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 627.22kB 0.0s
=> [ 2/12] WORKDIR /submission 2.8s
=> [ 3/12] COPY /lit-gpt/ /submission/ 0.0s
=> [ 4/12] COPY ./fast_api_requirements.txt fast_api_requirements.txt 0.0s
=> [ 5/12] RUN pip install --no-cache-dir --upgrade -r fast_api_requirements.txt 2.2s
=> [ 6/12] RUN apt-get update && apt-get install -y git 9.2s
=> [ 7/12] RUN pip install -r requirements.txt huggingface_hub sentencepiece 24.1s
=> ERROR [ 8/12] RUN python scripts/download.py --repo_id openlm-research/open_llama_3 2.4s
------
> [ 8/12] RUN python scripts/download.py --repo_id openlm-research/open_llama_3b:
Fetching 4 files: 0%| | 0/4 [00:00<?, ?it/s]
2.035 Traceback (most recent call last):
2.035 File "/submission/scripts/download.py", line 74, in <module>
2.035 CLI(download_from_hub)
2.035 File "/opt/conda/lib/python3.10/site-packages/jsonargparse/_cli.py", line 96, in CLI
2.035 return _run_component(components, cfg_init)
2.035 File "/opt/conda/lib/python3.10/site-packages/jsonargparse/_cli.py", line 181, in _run_component
2.035 return component(**cfg)
2.035 File "/submission/scripts/download.py", line 45, in download_from_hub
2.035 snapshot_download(
2.035 File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
2.035 return fn(*args, **kwargs)
2.035 File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/_snapshot_download.py", line 239, in snapshot_download
2.036 thread_map(
2.036 File "/opt/conda/lib/python3.10/site-packages/tqdm/contrib/concurrent.py", line 69, in thread_map
2.036 return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
2.036 File "/opt/conda/lib/python3.10/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
2.036 return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
2.036 File "/opt/conda/lib/python3.10/site-packages/tqdm/std.py", line 1178, in __iter__
2.036 for obj in iterable:
2.036 File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
2.036 yield _result_or_cancel(fs.pop())
2.036 File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
2.036 return fut.result(timeout)
2.036 File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 458, in result
2.036 return self.__get_result()
2.036 File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2.036 raise self._exception
2.036 File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2.036 result = self.fn(*self.args, **self.kwargs)
2.036 File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/_snapshot_download.py", line 214, in _inner_hf_hub_download
2.037 return hf_hub_download(
2.037 File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
2.037 return fn(*args, **kwargs)
2.037 File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1427, in hf_hub_download
2.037 _check_disk_space(expected_size, local_dir)
2.037 File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 975, in _check_disk_space
2.037 target_dir_free = shutil.disk_usage(target_dir).free
2.037 File "/opt/conda/lib/python3.10/shutil.py", line 1331, in disk_usage
2.037 st = os.statvfs(path)
2.037 FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/openlm-research/open_llama_3b'
------
Dockerfile:21
--------------------
19 |
20 | # get open-llama weights: https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/download_openllama.md
21 | >>> RUN python scripts/download.py --repo_id openlm-research/open_llama_3b
22 | RUN python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/openlm-research/open_llama_3b
23 |
--------------------
ERROR: failed to solve: process "/bin/sh -c python scripts/download.py --repo_id openlm-research/open_llama_3b" did not complete successfully: exit code: 1
$ master ~/neurips_llm_efficiency_challenge/sample-submissions/lit-gpt
I got the same issue. It seems to work with huggingface_hub==0.16.4, as mentioned here.
cc @carmocca
It is due to there being no local directory checkpoints/openlm-research/open_llama_3b. I added

RUN mkdir checkpoints
RUN mkdir checkpoints/openlm-research
RUN mkdir checkpoints/openlm-research/open_llama_3b

in the Dockerfile and it works. (A single RUN mkdir -p checkpoints/openlm-research/open_llama_3b is equivalent.)
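The traceback explains why this works: huggingface_hub's _check_disk_space calls shutil.disk_usage on the target directory before downloading, and shutil.disk_usage raises FileNotFoundError when the path does not exist yet. A minimal sketch of the failure and the mkdir fix (the checkpoints path here is just illustrative, created under a temp directory):

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "checkpoints/openlm-research/open_llama_3b")

# Without the directory, disk_usage fails the same way the build did.
try:
    shutil.disk_usage(target)
    failed = False
except FileNotFoundError:
    failed = True

# The fix: create the directory tree first, equivalent to the RUN mkdir lines.
os.makedirs(target, exist_ok=True)
free_bytes = shutil.disk_usage(target).free

print(failed, free_bytes > 0)
```

Newer huggingface_hub releases create the local directory themselves, which is why pinning a different version also sidesteps the error.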
Thanks for the fix @carmocca @msaroufim, it works now! 👍
Hey, I am trying to run the tutorial at https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge/tree/master/sample-submissions/lit-gpt

It doesn't say so, but if I want to make a Lit-GPT submission, I have to cd into neurips_llm_efficiency_challenge/sample-submissions/lit-gpt because that's where the Dockerfile is?

Now, when running the docker build step, I am getting the following error. Trying the last step manually, it gives me an error that the scripts/download.py file doesn't exist. Shouldn't it be lit-gpt/scripts/download.py instead?