skathpalia opened 1 year ago
me too.. :(
You will need to run it on CPU: https://github.com/krychu/llama. Let me know if that works with 16 GB of memory; it might be a bit tight.
Thank you @krychu !
Now I get
RuntimeError: MPS backend out of memory (MPS allocated: 3.34 GB, other allocations: 9.99 MB, max allowed: 3.40 GB). Tried to allocate 86.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
I inserted PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 in the generation.py file, but I still get the same error.
Thank you for your help in advance.
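For what it's worth, `PYTORCH_MPS_HIGH_WATERMARK_RATIO` is an environment variable that PyTorch reads when the MPS backend initializes, so writing the assignment into generation.py as a plain Python line has no effect. A minimal sketch of setting it from Python before torch is imported (the variable name and value are taken from the error message above):

```python
import os

# Must run before "import torch", because the MPS allocator reads the
# variable when the backend is initialized.
# 0.0 disables the upper memory limit; as the error message itself
# warns, this may cause system failure, so use it with care.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"
```

Alternatively, `export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` in the shell before running torchrun has the same effect, and the child processes inherit it.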
Here are the results after fixing the above error
```shell
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```

```
NOTE: Redirects are currently not supported in Windows or MacOs.
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Traceback (most recent call last):
  File "/Volumes/Users/gitRepositories/llama/example_text_completion.py", line 56, in <module>
    fire.Fire(main)
  File "/Users/shivkathpalia/venv/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/Users/shivkathpalia/venv/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/Users/shivkathpalia/venv/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/Volumes/Users/gitRepositories/llama/example_text_completion.py", line 18, in main
    generator = Llama.build(
  File "/Volumes/Users/gitRepositories/llama/llama/generation.py", line 92, in build
    assert len(checkpoints) > 0, f"no checkpoint files found in {ckpt_dir}"
AssertionError: no checkpoint files found in --ckpt_dir
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 63097) of binary: /Users/shivkathpalia/venv/bin/python
Traceback (most recent call last):
  File "/Users/shivkathpalia/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/Users/shivkathpalia/venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/Users/shivkathpalia/venv/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/Users/shivkathpalia/venv/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/Users/shivkathpalia/venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/Users/shivkathpalia/venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
example_text_completion.py FAILED
------------------------------------------------------------
Failures:
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time       : 2023-07-22_10:06:34
  host       : 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
  rank       : 0 (local_rank: 0)
  exitcode   : 1 (pid: 63097)
  error_file :
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```
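A guess at the AssertionError above (an assumption based only on the pasted command): the message shows that `ckpt_dir` received the literal string `--ckpt_dir` rather than a path. If the multi-line command was pasted onto a single line, each `\` escapes the *following space* instead of continuing the line, so the next flag reaches the script glued to a leading space and is no longer parsed as `--ckpt_dir`. A small demonstration using Python's `shlex`, which follows the same quoting rules as the shell:

```python
import shlex

# The pasted single-line command contains "\ " sequences; each one is
# an escaped space, so the following flag becomes part of a token that
# starts with a space and is not recognized as the "--ckpt_dir" flag.
args = shlex.split(r"example_text_completion.py \ --ckpt_dir llama-2-7b/")
print(args)  # ['example_text_completion.py', ' --ckpt_dir', 'llama-2-7b/']
```

If that is the cause, running the command on a single line with no backslashes, or making sure each `\` is the very last character on its line, should avoid the problem.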
I have solved it with a CPU installation, by installing https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama.
Complete process to install:

1. Download https://github.com/facebookresearch/llama and extract it to a `llama-main` folder.
2. Download https://github.com/krychu/llama, extract it, and replace the files in the `llama-main` folder.
3. Run the `download.sh` script in a terminal, passing the URL provided when prompted, to start the download.
4. In the `llama-main` folder, create a virtual environment with `python3 -m venv env` and activate it: `source env/bin/activate`.
5. Install PyTorch: `python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu` (for the CPU version).
6. Install the package: `python3 -m pip install -e .`
7. Run the example:

```shell
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 1  # instead of 4
```
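Before launching, it may also help to sanity-check that the checkpoint directory actually contains `.pth` files, since the `Llama.build` assertion quoted earlier fails exactly when that glob comes up empty. A minimal sketch (the `llama-2-7b` path matches the commands in this thread; adjust it to your layout):

```python
from pathlib import Path

ckpt_dir = Path("llama-2-7b")  # same directory you pass to --ckpt_dir
checkpoints = sorted(ckpt_dir.glob("*.pth"))

if checkpoints:
    print(f"found {len(checkpoints)} checkpoint file(s) in {ckpt_dir}")
else:
    # This is the condition that trips the assert in generation.py.
    print(f"no checkpoint files found in {ckpt_dir}; re-check the download")
```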
I am unable to run the following command:

```shell
torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4
```

I am running it on a MacBook Pro with the following configuration.