Closed — elricwan closed this issue 1 year ago
I have the same PyTorch version (1.13.1+cu117) running with Python 3.8.12 on Ubuntu 20.04. You could check the model sizes on disk (consolidated.00.pth for 7B should be about 13.5 GB) to make sure your checkpoints are not corrupted. I doubt they updated their weights recently; otherwise, the changes would be reflected in the llama repository. Which GPU do you have?
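A quick way to compare the sizes on disk is a small helper like the one below (the helper name and the commented-out path are illustrative; the 13.5 GB figure is the one quoted above):

```python
import os

def file_size_gb(path):
    """Return a file's size in decimal gigabytes."""
    return os.path.getsize(path) / 1e9

# Example: the 7B checkpoint should be roughly 13.5 GB, e.g.
# print(f"{file_size_gb('checkpoints/7B/consolidated.00.pth'):.1f} GB")
```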
I use one NVIDIA GeForce RTX 3090 with 24 GB of memory.
My guess is that either your checkpoint files are corrupted, or you have a PyTorch build that is incompatible with your card. Check that you can run other scripts with your installed PyTorch in the same environment.
I use one NVIDIA GeForce RTX 3090 with 24 GB of memory.
I'm assuming you are still running two instances of the 7B model. If you run 1 (same as the original llama implementation), you'd need to set the number of wrapyfi devices to 0. If you are running multiple instances, does this occur on the device_idx 0 or 1?
I use this command to run on my first GPU:
CUDA_VISIBLE_DEVICES="0" OMP_NUM_THREADS=1 torchrun --nproc_per_node 1 example.py --ckpt_dir
did you change
also, notice that it is in a directory called checkpoints as indicated by
Hi there,
I have downloaded the LLaMA models, but when I try to load the model, I get the error: RuntimeError: PytorchStreamReader failed reading file data/2: invalid header or archive is corrupted
My PyTorch version is 1.13.1. Has the model version been updated? My download files look like this:
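That PytorchStreamReader error usually means the file is truncated or otherwise damaged. Since checkpoints written with torch.save (PyTorch 1.6 and later) are zip archives, you can sanity-check one with the standard zipfile module before loading it; a sketch (the path in the example is illustrative):

```python
import zipfile

def looks_like_valid_checkpoint(path):
    """Check that a .pth file is a readable zip archive.

    Returns True if the zip structure is intact, False if the file
    is not a zip or one of its entries fails a CRC check.
    """
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first bad entry, or None.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False

# Example:
# looks_like_valid_checkpoint("checkpoints/7B/consolidated.00.pth")
```

A truncated download fails this check immediately, which is much faster than waiting for torch.load to error out on a 13 GB file.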