ayttop opened 2 months ago
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-13B", delete_original=True)

Should delete_original be True or False here?
from airllm import AutoModel

MAX_LENGTH = 128
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-13B", delete_original=True)

input_text = ['Who is Napoleon Bonaparte?']
input_tokens = model.tokenizer(input_text, return_tensors="pt", return_attention_mask=False,
                               truncation=True, max_length=MAX_LENGTH, padding=False)

generation_output = model.generate(input_tokens['input_ids'].cuda(), max_new_tokens=20,
                                   use_cache=True, return_dict_in_generate=True)
output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
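For reference, a minimal stdlib sketch (not part of airllm; the snapshot path is copied from the log below and will differ on your machine) to check what the split leaves on disk:

# Hypothetical sanity check: with delete_original=True the original *.bin
# shards should be gone, and the per-layer files should sit in splitted_model/.
from pathlib import Path

snapshot = Path(
    "/root/.cache/huggingface/hub/models--garage-bAInd--Platypus2-13B"
    "/snapshots/dc1024c1b9df38f57f6436a02d31706cb0deaa01"
)
print(sorted(p.name for p in snapshot.glob("*.bin")))                  # expect [] after deletion
print(len(list((snapshot / "splitted_model").glob("*.safetensors"))))  # one file per layer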
cache_utils installed

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: The secret HF_TOKEN does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as a secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks. Please note that authentication is recommended but still optional to access public models or datasets.

Fetching 11 files: 100% 11/11 [00:00<00:00, 670.43it/s]

found_layers: {'model.embed_tokens.': True, 'model.layers.0.': True, ..., 'model.layers.17.': True, 'model.layers.18.': False, ..., 'model.layers.39.': False, 'model.norm.': False, 'lm_head.': False}
some layer splits found, some are not, re-save all layers in case there's some corruptions.

Loading shard 1/3
/usr/local/lib/python3.10/dist-packages/airllm/utils.py:296: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (see https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling; arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file.
  state_dict.update(torch.load(to_load, map_location='cpu'))

37%|███▋ | 16/43 deleting original file: /root/.cache/huggingface/hub/models--garage-bAInd--Platypus2-13B/snapshots/dc1024c1b9df38f57f6436a02d31706cb0deaa01/pytorch_model-00001-of-00003.bin

Loading shard 2/3
saved as: .../splitted_model/model.layers.18.safetensors
[identical "saved as" lines for model.layers.19 through model.layers.29, trimmed]
72%|███████▏ | 31/43 deleting original file: .../pytorch_model-00002-of-00003.bin

Loading shard 3/3
pytorch_model-00003-of-00003.bin: 100% 6.18G/6.18G [01:07<00:00, 151MB/s]
saved as: .../splitted_model/model.layers.30.safetensors
[identical "saved as" lines for model.layers.31 through model.layers.39, model.norm and lm_head, trimmed]
100%|██████████| 43/43 [08:24<00:00, 11.73s/it]

new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
running layers(cuda:0): 100%|██████████| 43/43 [02:00<00:00, 2.79s/it]
[the three lines above repeat for each generated token, ~20 passes at roughly two minutes each, trimmed]

Who is Napoleon Bonaparte? Napoleon Bonaparte was a French military and political leader who rose to promin
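Side note: the decoded output above echoes the prompt before the answer. Assuming airllm's generate returns full sequences the way transformers' generate does (the echoed prompt in the log suggests it), a small sketch to decode only the new tokens:

# Keep only the newly generated tokens; sequences[0] is assumed to start
# with the prompt tokens, as in transformers' return_dict_in_generate output.
prompt_len = input_tokens['input_ids'].shape[1]
answer = model.tokenizer.decode(generation_output.sequences[0][prompt_len:],
                                skip_special_tokens=True)
print(answer)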
It succeeded on a Colab T4 with garage-bAInd/Platypus2-13B.
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-13B", delete_original=True)

Where do I put delete_original=True?
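As far as I can tell from the airllm README and the log above, it goes exactly where the snippet already has it: as a keyword argument to AutoModel.from_pretrained. A minimal sketch:

from airllm import AutoModel

# delete_original is a keyword argument to from_pretrained; per the log above
# it deletes each original pytorch_model-*.bin shard once its layers have been
# re-saved as safetensors, trading re-download cost for free disk space.
model = AutoModel.from_pretrained(
    "garage-bAInd/Platypus2-13B",
    delete_original=True,  # omit (or pass False) to keep the original shards
)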