wizzard0 closed this issue 11 months ago.
I haven't tried this model yet; let me try it and post an update.
Updating PyTorch to 2.1 changes the error to:
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
The model 'OptimizedModule' is not supported for . Supported models are ['BartForCausalLM', 'BertLMHeadModel', (...)
Generation works, but it is VERY slow (i.e., mpt-7B is much slower than WizardLM-30B). GPU load is 0%; only the CPU is used.
I was able to start loading the model on MPS after tweaking the transformer's model config, but I get this error because I don't have enough memory:
RuntimeError: MPS backend out of memory (MPS allocated: 17.27 GB, other allocations: 819.94 MB, max allowed: 18.13 GB). Tried to allocate 192.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable the upper limit for memory allocations (may cause system failure).
I have a 16 GB M1 Mac (macOS 13.4.1 (c)).
However, I do see GPU load while the model is loading.
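The figures in that error message are consistent with a simple budget check. A minimal sketch of the arithmetic, assuming the allocator compares everything already in use plus the new request against the high-watermark limit, and assuming binary units (1 GB = 1024 MB); `headroom_mb` and `mps_request_fits` are hypothetical names, not PyTorch APIs:

```python
# Assumption: the allocator reports binary units (GiB/MiB).
# With 1 GB = 1000 MB the conclusion below is the same.
MB_PER_GB = 1024


def headroom_mb(allocated_gb: float, other_mb: float, max_allowed_gb: float) -> float:
    """Megabytes still available below the MPS limit (simplified model)."""
    in_use_mb = allocated_gb * MB_PER_GB + other_mb
    return max_allowed_gb * MB_PER_GB - in_use_mb


def mps_request_fits(allocated_gb: float, other_mb: float,
                     max_allowed_gb: float, request_mb: float) -> bool:
    """True if a new allocation of request_mb stays under the limit."""
    return request_mb <= headroom_mb(allocated_gb, other_mb, max_allowed_gb)


# Figures copied from the error message above.
print(round(headroom_mb(17.27, 819.94, 18.13), 1))       # about 60.7 MB left
print(mps_request_fits(17.27, 819.94, 18.13, 192.0))     # False: 192 MB does not fit
```

With the reported numbers only about 60 MB remain under the limit, so the 192 MB request fails. Setting `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` removes the limit rather than freeing memory, which is why the message warns that it may cause system failure.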
@wizzard0, if you have more memory than I do, please try setting `init_device` to `'mps'` in the transformer's config.json. (You can find it at ~/.cache/huggingface/hub/models--mosaicml--mpt-7b-chat/snapshots/c53dee01e05098f81cac11145f9bf45feedc5b2f/config.json)
Let me know if that solves your problem.
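The same edit can be scripted. A minimal sketch, assuming the cache path quoted above (the snapshot hash depends on which revision was downloaded); `set_config_value` is a hypothetical helper that rewrites one top-level key and leaves the rest of config.json untouched:

```python
import json
from pathlib import Path


def set_config_value(config_path: Path, key: str, value) -> dict:
    """Set one top-level key in a config.json file and write it back.

    Returns the updated config so the caller can inspect it.
    """
    config = json.loads(config_path.read_text())
    config[key] = value
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Path from the comment above; the snapshot hash is revision-specific.
    cfg = (Path.home() / ".cache/huggingface/hub/models--mosaicml--mpt-7b-chat"
           / "snapshots/c53dee01e05098f81cac11145f9bf45feedc5b2f/config.json")
    if cfg.exists():
        print(set_config_value(cfg, "init_device", "mps")["init_device"])
    else:
        print(f"config not found: {cfg}")
```

The helper works for other top-level keys too, such as `torch_dtype`. Alternatively, the MPT model card shows overriding `config.init_device` on a config loaded with `transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)` and passing it as `config=config` to `from_pretrained`, which avoids editing the cache.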
cc: @pseudotensor
Ohh, THAT's where `init_device` should be entered! 🤣 I tried to fit it into many places. (It suggests `init_device` "meta", so I will try that too.)
Um, nope. The model loads, but when I click Submit I get:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
thread exception: (<class 'RuntimeError'>, RuntimeError("User specified an unsupported autocast device_type 'mps'"), <traceback object at 0x28a42b440>)
make stop: (<class 'RuntimeError'>, RuntimeError("User specified an unsupported autocast device_type 'mps'"), <traceback object at 0x28a42b440>)
hit stop
Traceback (most recent call last):
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in process_api
result = await self.call_function(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/blocks.py", line 1093, in call_function
prediction = await utils.async_iteration(iterator)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/utils.py", line 341, in async_iteration
return await iterator.__anext__()
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/utils.py", line 334, in __anext__
return await anyio.to_thread.run_sync(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/Users/user/dev/h2o/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/utils.py", line 317, in run_sync_iterator_async
return next(iterator)
File "/Users/user/dev/h2o/h2ogpt/src/gradio_runner.py", line 1428, in bot
for res in get_response(fun1, history):
File "/Users/user/dev/h2o/h2ogpt/src/gradio_runner.py", line 1385, in get_response
for output_fun in fun1():
File "/Users/user/dev/h2o/h2ogpt/src/gen.py", line 2009, in evaluate
raise thread.exc
File "/Users/user/dev/h2o/h2ogpt/src/utils.py", line 340, in run
self._return = self._target(*self._args, **self._kwargs)
File "/Users/user/dev/h2o/h2ogpt/src/gen.py", line 2112, in generate_with_exceptions
func(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/transformers/generation/utils.py", line 1522, in generate
return self.greedy_search(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
outputs = self(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 270, in forward
outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 202, in forward
(x, attn_weights, past_key_value) = block(x, past_key_value=past_key_value, attn_bias=attn_bias, attention_mask=attention_mask, is_causal=self.is_causal)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/blocks.py", line 35, in forward
a = self.norm_1(x)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/norm.py", line 24, in forward
with torch.autocast(enabled=False, device_type=module_device.type):
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 222, in __init__
raise RuntimeError(f'User specified an unsupported autocast device_type \'{self.device}\'')
RuntimeError: User specified an unsupported autocast device_type 'mps'
@wizzard0, which torch version and which h2ogpt branch are you using?
`init_device=meta` fails similarly, though:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
/Users/user/dev/h2o/lib/python3.10/site-packages/transformers/generation/utils.py:1452: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on mps, whereas the model is on meta. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('meta') before running `.generate()`.
warnings.warn(
thread exception: (<class 'RuntimeError'>, RuntimeError('Tensor on device meta is not on the expected device mps:0!'), <traceback object at 0x2d35a2840>)
make stop: (<class 'RuntimeError'>, RuntimeError('Tensor on device meta is not on the expected device mps:0!'), <traceback object at 0x2d35a2840>)
hit stop
Traceback (most recent call last):
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in process_api
result = await self.call_function(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/blocks.py", line 1093, in call_function
prediction = await utils.async_iteration(iterator)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/utils.py", line 341, in async_iteration
return await iterator.__anext__()
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/utils.py", line 334, in __anext__
return await anyio.to_thread.run_sync(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/Users/user/dev/h2o/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/gradio/utils.py", line 317, in run_sync_iterator_async
return next(iterator)
File "/Users/user/dev/h2o/h2ogpt/src/gradio_runner.py", line 1428, in bot
for res in get_response(fun1, history):
File "/Users/user/dev/h2o/h2ogpt/src/gradio_runner.py", line 1385, in get_response
for output_fun in fun1():
File "/Users/user/dev/h2o/h2ogpt/src/gen.py", line 2009, in evaluate
raise thread.exc
File "/Users/user/dev/h2o/h2ogpt/src/utils.py", line 340, in run
self._return = self._target(*self._args, **self._kwargs)
File "/Users/user/dev/h2o/h2ogpt/src/gen.py", line 2112, in generate_with_exceptions
func(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/transformers/generation/utils.py", line 1522, in generate
return self.greedy_search(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
outputs = self(
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 270, in forward
outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 192, in forward
(attn_bias, attention_mask) = self._attn_bias(device=x.device, dtype=torch.float32, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 115, in _attn_bias
attn_bias = attn_bias.masked_fill(~attention_mask.view(-1, 1, 1, s_k), min_val)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/_refs/__init__.py", line 5077, in masked_fill
r = torch.where(mask, value, a) # type: ignore[arg-type]
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 227, in _fn
result = fn(*args, **kwargs)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 130, in _fn
result = fn(**bound.arguments)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/_refs/__init__.py", line 1872, in where
utils.check_same_device(pred, a, b, allow_cpu_scalar_tensors=True)
File "/Users/user/dev/h2o/lib/python3.10/site-packages/torch/_prims_common/__init__.py", line 673, in check_same_device
raise RuntimeError(msg)
RuntimeError: Tensor on device meta is not on the expected device mps:0!
@wizzard0, please give me the following:
1. `python -c "import torch; print(torch.__version__)"`

2.1.0.dev20230716
ff674651c84ab4083b352bfd1e21fd3999a74e51
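For reports like this, a minimal sketch that collects several package versions in one go; it reads installed package metadata through `importlib.metadata`, so it does not import torch itself. The package list and the `package_versions` name are assumptions for illustration, and the git revision of a checkout still has to be taken separately (for example with `git rev-parse HEAD`):

```python
import platform
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version(), "on", platform.platform())
    for pkg, ver in package_versions(["torch", "transformers", "gradio"]).items():
        print(pkg, ver or "not installed")
```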
cc: @wizzard0, if you are on main (just updated), could you please create a fresh conda environment and try running the model with https://github.com/h2oai/h2ogpt/issues/463#issuecomment-1640550956
@Mathanraj-Sharma yes, that's what I just did. One stack trace is with `init_device=mps`, the other with `init_device=meta`. Both occur after submitting the prompt.
@wizzard0 could you please also change `torch_dtype` in config.json to `float16` and see? Just a hunch: https://github.com/pytorch/pytorch/issues/78168#issuecomment-1137686403
No change; still `User specified an unsupported autocast device_type 'mps'`.
@wizzard0, as I mentioned earlier, you can load the model onto MPS by changing `init_device` in the transformer's config.json.
The autocast error comes from PyTorch: `torch.autocast` does not support MPS yet. As a mitigation, h2ogpt passes a `NullContext` in that situation: https://github.com/h2oai/h2ogpt/blob/70a377c763abd278e57eced1a7aff7f468676ed7/src/gen.py#L1942
However, it is the MPT model itself that triggers the error:
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/norm.py", line 24, in forward
with torch.autocast(enabled=False, device_type=module_device.type):
So this needs to be fixed on MPT's side.
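Until MPT changes this, one possible local workaround is to route line 24 of norm.py through a helper that falls back to a no-op context on MPS, similar in spirit to the `NullContext` mitigation in h2ogpt. This is a hypothetical sketch, not MPT's fix: `autocast_disabled` and `AUTOCAST_UNSUPPORTED` are illustrative names, and the set of rejected device types is an assumption based on this thread. Since autocast cannot be active on MPS in these builds anyway, a no-op context should not change the numerics there.

```python
import contextlib

# Device types whose torch.autocast raised "unsupported autocast device_type"
# in the PyTorch builds used in this thread (assumption: only "mps" here).
AUTOCAST_UNSUPPORTED = {"mps"}


def autocast_disabled(device_type: str):
    """Context manager that turns autocast off for device_type.

    Returns a no-op contextlib.nullcontext() where torch.autocast rejects
    the device type instead of letting the RuntimeError propagate.
    """
    if device_type in AUTOCAST_UNSUPPORTED:
        return contextlib.nullcontext()
    import torch  # imported lazily so the fallback path does not need torch
    try:
        return torch.autocast(device_type=device_type, enabled=False)
    except RuntimeError:
        # Other builds may reject further device types the same way.
        return contextlib.nullcontext()


# Hypothetical patch of norm.py, line 24:
#     with torch.autocast(enabled=False, device_type=module_device.type):
# becomes
#     with autocast_disabled(module_device.type):
```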
cc: @pseudotensor
@pseudotensor, closing this since it's inactive, and the problem looks like it needs to be fixed on MPT's end.
Trying to use MPT models with h2oai:
Observed behavior:
To create a public link, set `share=True` in `launch()`.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
/Users/user/dev/lib/python3.10/site-packages/transformers/generation/utils.py:1452: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on mps, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
warnings.warn(
thread exception: (<class 'RuntimeError'>, RuntimeError('Placeholder storage has not been allocated on MPS device!'), <traceback object at 0x8fe882200>)
make stop: (<class 'RuntimeError'>, RuntimeError('Placeholder storage has not been allocated on MPS device!'), <traceback object at 0x8fe882200>)
hit stop
Traceback (most recent call last):
File "/Users/user/dev/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/Users/user/dev/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in process_api
result = await self.call_function(
File "/Users/user/dev/lib/python3.10/site-packages/gradio/blocks.py", line 1093, in call_function
prediction = await utils.async_iteration(iterator)
File "/Users/user/dev/lib/python3.10/site-packages/gradio/utils.py", line 341, in async_iteration
return await iterator.__anext__()
File "/Users/user/dev/lib/python3.10/site-packages/gradio/utils.py", line 334, in __anext__
return await anyio.to_thread.run_sync(
File "/Users/user/dev/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/user/dev/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/Users/user/dev/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/Users/user/dev/lib/python3.10/site-packages/gradio/utils.py", line 317, in run_sync_iterator_async
return next(iterator)
File "/Users/user/dev/h2ogpt/src/gradio_runner.py", line 1428, in bot
for res in get_response(fun1, history):
File "/Users/user/dev/h2ogpt/src/gradio_runner.py", line 1385, in get_response
for output_fun in fun1():
File "/Users/user/dev/h2ogpt/src/gen.py", line 2011, in evaluate
raise thread.exc
File "/Users/user/dev/h2ogpt/src/utils.py", line 340, in run
self._return = self._target(*self._args, **self._kwargs)
File "/Users/user/dev/h2ogpt/src/gen.py", line 2114, in generate_with_exceptions
func(*args, **kwargs)
File "/Users/user/dev/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/user/dev/lib/python3.10/site-packages/transformers/generation/utils.py", line 1522, in generate
return self.greedy_search(
File "/Users/user/dev/lib/python3.10/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
outputs = self(
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 270, in forward
outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 168, in forward
tok_emb = self.wte(input_ids)
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/custom_embedding.py", line 11, in forward
return super().forward(input)
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!
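The UserWarning just before this traceback reports `input_ids` on mps while the model is on cpu, and the failure then appears in the embedding lookup, so the inputs and the model weights seem to end up on different devices. A minimal sketch of keeping them aligned, assuming the inputs arrive as a dict of tensors (as tokenizer output does); `move_to_device` is a hypothetical helper, and transformers' `BatchEncoding` already offers an equivalent `.to(device)`:

```python
from typing import Any, Mapping


def move_to_device(batch: Mapping[str, Any], device) -> dict:
    """Return a new dict with every value that has a .to() method moved to device.

    Values without .to() (ints, strings, None) are passed through unchanged.
    """
    return {key: value.to(device) if hasattr(value, "to") else value
            for key, value in batch.items()}


# Hypothetical usage with a model and tokenizer already loaded:
#     device = next(model.parameters()).device   # where the weights actually live
#     inputs = move_to_device(tokenizer(prompt, return_tensors="pt"), device)
#     model.generate(**inputs, pad_token_id=tokenizer.eos_token_id)
```

Passing `attention_mask` (included in the tokenizer output) and an explicit `pad_token_id` also addresses the two warnings printed before the traceback.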