Closed pmusser closed 3 months ago
Just a note, I also tried with EleutherAI/gpt-j-6b and the issue happened again.
Don't know if it's helpful, but I had the performance tab in task manager open with the GPU selected, and recorded a video of what it was doing when it crashed. This is the last frame before the BSOD; it looks like it crashes right when GPU Dedicated memory usage hits 100%.
Seems like it might be an issue with dedicated gpu memory based on that screenshot. We can try to reproduce on our end. Were you able to run it with the Llama-2-7b-hf model from the blog? And if so, how was your memory usage during that run?
@kta-intel not yet, I meant to request access but hadn't done so yet. Will try ASAP (huggingface site is down presently).
Incidentally I also discovered that if I don't use IPEX but use the Arc A770 when using the zero-shot-classification pipeline with facebook/bart-large-mnli, MoritzLaurer/mDeBERTa-v3-base-mnli-xnli, or MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli, the same thing happens -- but only when I shut down the python kernel it was running in. It otherwise runs without issue!
@kta-intel Success, after a fashion -- I got llama-2-7b-hf installed and tried a few times to get it to work. The first several attempts failed with an error stating that protobuf was required but not installed. I had some trouble getting it installed in the environment I'd set up following the blog instructions and getting it recognized by jupyter. Ultimately I had to activate the environment, launch jupyter notebook, open a console from within the notebook, pip install protobuf there, and then restart the kernel. Now it works, except for the following error in the output after the checkpoint shards load and before the generated text is printed:
Intel(R) Arc(TM) A770 Graphics
Loading checkpoint shards: 100%
2/2 [00:00<00:00, 7.16it/s]
~~Keyword arguments {'add_special_tokens': False} not recognized.~~
You may have heard of Schrodinger cat mentioned in a thought experiment in quantum physics. Briefly, according to the Copenhagen interpretation of quantum mechanics, the cat in a sealed box is simultaneously alive and dead until we open the box and observe the cat. The macrostate of cat (either alive or dead) is determined at the moment we observe the cat. This is called the Copenhagen interpretation of quantum mechanics.
The quantum world is so strange that it is difficult to understand. This is because the quantum world is so different from our normal everyday world. For example, the quantum world is so strange that it is difficult to understand. In the quantum world, particles can be in different places at the same time. This is called superposition. This is a very strange thing, because in our everyday world, we can only be in one place at a time.
Another strange thing about the quantum world is that particles can be in different states at the same time.
Haven't closed the kernel yet to see if it goes to BSOD, will let you know momentarily. EDIT: closing the kernel doesn't go to BSOD!
Looks like missing protobuf may have been the root; once I installed it I was able to successfully (well, sort of -- see below) run the original code without going to BSOD.
Now it looks like I just have to grapple with a lack of sufficient memory for google/gemma-7b:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[1], line 37
35 input_ids = input_ids.to("xpu")
36 ##########################################
---> 37 generated_ids = model.generate(input_ids, max_new_tokens=128)[0]
38 generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True)
40 print(generated_text)
File [~\.conda\envs\llm\lib\site-packages\torch\utils\_contextlib.py:115](http://localhost:8888/lab/tree/_Jupyter%20books/Pressbooks%20Conforming/Blank%20subjects/~/.conda/envs/llm/lib/site-packages/torch/utils/_contextlib.py#line=114), in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File [~\.conda\envs\llm\lib\site-packages\transformers\generation\utils.py:1392](http://localhost:8888/lab/tree/_Jupyter%20books/Pressbooks%20Conforming/Blank%20subjects/~/.conda/envs/llm/lib/site-packages/transformers/generation/utils.py#line=1391), in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
1389 requires_attention_mask = "encoder_outputs" not in model_kwargs
1391 if model_kwargs.get("attention_mask", None) is None and requires_attention_mask and accepts_attention_mask:
-> 1392 model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
1393 inputs_tensor, generation_config.pad_token_id, generation_config.eos_token_id
1394 )
1396 # decoder-only models should use left-padding for generation
1397 if not self.config.is_encoder_decoder:
1398 # If `input_ids` was given, check if the last id in any sequence is `pad_token_id`
1399 # Note: If using, `inputs_embeds` this check does not work, because we want to be more hands-off.
File [~\.conda\envs\llm\lib\site-packages\transformers\generation\utils.py:476](http://localhost:8888/lab/tree/_Jupyter%20books/Pressbooks%20Conforming/Blank%20subjects/~/.conda/envs/llm/lib/site-packages/transformers/generation/utils.py#line=475), in GenerationMixin._prepare_attention_mask_for_generation(self, inputs, pad_token_id, eos_token_id)
469 def _prepare_attention_mask_for_generation(
470 self,
471 inputs: torch.Tensor,
472 pad_token_id: Optional[int],
473 eos_token_id: Optional[Union[int, List[int]]],
474 ) -> torch.LongTensor:
475 is_input_ids = len(inputs.shape) == 2 and inputs.dtype in [torch.int, torch.long]
--> 476 is_pad_token_in_inputs = (pad_token_id is not None) and (pad_token_id in inputs)
477 if isinstance(eos_token_id, int):
478 eos_token_id = [eos_token_id]
File [~\.conda\envs\llm\lib\site-packages\torch\_tensor.py:1059](http://localhost:8888/lab/tree/_Jupyter%20books/Pressbooks%20Conforming/Blank%20subjects/~/.conda/envs/llm/lib/site-packages/torch/_tensor.py#line=1058), in Tensor.__contains__(self, element)
1054 return handle_torch_function(Tensor.__contains__, (self,), self, element)
1055 if isinstance(
1056 element, (torch.Tensor, Number, torch.SymInt, torch.SymFloat, torch.SymBool)
1057 ):
1058 # type hint doesn't understand the __contains__ result array
-> 1059 return (element == self).any().item() # type: ignore[union-attr]
1061 raise RuntimeError(
1062 f"Tensor.__contains__ only supports Tensor or scalar, but you passed in a {type(element)}."
1063 )
RuntimeError: Allocation is out of device memory on current platform.
Hey, sorry for the delay. Glad that the original issue was resolved. Regarding OOM, have you tried quantizing the model and seeing if it's able to run? ex. https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm/llm_optimize.html#weight-only-quantization-woq
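To make the OOM and the quantization suggestion concrete, here is a rough back-of-the-envelope sketch of how much device memory the weights alone need at different precisions. The 15930 MB figure is the `total_memory` value from the Versions section of the report. The parameter counts are assumptions, not numbers from this thread (roughly 8.5 billion for google/gemma-7b and roughly 6.74 billion for Llama-2-7b-hf), and the 10% headroom for activations and the KV cache is an arbitrary illustrative choice. The function names are made up for this example.

```python
# Rough estimate of weight memory only; activations, KV cache, and driver
# overhead are NOT modeled beyond a simple headroom fraction.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_footprint_mb(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in MiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


def fits(num_params: float, dtype: str, total_memory_mb: float,
         headroom_fraction: float = 0.1) -> bool:
    """True if the weights fit while leaving `headroom_fraction` of memory free."""
    budget_mb = total_memory_mb * (1 - headroom_fraction)
    return weight_footprint_mb(num_params, dtype) <= budget_mb


if __name__ == "__main__":
    TOTAL_MEMORY_MB = 15930      # total_memory reported for the Arc A770 in the issue
    ASSUMED_PARAMS = {           # assumptions, not taken from the thread
        "google/gemma-7b": 8.5e9,
        "Llama-2-7b-hf": 6.74e9,
    }
    for name, params in ASSUMED_PARAMS.items():
        for dtype in ("float16", "int8", "int4"):
            size = weight_footprint_mb(params, dtype)
            verdict = "fits" if fits(params, dtype, TOTAL_MEMORY_MB) else "does not fit"
            print(f"{name} {dtype}: ~{size:.0f} MiB, {verdict}")
```

Under these assumptions the 16-bit gemma-7b weights alone exceed the card's reported memory, while the 16-bit Llama-2-7b-hf weights fit. That would be consistent with Llama running and gemma hitting the allocation error, and it is why weight-only quantization is worth trying.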
Closing because there has been no response for a long time. Feel free to reopen it if needed.
Describe the bug
After following the steps included in the blog post that came out a few days ago, I modified the code to try and interact with the google/gemma-7b model in a Jupyter notebook. Code as follows:
The code started executing successfully, but after the model information got transferred to the GPU my computer started showing some artifacts (chrome windows blanking and resizing), and shortly after the computer went to a BSOD of VIDEO_SCHEDULER_INTERNAL_ERROR, as follows:
The computer has rebooted from a bugcheck. The bugcheck was: 0x00000119 (0x0000000000000005, 0xffffe30e54c27000, 0xffffe30e5468a030, 0x0000000000050ec1). A dump was saved in: C:\WINDOWS\MEMORY.DMP. Report Id: 96e4c860-9dc4-49b8-a14f-e02f85d20f5e.
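For anyone comparing several crashes, a small sketch like the one below can pull the bugcheck code and its four parameters out of the event log message above. The helper name and the lookup table are hypothetical. The only mapping included is the one this report itself gives (0x119 alongside VIDEO_SCHEDULER_INTERNAL_ERROR); other codes would need to be added by hand.

```python
import re

# Only this one mapping comes from the report; extend it yourself for other codes.
KNOWN_BUGCHECKS = {0x119: "VIDEO_SCHEDULER_INTERNAL_ERROR"}


def parse_bugcheck(message: str):
    """Extract the bugcheck code and parameters from a Windows event log message.

    Returns a dict with 'code', 'name', and 'params', or None if no match is found.
    """
    match = re.search(r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return {"code": code, "name": KNOWN_BUGCHECKS.get(code, "unknown"), "params": params}


if __name__ == "__main__":
    msg = ("The computer has rebooted from a bugcheck. The bugcheck was: 0x00000119 "
           "(0x0000000000000005, 0xffffe30e54c27000, 0xffffe30e5468a030, "
           "0x0000000000050ec1). A dump was saved in: C:\\WINDOWS\\MEMORY.DMP.")
    info = parse_bugcheck(msg)
    print(hex(info["code"]), info["name"], [hex(p) for p in info["params"]])
```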
Versions
PyTorch version: 2.1.0a0+cxx11.abi
PyTorch CXX11 ABI: No
IPEX version: 2.1.10+xpu
IPEX commit: a12f9f650
Build type: Release

OS: Microsoft Windows 11 Pro
GCC version: N/A
Clang version: N/A
IGC version: 2024.0.2 (2024.0.2.20231213)
CMake version: version 3.28.0-msvc1
Libc version: N/A

Python version: 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is XPU available: True
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration:
[0] _DeviceProperties(name='Intel(R) Arc(TM) A770 Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=0, total_memory=15930MB, max_compute_units=512, gpu_eu_count=512)
Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
Architecture=9
CurrentClockSpeed=3600
DeviceID=CPU0
Family=107
L2CacheSize=4096
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3600
Name=AMD Ryzen 7 3700X 8-Core Processor
ProcessorType=3
Revision=28928

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.10+xpu
[pip3] numpy==1.26.4
[pip3] torch==2.1.0a0+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.10+xpu pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.1.0a0+cxx11.abi pypi_0 pypi