huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

Major indexing bug in eager mode #961

Closed: mattolson93 closed this issue 2 months ago

mattolson93 commented 5 months ago

System Info

optimum-habana v1.11.0.dev0
deepspeed v1.11.0

Docker image: vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1:latest
HL-SMI Version: hl-1.14.0-fw-48.0.1.0
Driver Version: 1.14.0-9e8ecf8

Hardware: Gaudi2

Reproduction

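import os
# Run in eager mode (PT_HPU_LAZY_MODE=0); set before importing torch.
os.environ['PT_HPU_LAZY_MODE'] = '0'
import torch
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi
adapt_transformers_to_gaudi()

x = torch.arange(10).to("hpu")
# Print each element through indexing; line i should show the value i.
for i in range(10):
    print(x[i])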

This produces the following output:

tensor(0, device='hpu:0')
tensor(2, device='hpu:0')
tensor(4, device='hpu:0')
tensor(6, device='hpu:0')
tensor(8, device='hpu:0')
tensor(0, device='hpu:0')
tensor(0, device='hpu:0')
tensor(0, device='hpu:0')
tensor(0, device='hpu:0')
tensor(0, device='hpu:0')

Expected behavior

The script should output:

tensor(0, device='hpu:0')
tensor(1, device='hpu:0')
tensor(2, device='hpu:0')
tensor(3, device='hpu:0')
tensor(4, device='hpu:0')
tensor(5, device='hpu:0')
tensor(6, device='hpu:0')
tensor(7, device='hpu:0')
tensor(8, device='hpu:0')
tensor(9, device='hpu:0')

or at least raise an error.
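A minimal sketch of an automated version of this check, in plain Python so it runs without a Gaudi device: it compares per-element reads against a reference list and reports every index that disagrees, instead of relying on reading printed output by eye. The function name find_index_mismatches is hypothetical and not part of Optimum Habana; the sample data is the output reported above.

```python
from typing import List, Sequence, Tuple


def find_index_mismatches(
    reference: Sequence[int], indexed: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (index, indexed_value, reference_value) for every position where
    a per-element read disagrees with the reference values."""
    if len(reference) != len(indexed):
        raise ValueError("reference and indexed reads must have the same length")
    return [
        (i, got, want)
        for i, (got, want) in enumerate(zip(indexed, reference))
        if got != want
    ]


# Values from this issue: torch.arange(10) vs. what x[i] printed in eager mode.
expected = list(range(10))
observed = [0, 2, 4, 6, 8, 0, 0, 0, 0, 0]

for i, got, want in find_index_mismatches(expected, observed):
    print(f"x[{i}] returned {got}, expected {want}")
```

On a Gaudi machine, one could build the reference from x.cpu().tolist() and the indexed reads from [x[i].item() for i in range(len(x))]. Whether .item() exercises the same code path as print(x[i]) is an assumption that this thread does not confirm.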

ssarkar2 commented 4 months ago

I was able to reproduce a similar issue on 1.15. Printing the whole tensor x shows the correct values, but printing individual elements via indexing gives the wrong results described above. Lazy mode produces the correct result.

The issue will be fixed in 1.16, when it is released.
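The observation that lazy mode behaves correctly suggests comparing the two execution modes side by side. Because the reproduction sets PT_HPU_LAZY_MODE before importing torch, this sketch assumes the mode is fixed once a process starts, and so runs each mode in a fresh Python interpreter. The names CHILD_SCRIPT, mode_env, run_repro and compare_modes are hypothetical, and capturing values with .cpu().tolist() and .item() instead of print() is an assumption about how to read them back.

```python
import json
import os
import subprocess
import sys
from typing import Dict, List

# Child script: the indexing repro from this issue, emitting values as JSON.
# The execution mode is not set here; the parent passes it via the environment.
CHILD_SCRIPT = """
import json
import torch
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi
adapt_transformers_to_gaudi()
x = torch.arange(10).to("hpu")
print(json.dumps({"full": x.cpu().tolist(), "indexed": [x[i].item() for i in range(10)]}))
"""


def mode_env(lazy: bool) -> Dict[str, str]:
    """Copy of the current environment with the execution mode set for one run."""
    env = dict(os.environ)  # copy, so the parent's environment is not modified
    env["PT_HPU_LAZY_MODE"] = "1" if lazy else "0"
    return env


def run_repro(lazy: bool) -> Dict[str, List[int]]:
    """Run the repro in a fresh interpreter (requires a Gaudi machine)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=mode_env(lazy),
        capture_output=True,
        text=True,
        check=True,
    )
    # Take the last stdout line, in case anything else is printed first.
    return json.loads(result.stdout.strip().splitlines()[-1])


def compare_modes() -> None:
    """Report whether indexed reads match the full tensor in each mode."""
    for lazy in (True, False):
        out = run_repro(lazy)
        status = "OK" if out["indexed"] == out["full"] else "MISMATCH"
        print(f"lazy={lazy}: {status} indexed={out['indexed']}")
```

Calling compare_modes() on an affected installation would be expected to report OK for lazy mode and MISMATCH for eager mode, going by the comments in this thread; this has not been run here.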

regisss commented 3 months ago

@mattolson93 It should work now with SynapseAI 1.16 and Optimum Habana 1.12. Does it also work on your side?
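To tell whether a given setup should already include the fix, one option is to compare the installed versions against the ones named here (SynapseAI 1.16, Optimum Habana 1.12). This is a minimal sketch under two assumptions: the driver version printed by hl-smi tracks the SynapseAI release (it does in the report above, where both are 1.14.0), and dropping suffixes such as .dev0 is acceptable. The helper names are hypothetical.

```python
import re
from typing import Tuple

# Minimum versions named in this thread as containing the fix.
MIN_SYNAPSE = (1, 16)
MIN_OPTIMUM_HABANA = (1, 12)


def version_prefix(text: str) -> Tuple[int, ...]:
    """Extract the first dotted run of integers, e.g. 'hl-1.14.0-fw-48.0.1.0' -> (1, 14, 0).

    Suffixes such as '.dev0' or a '-<hash>' build tag are dropped, so a dev build
    is treated as the release it precedes, which may be optimistic.
    """
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_indexing_fix(driver_version: str, optimum_habana_version: str) -> bool:
    """True if both versions are at least the ones reported to fix this issue."""
    return (
        version_prefix(driver_version) >= MIN_SYNAPSE
        and version_prefix(optimum_habana_version) >= MIN_OPTIMUM_HABANA
    )


# Versions from the original report; prints False.
print(has_indexing_fix("1.14.0-9e8ecf8", "1.11.0.dev0"))
```

The installed Optimum Habana version can be read with importlib.metadata.version("optimum-habana"); the driver version string can be copied from the hl-smi header, as in the System Info above.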

mattolson93 commented 3 months ago

Thank you! I will check once I get access to a machine with the right drivers.

mattolson93 commented 2 months ago

I have tested the new version and it works. Thank you!