huggingface / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

v3 webgpu crash after upgrade to chromium 129 #943

Closed by eyaler 3 weeks ago

eyaler commented 3 weeks ago

System Info

Windows 64-bit. Tested with Chrome 128, 129, 130 beta and Edge 128, 129, 130 canary, using transformers.js v3 alpha 3 through alpha 15 (also compared to v2.17.2).

Environment/Platform

Description

hi! i am running https://huggingface.co/Xenova/modnet with onnx-runtime web on webgpu (fp32 non-quantized model), with the following observations:

  1. chromium 128-130 + transformers v2.17.2 = works with slow performance
  2. chromium 128 + transformers v3alpha3-15 = works at ~ 10x(!) performance compared to above
  3. chromium 129-130 + transformers v3alpha3-15 = crash with ORT error: Failed to execute 'mapAsync' on 'GPUBuffer': A valid external Instance reference no longer exists.
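
Until the root cause is fixed, an application could try WebGPU first and fall back to another backend when the first attempt fails. The following is a minimal sketch of that idea, not code from the reporter's project. `tryBackends` and its `attempt` callback are hypothetical names, and the sketch assumes the failure surfaces as a rejected promise (as the ORT error in observation 3 suggests) rather than a hard tab crash.

```typescript
// Hypothetical helper, not part of Transformers.js or ONNX Runtime.
// `attempt` should load the model on the given device AND run one warm-up
// inference, because the error reported above appears at run time (mapAsync).
type Attempt<T> = (device: string) => Promise<T>;

interface BackendResult<T> {
  value: T;                                         // whatever `attempt` resolved to
  device: string;                                   // the backend that succeeded
  failures: { device: string; message: string }[];  // earlier backends that failed
}

async function tryBackends<T>(
  attempt: Attempt<T>,
  devices: string[] = ["webgpu", "wasm"],
): Promise<BackendResult<T>> {
  const failures: { device: string; message: string }[] = [];
  for (const device of devices) {
    try {
      const value = await attempt(device);
      return { value, device, failures };
    } catch (err) {
      // Record the failure and move on to the next backend in the list.
      const message = err instanceof Error ? err.message : String(err);
      failures.push({ device, message });
    }
  }
  const summary = failures.map((f) => `${f.device}: ${f.message}`).join("; ");
  throw new Error(`All backends failed (${summary})`);
}
```

With Transformers.js v3, `attempt` might wrap something like `AutoModel.from_pretrained('Xenova/modnet', { device, dtype: 'fp32' })` followed by a single inference on a dummy input. That wiring is an assumption about the application, not something shown in this thread.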

Reproduction

You can see this in the code behind the two pages linked below. If necessary, I can prepare a minimal example.

Compare the following with both transformers.js versions on both chromium 128 vs. 129/130 (approve the self-tab share and choose modnet from the dropdown)

  1. (v2.17.2) https://eyaler.github.io/LordTubeMaster/#dQw4w9WgXcQ
  2. (v3alpha15) https://oulipoh.github.io/LordTubeMasterDev/#dQw4w9WgXcQ

gyagp commented 3 weeks ago

We also hit this issue last Friday and the investigation is WIP. It looks like a regression in Chrome now. Stay tuned.

gyagp commented 3 weeks ago

After more investigation, we think it's a DXC (shader compiler) issue. You may find more details at https://issues.chromium.org/issues/368997517. A workaround in ONNX Runtime, which Transformers.js is based on, is under review at https://github.com/microsoft/onnxruntime/pull/21995.
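
While waiting for an upstream fix, a page could detect the browser versions reported in this thread and warn the user or pick another backend. The sketch below is only an illustration. `chromiumMajor` and `mayHitDxcRegression` are hypothetical names, the 129 to 130 range reflects only the versions tested in this thread, and the Windows check is an assumption based on DXC being the DirectX shader compiler.

```typescript
// Hypothetical helpers for illustration; not part of any library.

// Extract the Chromium major version from a user-agent string.
// Edge also reports a "Chrome/<version>" token, so it is covered too.
function chromiumMajor(userAgent: string): number | null {
  const match = /(?:Chrome|Chromium)\/(\d+)\./.exec(userAgent);
  return match ? Number(match[1]) : null;
}

// True when the UA matches the configurations reported as crashing here:
// Chromium 129-130 on Windows. Versions outside that range are untested
// in this thread, so they are treated as "not known to be affected".
function mayHitDxcRegression(userAgent: string): boolean {
  const major = chromiumMajor(userAgent);
  const onWindows = userAgent.includes("Windows NT");
  return onWindows && major !== null && major >= 129 && major <= 130;
}
```

In a browser this would be called as `mayHitDxcRegression(navigator.userAgent)`. User-agent sniffing is brittle, so it is best treated as a temporary guard and removed once the fixed versions are widely deployed.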

flatsiedatsie commented 3 weeks ago

Could "range error - buffer allocation failed" be related to this? I'm seeing that error in Android Chrome after an update there.

gyagp commented 3 weeks ago

> Could "range error - buffer allocation failed" be related to this? I'm seeing that error in Android Chrome after an update there.

Your issue looks like an out-of-memory (OOM) error and is not related to this one, which crashes the GPU process. BTW, Google has already fixed this issue in DXC, and the roll into Chrome will take place in a few days. We will ask whether they can backport it to Chrome Stable.
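
The distinction drawn here between the two failures can be made explicit in application error handling. The sketch below sorts error messages into the two cases using only the strings quoted in this thread. `classifyWebGpuError` is a hypothetical helper, and real error messages may differ between browser and ONNX Runtime versions.

```typescript
// Hypothetical helper: buckets an error message using the two strings
// quoted in this thread. Anything else is reported as "unknown".
type WebGpuFailure = "gpu-instance-lost" | "out-of-memory" | "unknown";

function classifyWebGpuError(message: string): WebGpuFailure {
  const text = message.toLowerCase();
  // "Failed to execute 'mapAsync' on 'GPUBuffer': A valid external
  // Instance reference no longer exists." (the GPU process crash)
  if (text.includes("mapasync") && text.includes("external instance reference")) {
    return "gpu-instance-lost";
  }
  // "range error - buffer allocation failed" (likely out of memory)
  if (text.includes("buffer allocation failed")) {
    return "out-of-memory";
  }
  return "unknown";
}
```

An application might respond to `"out-of-memory"` by switching to a smaller or quantized model, and to `"gpu-instance-lost"` by falling back to a non-WebGPU backend. Both responses are suggestions rather than advice given in this thread.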

eyaler commented 3 weeks ago

alpha17 fixes this for me. thanks!

flatsiedatsie commented 3 weeks ago

It seems fixed here too. Nice!