ml5js / ml5-next-gen

Repo for next generation of ml5.js: friendly machine learning for the web! 🤖
https://ml5js.org/

setBackend() required for ml5.neuralNetwork() bug #117

Open shiffman opened 5 months ago

shiffman commented 5 months ago

The ml5.neuralNetwork() examples require either:

ml5.setBackend("webgl");
ml5.setBackend("cpu");

Without one of the above, the examples break with an error related to webgpu (see #34). This came up again while reviewing #105. We'll leave it for now, but it would be great to remove the requirement from the examples and have them all work with webgl as the default.

lindapaiste commented 5 months ago

I think that all we need is an await tf.ready() statement in the neuralNetwork init method. https://github.com/ml5js/ml5-next-gen/pull/105#discussion_r1536473967

The ml5.setBackend function is an oddball because it starts an asynchronous change but doesn't wait for it to finish (tf.setBackend is async). So we need to wait for TF to be fully set up before using it.
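
A rough sketch of that idea, not the actual ml5 code: `initNeuralNetwork` and `preferredBackend` are hypothetical names, and `tf` is passed in as an argument so the snippet doesn't assume how ml5 imports TensorFlow.js. The only TF calls used are tf.setBackend(), tf.ready(), and tf.getBackend().

```javascript
// Sketch only: `initNeuralNetwork` and `preferredBackend` are hypothetical
// names, and `tf` is injected rather than imported.
async function initNeuralNetwork(tf, preferredBackend) {
  if (preferredBackend) {
    // tf.setBackend() returns a promise, so wait for the switch to complete.
    await tf.setBackend(preferredBackend);
  }
  // tf.ready() resolves once the active backend has finished initializing.
  await tf.ready();
  // Only after this point is it safe to create tensors or build a model.
  return tf.getBackend();
}
```

With the real library this would be called as something like `await initNeuralNetwork(tf, "webgl")` before any tensors are created.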

ziyuan-linn commented 3 months ago

Adding await tf.ready() seems to work! However, a new error occurred now that we are using the webgpu backend:

Error: WebGPU readSync is only available for CPU-resident tensors.
    at ms.readSync (pose-detection.esm.js:17:87509)
    at Engine.readSync (engine.js:943:1)
    at Tensor.dataSync (tensor.js:297:1)
    at Tensor.arraySync (tensor.js:228:29)
    at NeuralNetworkData.js:509:1
    at engine.js:328:1
    at Engine.scopedRun (engine.js:338:1)
    at Engine.tidy (engine.js:327:1)
    at Module.tidy (globals.js:175:18)
    at NeuralNetworkData.createOneHotEncodings (NeuralNetworkData.js:494:12)

I will try to investigate this further.

lindapaiste commented 3 months ago

Over the long term, I think the ideal thing would be to avoid synchronous operations and use .array() instead of .arraySync(). However, this function is called deep in a call chain, so a lot of methods would need to be converted to async for that to work.
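
To illustrate what that conversion might look like for a single function, here is a minimal sketch. It is not the real NeuralNetworkData.createOneHotEncodings: `oneHotEncode`, `indices`, and `depth` are made-up names, and `tf` is passed in as an argument. One wrinkle is that tf.tidy() has to stay synchronous, so the async read happens outside of it and the tensor is disposed by hand.

```javascript
// Sketch only: `oneHotEncode`, `indices`, and `depth` are hypothetical names,
// and `tf` is injected rather than imported.
async function oneHotEncode(tf, indices, depth) {
  // tf.tidy() cannot wrap an async function, so it only builds the tensor.
  // The returned tensor survives the tidy; intermediates are cleaned up.
  const encoded = tf.tidy(() =>
    tf.oneHot(tf.tensor1d(indices, "int32"), depth)
  );
  try {
    // .array() reads the data asynchronously instead of using .arraySync().
    return await encoded.array();
  } finally {
    // The tensor escaped tidy(), so release it manually.
    encoded.dispose();
  }
}
```

Every caller of a function like this would then need to `await` it, which is where the chain of async conversions comes from.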

It seems like the error has been fixed in TFJS, though I'm not sure which version the fix was released in. It might be that we just need to bump our dependencies. Fix PR: https://github.com/tensorflow/tfjs/pull/7576 Issue: https://github.com/tensorflow/tfjs/issues/5468

ziyuan-linn commented 3 months ago

Thank you @lindapaiste! I will try to bump the dependencies and see what happens. I agree that switching to async would be ideal over the long term.