immersive-web / webxr

Repository for the WebXR Device API Specification.
https://immersive-web.github.io/webxr/

Neural Network API #338

Closed temsa closed 6 years ago

temsa commented 6 years ago

While not 100% linked to WebXR, this reflects my experience as an AR developer.

Trying to make a WebXR/AR device interact with reality means more than just being able to display something on the screen "anchored" to reality; it also means being able to recognize at least parts of the environment (and not just its planes or location). For my latest AR project, I needed to recognize a particular set of real objects on the screen and augment them with virtual content around them. For that, I needed something flexible that could locate those objects on the screen easily and with good performance. I used the TinyYOLO neural network, which gave me bounding boxes around my objects in the camera's field of view (which is displayed in my WebAR experience). I simply retrained it with a few hundred photos, and it works really well.

Then I needed to integrate it with my WebXR code. I ended up forking Mozilla's iOS WebXR browser to take the image from the camera, feed it into CoreML, and inject the resulting bounding boxes into the WKWebView. I would have preferred not to fork the browser, but deeplearn-js was not yet ready for running TinyYOLO with bounding-box output, and it does not use the dedicated hardware for this. I really see myself using TinyYOLO or other neural networks more and more over the next few months, as using them is increasingly becoming everyday engineering.
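As a rough illustration of the integration step described above, here is a minimal TypeScript sketch that maps a detector's bounding box onto the screen so virtual content can be placed around the recognized object. It assumes (hypothetically) that the detector returns boxes normalized to [0, 1] relative to the camera image with a top-left origin, and that the camera feed is drawn aspect-fill (scaled uniformly to cover the viewport, centered, overflow cropped). `Detection`, `Rect`, and `boxToViewport` are illustrative names, not part of WebXR, CoreML, or any browser API.

```typescript
// Hypothetical shape of one detection. The box is assumed to be normalized to [0, 1]
// relative to the camera image, with the origin at the top-left corner; real detectors
// differ in their conventions (center vs. corner, pixels vs. normalized).
interface Detection {
  label: string;
  score: number;
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

// Rectangle in viewport pixels.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Maps a normalized box into viewport pixels, assuming the camera image is drawn
// "aspect-fill": scaled uniformly to cover the whole viewport, centered, with the
// overflowing part cropped.
function boxToViewport(
  d: Detection,
  imageW: number,
  imageH: number,
  viewW: number,
  viewH: number,
): Rect {
  // The larger ratio guarantees the image covers the viewport on both axes.
  const scale = Math.max(viewW / imageW, viewH / imageH);
  // Offsets are negative along the axis that overflows and is cropped, zero otherwise.
  const offsetX = (viewW - imageW * scale) / 2;
  const offsetY = (viewH - imageH * scale) / 2;
  return {
    left: offsetX + d.x * imageW * scale,
    top: offsetY + d.y * imageH * scale,
    width: d.width * imageW * scale,
    height: d.height * imageH * scale,
  };
}
```

With a 1280x720 camera frame shown on a 360x640 portrait viewport, for instance, the frame is scaled by 640/720 and about 389 px are cropped from each side, so a box centered in the image still lands centered on screen. If the browser draws the camera feed differently (letterboxed, rotated, or mirrored), the mapping has to change accordingly.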

There are a number of dedicated hardware accelerators for this (hopefully every AR device will have one), and a dedicated specification for handling neural networks (a WebTensor API?) would be really helpful for making great stuff with great frame rates, even on mobile, especially if it uses that dedicated hardware.

There is probably a lot of work needed to achieve the generic, safe, efficient, and powerful specification we'd need as AR developers to build great Web AR experiences, but if nobody asks for it, we will never have it, and I'm already missing it today!

blairmacintyre commented 6 years ago

Hi @temsa

The questions of "world understanding", "thing recognition", "thing tracking", etc., are hugely important.

The big question for WebXR is: what should be in the standard, or related standards? What can we reasonably define that can be implemented on all platforms?

I've got a bunch of ideas and opinions on this, as I'm sure others do. I started an issue over in the proposals repo (https://github.com/immersive-web/proposals/issues/4) related to custom CV, and we've been experimenting with what such an API might look like; of course, doing CV in the page doesn't let you take advantage of the custom APIs on the platform, but it will be "one part of it".

The process we've adopted here is to keep the "current" WebXR Device API discussion here, have discussions about proposed new features in the proposals repo, and then move proposals into their own repo (as has been done for anchors and hit-test) when there are multiple groups of people wanting to tackle it.

I'm very anxious to make progress on this topic, but it's very large. We should move this to proposals and try to sort out how to break it down into pieces that we can make progress on!

huningxin commented 6 years ago

There is a Web Machine Learning API discussion thread on WICG: https://discourse.wicg.io/t/api-set-for-machine-learning-on-the-web/2491. It aims to bring hardware-accelerated machine learning capability to the web platform. Augmented Reality or Mixed Reality is the target use case.

So far, with the WebML API POC, a web app can get near-native MobileNet image classification performance; see details in post #10. The POC currently supports Android and macOS. It might be quite interesting to expose the WebML API in the iOS/Android WebXR browsers and use MobileNet+SSD within a WebXR app.

@temsa , what do you think? Does MobileNet+SSD meet your use case?

temsa commented 6 years ago

@huningxin The WebML API POC is really great news, including its performance :rocket:

Regarding MobileNet-SSD, it may well be enough. I'd have to train it to be sure, of course, but in general, videos comparing TinyYOLO and imagenet-SSD look pretty much the same, if not in favor of imagenet-SSD.

huningxin commented 6 years ago

FYI @temsa, we put together an SSD MobileNet example based on the WebML/NN API POC: https://huningxin.github.io/webml-examples/examples/ssd_mobilenet/camera.html. It allows real-time object detection on a MacBook Pro: https://www.youtube.com/watch?v=XGgiDU-8d60
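SSD- and YOLO-style detectors typically emit many overlapping candidate boxes per object, and pipelines commonly filter them with non-maximum suppression (NMS) before drawing anything. The following is a generic TypeScript sketch of greedy, per-class NMS under assumed thresholds (score 0.5, IoU 0.45); it is not the code used in the linked WebML example, and `Box`, `iou`, and `nonMaxSuppression` are hypothetical names.

```typescript
// Hypothetical candidate box: top-left corner plus size, in any consistent unit.
interface Box {
  label: string;
  score: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Intersection-over-union of two axis-aligned boxes, in [0, 1].
function iou(a: Box, b: Box): number {
  const ix = Math.max(0, Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x));
  const iy = Math.max(0, Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y));
  const inter = ix * iy;
  const union = a.width * a.height + b.width * b.height - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: drop low-scoring boxes, then visit the rest from highest to lowest score,
// keeping a box only if it does not overlap an already-kept box of the same label
// by more than iouThreshold.
function nonMaxSuppression(boxes: Box[], scoreThreshold = 0.5, iouThreshold = 0.45): Box[] {
  const sorted = boxes
    .filter((b) => b.score >= scoreThreshold)
    .sort((a, b) => b.score - a.score);
  const kept: Box[] = [];
  for (const candidate of sorted) {
    const suppressed = kept.some(
      (k) => k.label === candidate.label && iou(k, candidate) > iouThreshold,
    );
    if (!suppressed) kept.push(candidate);
  }
  return kept;
}
```

Suppressing per class keeps, for example, a detected dog that overlaps a detected cup; class-agnostic NMS would drop one of them. The thresholds are only typical starting values, not figures from this thread, and would need tuning for a retrained model.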

TrevorFSmith commented 6 years ago

@temsa CV is super important and let's definitely not let this get lost!

To get people talking, let's consolidate this conversation over in the proposals repo on Blair's issue. If you feel that there's a separate topic that isn't covered in Blair's issue, please read the proposals README and then open another issue.