Visual Assistance Device

Luxonis-Brandon commented 4 years ago

Start with the `why`:

Spatial AI is starting to be possible with embedded platforms. Spatial AI provides information on what objects are, and where they are in physical space. So this information can then be transduced into other formats (text, audio, haptic/vibration, tactile display, etc.) which can help folks with visual impairments.

For example, such a spatial AI solution could provide this sort of insight when walking in a park: “There is a park bench 12 feet in front of you at 1 o’clock and it has empty seats”

Or it could find text on a page or a sign and offer to the user to read it, automatically. It could also provide insight as to where the user physically is in proximity to other objects like vehicles, people, bikes, etc., including even warning when a vehicle collision is imminent (e.g. here). Or feedback on where someone is on a path (like here).

This current effort started when Marx Melencio (who is visually impaired) reached out here, showcasing the system he has already made, and with interest in using DepthAI to actually productize.

Move to the `how`:

DepthAI is an Embedded Spatial AI platform (perhaps the only one as of this writing?) which provides neural inference (e.g. object detection) in combination with disparity depth to give object localization (i.e. what an object is and where in physical space).

And it can run series/parallel networks as well (a coming feature, see first version here), which allows a pipeline of networks to provide additional information based on the context (automatically or manually). For example, when outside/walking in a downtown area (the context) the system could automatically detect road signs/stop signs/stop lights and tell the user the state of them (which would be a cascade of a find-signs network followed by a ‘read the signs’ and/or ‘state of digital sign’ network).

So we plan to use DepthAI to make an easy-to-use visual assistance system, taking advantage of the fact that DepthAI is embedded (i.e. doesn’t require an OS or anything), low-power (i.e. can be battery powered over the course of a normal day) and is small enough to be body-worn w/out encumbrance.

There are a variety of ways we could attack such a device (a variety of potential ‘how’), which fall into two categories:

Piecing together existing hardware (largely for prototyping)
Building custom hardware (for a path to productization and selling as a standard product)

Piecing together existing hardware:

There exist a couple variants of DepthAI which could be used for such an application, and fortunately DepthAI is open-source (MIT-licensed, here), so these designs can be modified into something that is more applicable for this visual-assistance device. Below are the applicable designs and how we’ve thought to use them:

DepthAI Raspberry Pi Compute Module Edition (BW1097), here -

a. This has a whole computer built in.

b. It could be used with a GoPro adapter mount (here) and this GoPro mounting kit (here) to allow mounting practically anywhere on the body (head, chest, wrist, etc.).

c. This has all the processing and communication all built-in (it’s running Raspbian), but does not have a solution for battery, so that would need to be worked out - mainly in terms of how to mount and communicate with the battery.

d. One option would be to make a new mount that has the battery built in.
DepthAI Onboard Cameras Edition (BW1098OBC) here.

a. This also could be used with a GoPro adapter mount (here) and then connected (and powered) over USB to some other on-person computer/power source (say a Pi w/ a Pi HAT LiPo power source).

b. So this solution would have 1x USB cable going from the perception device (DepthAI BW1098OBC) to the host processor device (say a Pi with a battery).
DepthAI Modular Cameras Edition (BW1098FFC) here

a. This would allow ‘hacking together’ a prototype of smart glasses.

b. For example the main board could be on the nape of the back of the neck, connected via FFC to the cameras on smart glasses.

c. The trouble is it’s a lot of flexible flat cables (FFC) cables and these cables are relatively fragile, so it’s not ideal.

Building custom hardware:

Making actual spatial-AI glasses where everything is integrated. This picture here summarizes it. A battery would likely be integrated directly in the the frame (with the ESP32) or on a lanyard which attaches to the back of the frames.

a. The disadvantage of this is that it is specifically designed for wearing on the head… and it may be nice to for example have a head-mounted unit and a wrist-mounted device (for example the head-mounted gives situational awareness, and the wrist-mounted let’s you explore around you (e.g. read a piece of paper) without having to move your head all over

b. This is also a more complex custom design w/ some technical risk and user-experience risk including us as a team not really knowing how to make comfortable glasses/etc.
Make a small fully-integrated Spatial AI box w/ GoPro mount so it can mounted to wrist, chest, or head (using a GoPro adapter set like here) which has WiFi and BT interface.

a. This is the simplest/fastest approach, having the lowest technical risk and user experience risk as it can just be a self-contained, small system which uses field-proven GoPro mounts for attachment to head, chest, or wrist.

b. It also allows using multiple devices on a person. So for example one on the back, one on the head, one on the wrist, one on the chest. And they connect over BT or WiFi to say an on-person Raspberry Pi which handles prioritizing which data comes back based on the person interacting w/ the onboard software.

c. It is a not-huge amount of work to re-used the design for the BW1098OBC (here) while adding an ESP32 and a battery w/ charge/protection circuitry.

d. Probably worth reducing the stereo baseline from the 7.5cm there to something like 4cm or so, as it will still provide plenty of distance vision while allowing a closer-in min stereo-disparity depth-perception distance (see calculation here) of 0.367 meters (for 4cm baseline) instead of the 0.689 (for 7.5cm baseline).

So after exploring the options above, we decided that ‘2’ from custom hardware section seems like the way to go. So below in ‘what’ we describe what this device we are building:

Move to the `what`:

So out of wanting to design this, we realized we should do it in stages, so the first part of this effort will be a DepthAI ESP32 Reference Design / Embedded DepthAI Reference Design as in https://github.com/luxonis/depthai-hardware/issues/10, which will be re-usable as the core of the implementation needed below:

A “small battery-powered Spatial AI box with WiFi and BT”

Modular, self-sufficient Visual Assistance device with built-in power, WiFi, and BT interfaces.
Battery compartment for 2x 18650 3,400mAh batteries (e.g. here)
ESP32 for WiFi/BT and user-modifiable custom code.
4cm stereo baseline, 3 cameras total: stereo pair + 12MP color
Make the total package as small as reasonably possible.

Battery Notes:

Maybe for now this is direct mounted to PCB?
- Like one of those solder-mount battery holders?
If we use protected cells for now, we don’t even need to have a charger on there… just something to monitor state of charge
- Having an integrated charger would of course be cooler.
3,400mAh should run this whole design for approximately 9 hours of max-possible-performance from DepthAI (+ negligible power use of ESP32).

Camera module placement notes:

Color camera close to the ‘right’ (device’s view) stereo camera (for better object alignment between color-sensor-based inference and the ‘center of the universe’ for the stereo disparity depth result. Below is an example of a 3.5cm-baseline board we had made, with the color camera sitting on there for reference:

Example Camera Layout for Visual Assistance Device

Note that 4cm baseline was just an initial idea.
- Maybe letting another size-constraint (e.g. the battery size) determine the separation may make sense.
  - For example the 18650 batteries are 6.5cm, so the device will likely be at least 6.5 cm long, which would afford 5cm stereo baseline probably.
  - So this is something we could change around as the layout solidifies.

Mounting:

In whatever case we make for this we should:
- Have the back of it be metal to act as the structure and the heatsinking to the back. (If the whole thing is metal that’s great too)
- Make a GoPro mount on the back so that this kit or similar can be used to support wrist-mount, chest-mount, and head-mount (in addition to a bunch of other likely-possible permutations).

Luxonis-Brandon commented 3 years ago

The first hardware for this is now back and works! See https://github.com/luxonis/depthai-hardware/issues/10#issuecomment-690614759

It's small enough to be wrist or chest or head mounted with a GoPro mount (like here).

Looking forward to trying these out for visual assistance!

Luxonis-Brandon commented 3 years ago

Oh and here it is running:

Luxonis-Brandon commented 3 years ago

We recently made a version of this that has a LiPo input directly, for allowing an integrated visual assistance device.

Suhaib441 commented 2 years ago

Hello @Luxonis-Brandon i'm new to computer vision and depthai, but i was curious to know how which is better (ESP92 or BW1098FFC) if we want to integrate one of them on a smart glass like Google Glass for visual assistance?

Thank you :))

luxonis / depthai-hardware