phetsims / quadrilateral

"Quadrilateral" is an educational simulation in HTML5, by PhET Interactive Simulations.
GNU General Public License v3.0

Computer Vision for TMQ global orientation: considerations for and possible implementation #20

Closed · brettfiedler closed this 2 years ago

brettfiedler commented 3 years ago

Snippets taken/edited from Slack convo between BF & JG

BF: There's interest in at least exploring what it might take to implement the fiducial markers as a means of providing device orientation while CHROME lab does the same with the embedded sensors (gyroscopes/accelerometers).

BF: I think the most important consideration is the lift we'd be asking for to pull in data from multiple sources, and what would need to be communicated across the three components: microcontroller/sim/marker detection.

Related repos and issues:

  • https://github.com/phetsims/tangible
  • Investigate other marker input strategies: https://github.com/phetsims/tangible/issues/7
  • Performance implications of user testing with mechamarkers: https://github.com/phetsims/ratio-and-proportion/issues/28
  • Will tangible input be included in the published simulation?: https://github.com/phetsims/ratio-and-proportion/issues/89

JG: My biggest question right off the bat is whether you have any thoughts about https://github.com/phetsims/tangible/issues/7? It sounds like the marker strategy that was tried in RaP may not have been performant enough. Should we start with what was in RaP or just try something else?

BF: Yeah, so it was not quite enough to allow smooth movement when moving multiple objects. I'll see if I can get the demo up and running for Monday. It's possible that with just one marker and no need for very rapid detection, it may not be so bad.

BF: If we come up with enough questions, we can reach out to Peter from Ellen's lab to check on the current state of things and see what we can pull in re: rotation and tilt. They also had different sets of markers that seemed to perform differently? And he had not updated things as of Spring 2021, but that might be different now?

JG: Just pondering internally... If we have a computer vision solution, will we still need the microcontroller?

BF: Yeah, I think this is only intended to partially solve the global orientation problem. I am absolutely positive using multiple markers will be nothing but trouble regarding constant detection/occlusion and resulting hitches in model updates for the existing Quad device from CHROME (based on the form factor and how a user moves their hands around the device). But, if we are only relying on ~1 to tell us if something is positioned upright or not, it may not matter too much. Of note, if we introduce any markers at all we will have to accept there will be some moments of desync between the visual display and what the learner is doing whenever detection is lost (be it occlusion, tilt, or goblins).

We should consider the implications of both scenarios:

  1. Using a marker to give global rotation reference and
  2. using a marker (or two..?) to give absolute position reference (e.g., telling the difference between pulling with the right/left hand or both hands), since those are both things people are interested in.

Tagging @zepumph, since he has worked with this most extensively, in case there are any nuggets of wisdom to share after talking to @jessegreenberg.

@BLFiedler & @jessegreenberg meeting on 11/1 to discuss.

jessegreenberg commented 3 years ago

@zepumph helped me set up tracking using MarkerInput.js and it worked really well out of the box. It seems like orientation information for a marker is readily available and the data comes in quickly. I could easily see adding support for 1) with this method. I imagine 2) could be done as well, but it may present challenges like the ones mentioned in phetsims/tangible#7.

brettfiedler commented 3 years ago

Performance concerns to be investigated before/during implementation (marker detection, not sim performance):

brettfiedler commented 3 years ago

Looping in @emily-phet as an FYI ahead of meeting on Tuesday

brettfiedler commented 2 years ago

Regarding color tracking:

JG: I was playing around with OpenCV and found a way to track a color with a webcam from the browser. I had good luck watching a red rectangle taped to the quad and then calculating its rotation. It seems less vulnerable to motion blurring since it is just watching colors. I don't know if this is something to actually employ, but it is in our back pocket. Here's a demo:

(gif: opencv-test demo)
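
For reference, a minimal sketch of this kind of color-based rotation tracking with opencv.js — this is not the actual demo code; the function name and HSV bounds are illustrative placeholders that would need tuning per lighting:

```js
// Minimal sketch of color-based rotation tracking with opencv.js.
// Assumptions: opencv.js is loaded as `cv` and `src` is an RGBA webcam frame.
function trackColoredRectangle( src ) {
  const rgb = new cv.Mat();
  const hsv = new cv.Mat();
  cv.cvtColor( src, rgb, cv.COLOR_RGBA2RGB );
  cv.cvtColor( rgb, hsv, cv.COLOR_RGB2HSV );

  // Keep only "red" pixels. Red is tricky because hue wraps around 0, which
  // is part of why skin had to be filtered out (see above).
  const low = new cv.Mat( hsv.rows, hsv.cols, hsv.type(), [ 0, 120, 120, 0 ] );
  const high = new cv.Mat( hsv.rows, hsv.cols, hsv.type(), [ 10, 255, 255, 255 ] );
  const mask = new cv.Mat();
  cv.inRange( hsv, low, high, mask );

  // Fit a rotated rectangle to the largest contour; its angle approximates
  // the rotation of the taped-on marker.
  const contours = new cv.MatVector();
  const hierarchy = new cv.Mat();
  cv.findContours( mask, contours, hierarchy, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE );

  let angle = null;
  let bestArea = 0;
  for ( let i = 0; i < contours.size(); i++ ) {
    const contour = contours.get( i );
    const area = cv.contourArea( contour );
    if ( area > bestArea ) {
      bestArea = area;
      angle = cv.minAreaRect( contour ).angle;
    }
    contour.delete();
  }

  rgb.delete(); hsv.delete(); low.delete(); high.delete(); mask.delete();
  contours.delete(); hierarchy.delete();
  return angle;
}
```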

[Brett Fiedler] Is the color choice arbitrary? I suspect that the bright green folks use for green screens is a rare enough color.

[Jesse Greenberg] Sounds good! Yes, color is arbitrary. Hah, that makes sense! To get red working I had to do a lot of filtering to ignore my skin... It looks like opencv provides a built-in way to get the perspective transform of an object. It also looks like there is a built-in way to detect lines in an image and extend them as if they were not occluded. Seems pretty strong!

brettfiedler commented 2 years ago

We'll move forward with OpenCV for marker tracking. Beholder is not intended for robust motion tracking (deblurring).

@jessegreenberg will implement, and we will figure out how far we can get with single-marker tracking (global rotation) and multiple-marker tracking (vertex tracking) in the context of the quadrilateral.

jessegreenberg commented 2 years ago

I got pretty consistent angle tracking working (much better than https://github.com/phetsims/quadrilateral/issues/20#issuecomment-1048043791) by watching two green rectangles, finding the centers of their contours, and then determining the angle of the line between them. This gets around the issue of not knowing the relative orientation of a single rectangle (whose angle wraps back to zero every 90 degrees). The green works better than red to filter out in the image.

![image](https://user-images.githubusercontent.com/6396244/155454485-9099a139-8b06-4da6-b295-3d06754c3800.png)
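
A sketch of the two-rectangle idea, assuming a binary mask like the one the green filter produces; the centroids come from image moments and the angle from atan2 (the function name is hypothetical):

```js
// Sketch of the two-marker angle computation. Assumption: `mask` is a binary
// image where the two green rectangles are white (e.g. from cv.inRange).
function angleFromTwoMarkers( mask ) {
  const contours = new cv.MatVector();
  const hierarchy = new cv.Mat();
  cv.findContours( mask, contours, hierarchy, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE );

  // Centroid of each rectangle from its image moments.
  const centers = [];
  for ( let i = 0; i < contours.size(); i++ ) {
    const contour = contours.get( i );
    const m = cv.moments( contour );
    if ( m.m00 > 0 ) {
      centers.push( { x: m.m10 / m.m00, y: m.m01 / m.m00 } );
    }
    contour.delete();
  }
  contours.delete();
  hierarchy.delete();

  // Angle of the line through the two centroids - unlike the angle of a
  // single rectangle, this does not wrap every 90 degrees.
  if ( centers.length === 2 ) {
    return Math.atan2( centers[ 1 ].y - centers[ 0 ].y, centers[ 1 ].x - centers[ 0 ].x );
  }
  return null; // a marker was occluded or lost this frame
}
```
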
jessegreenberg commented 2 years ago

I connected the above to the sim, it's not too bad at all!

(gif: the angle tracking connected to the sim)

EDIT: test code for this:

```html OPENCV TEST
```
jessegreenberg commented 2 years ago

Next, we should try tracking four markers that would control the four vertex positions defining the quadrilateral. Over Slack @BLFiedler suggested that they could be different colors so that we know how to identify them. We could probably get pretty far without distinguishing each marker by color, just reassigning the leftmost and rightmost vertices to the leftmost and rightmost markers (a sketch of that idea is below). Or we could have different sized markers to label them.
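
A sketch of the positional-assignment idea, assuming all four marker centers are detected each frame (names are hypothetical):

```js
// Sketch: assign detected markers to vertices purely by position. This
// relabels vertices whenever markers cross each other, which is the
// limitation that colored or sized markers would address.
// Assumption: `markerCenters` is an array of four {x, y} centers.
function assignVerticesByPosition( markerCenters ) {
  const byX = [ ...markerCenters ].sort( ( a, b ) => a.x - b.x );
  const left = byX.slice( 0, 2 ).sort( ( a, b ) => a.y - b.y );
  const right = byX.slice( 2 ).sort( ( a, b ) => a.y - b.y );
  return {
    topLeft: left[ 0 ],
    bottomLeft: left[ 1 ],
    topRight: right[ 0 ],
    bottomRight: right[ 1 ]
  };
}
```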

I also want to try a "line detection" approach that may work through any kind of hand occlusion. We could detect the lines of the TMQ, extend them all the way to the edge of the image, and find the line intersection points; those would be the locations of our vertices. If any portion of a side is visible, we can still recover the vertex positions. https://www.geeksforgeeks.org/line-detection-python-opencv-houghline-method

EDIT: Here is another document for Hough line detection: https://docs.opencv.org/3.4/d3/de6/tutorial_js_houghlines.html

jessegreenberg commented 2 years ago

Trying out Hough Line Transform approach:

Starting with this image:

![download](https://user-images.githubusercontent.com/6396244/155609802-31c67945-891d-4f87-9096-17f7fbbe9296.png)

Lines like this can be detected:

![download (1)](https://user-images.githubusercontent.com/6396244/155609841-c665051b-65ba-47b4-a5d4-ae82e5818d98.png)

With this opencv snippet:

```js
let src = cv.imread('canvasInput');
let dst = cv.Mat.zeros(src.rows, src.cols, cv.CV_8UC3);
let lines = new cv.Mat();
cv.cvtColor(src, src, cv.COLOR_RGBA2GRAY, 0);
cv.Canny(src, src, 50, 200, 3);

// You can try more different parameters
cv.HoughLines(src, lines, 1, Math.PI / 180, 50, 0, 0, 0, Math.PI);

// draw lines
for (let i = 0; i < lines.rows; ++i) {
  let rho = lines.data32F[i * 2];
  let theta = lines.data32F[i * 2 + 1];
  let a = Math.cos(theta);
  let b = Math.sin(theta);
  let x0 = a * rho;
  let y0 = b * rho;
  let startPoint = {x: x0 - 1000 * b, y: y0 + 1000 * a};
  let endPoint = {x: x0 + 1000 * b, y: y0 - 1000 * a};
  cv.line(dst, startPoint, endPoint, [255, 0, 0, 255]);
}
cv.imshow('canvasOutput', dst);
src.delete();
dst.delete();
lines.delete();
```

An example of how this could work with occlusion. My hands are covering two vertices entirely but it is able to find the sides.

![image](https://user-images.githubusercontent.com/6396244/155622842-93343763-1b5d-48bb-a91b-49d43ade1556.png)

Here I was able to find the intersection points of lines that are not of equivalent slope:

![image](https://user-images.githubusercontent.com/6396244/155753538-ec18673b-bffe-4cc6-9ffa-479d572a275b.png)
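
For reference, a small sketch of the intersection math, assuming lines in the (rho, theta) form that cv.HoughLines returns in the snippet above (the helper name is hypothetical):

```js
// Each Hough line satisfies x*cos(theta) + y*sin(theta) = rho, so the
// intersection of two lines is the solution of a 2x2 linear system.
function intersectHoughLines( rho1, theta1, rho2, theta2 ) {
  const det = Math.cos( theta1 ) * Math.sin( theta2 ) - Math.sin( theta1 ) * Math.cos( theta2 );
  if ( Math.abs( det ) < 1e-6 ) {
    return null; // nearly parallel ("equivalent slope"), no usable intersection
  }
  return {
    x: ( rho1 * Math.sin( theta2 ) - rho2 * Math.sin( theta1 ) ) / det,
    y: ( rho2 * Math.cos( theta1 ) - rho1 * Math.cos( theta2 ) ) / det
  };
}
```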

Maybe I can use k-means clustering to find the centers of each vertex; opencv has a function to do so. Or use morphological operations on that image to create blobs and then contours around clusters of points. Or maybe a different averaging solution.

I got close with kmeans I think but ran out of time. Here is code with a TODO for next time.

```js OPENCV TEST
```

kmeans seems overly complicated at this point; I am going to turn each of those blobs into a contour and find the center. I tried an "open" operation, but it seems to reduce the framerate substantially:

        cv.morphologyEx( tempMat, newMat, cv.MORPH_OPEN, Ma, anchor, 1, cv.BORDER_CONSTANT, cv.morphologyDefaultBorderValue() );

Instead, I am just going to create large circles at the intersection points so it looks like one big connected blob (a sketch of this is below).
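
A sketch of that blob idea, assuming `points` holds the line intersection points from the previous step (names and the circle radius are hypothetical):

```js
// Sketch of the blob approach. Assumption: `points` is an array of {x, y}
// intersection points and (rows, cols) matches the video frame size.
function clusterIntersections( points, rows, cols ) {
  // Draw a large filled circle at every intersection so that nearby points
  // merge into one connected blob per vertex.
  const mask = cv.Mat.zeros( rows, cols, cv.CV_8UC1 );
  for ( const p of points ) {
    cv.circle( mask, new cv.Point( Math.round( p.x ), Math.round( p.y ) ), 25, new cv.Scalar( 255 ), -1 );
  }

  // The centroid of each blob approximates one vertex position.
  const contours = new cv.MatVector();
  const hierarchy = new cv.Mat();
  cv.findContours( mask, contours, hierarchy, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE );

  const vertices = [];
  for ( let i = 0; i < contours.size(); i++ ) {
    const contour = contours.get( i );
    const m = cv.moments( contour );
    if ( m.m00 > 0 ) {
      vertices.push( { x: m.m10 / m.m00, y: m.m01 / m.m00 } );
    }
    contour.delete();
  }

  mask.delete(); contours.delete(); hierarchy.delete();
  return vertices;
}
```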

OK, here it is altogether:

![rts](https://user-images.githubusercontent.com/6396244/155804635-47f9d47b-fe9d-42d5-abc6-3d04bab3f2ac.gif)

There is a fair amount of jitter because the lines are unstable. I think a lot of it is coming from the Canny edge detection that happens first; look at all this noise:

![rts](https://user-images.githubusercontent.com/6396244/155809602-b57d306d-efbb-4ecf-9caf-6c007b55617b.gif)

It is coming from noise in the initial color filter.

![rts](https://user-images.githubusercontent.com/6396244/155810390-81c903bf-3c56-45f8-8819-3e1a443927bc.gif)

Hmm, "convex hull" may be what I want to get something more stable. It isn't really any better. I am trying to find a way to get the "skeleton" of the pixels displayed so there is only a single line but I am not having good luck.

Ooo, there is a fitLine function... `cv.fitLine(cnt, cv.DIST_L2, 0, 0.01, 0.01)`. But it would still require identifying regions of sides first.

approxPolyDP may be what we need:

![image](https://user-images.githubusercontent.com/6396244/155817760-9ece42ba-c963-4e63-8728-48959d4e6231.png)

approxPolyDP might give us access to straight lines without noise:

![image](https://user-images.githubusercontent.com/6396244/155865046-1e5f7e08-93f2-4768-bcfd-64589c023c9e.png)

```js
let src = cv.imread('canvasInput');
let dst = cv.Mat.zeros(src.rows, src.cols, cv.CV_8UC3);
cv.cvtColor(src, src, cv.COLOR_RGBA2GRAY, 0);
cv.threshold(src, src, 100, 200, cv.THRESH_BINARY);
let contours = new cv.MatVector();
let hierarchy = new cv.Mat();
let poly = new cv.MatVector();
cv.findContours(src, contours, hierarchy, cv.RETR_CCOMP, cv.CHAIN_APPROX_SIMPLE);

// approximates each contour to polygon
for (let i = 0; i < contours.size(); ++i) {
  let tmp = new cv.Mat();
  let cnt = contours.get(i);
  // You can try more different parameters
  cv.approxPolyDP(cnt, tmp, 15, true);
  poly.push_back(tmp);
  cnt.delete();
  tmp.delete();
}

// draw contours with random Scalar
for (let i = 0; i < contours.size(); ++i) {
  let color = new cv.Scalar(Math.round(Math.random() * 255), Math.round(Math.random() * 255),
    Math.round(Math.random() * 255));
  cv.drawContours(dst, poly, i, color, 1, 8, hierarchy, 0);
}
cv.imshow('canvasOutput', dst);
src.delete();
dst.delete();
hierarchy.delete();
contours.delete();
poly.delete();
```

Final code before switching to a four-marker solution:

```html OPENCV TEST
```
jessegreenberg commented 2 years ago

Discussed with @BLFiedler at a check-in meeting. We like the idea of line tracking, but let's put that on hold for now.

EDIT: I would like to first play with marker size to accomplish this because it would be easiest. Keep in mind that I think we can pretty quickly change things to support just about anything listed here. The idea is that we could have markers of varying length; the height of each marker could then still be used to determine perspective if we want. A sketch of this size-based labeling is below.
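
A sketch of what the size-based labeling might look like, assuming the marker contours are already detected (names are hypothetical):

```js
// Sketch: identify markers by their size, assuming four markers of distinct
// lengths. Assumption: `contours` is a cv.MatVector of marker contours.
function labelMarkersBySize( contours ) {
  const markers = [];
  for ( let i = 0; i < contours.size(); i++ ) {
    const contour = contours.get( i );
    const rect = cv.boundingRect( contour );
    markers.push( { rect: rect, length: Math.max( rect.width, rect.height ) } );
    contour.delete();
  }

  // Longest first - with four distinct marker lengths, the sort order tells
  // us which vertex each marker belongs to. The rect height remains
  // available for perspective estimation.
  markers.sort( ( a, b ) => b.length - a.length );
  return markers;
}
```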

jessegreenberg commented 2 years ago

Notes as I work on a solution that uses 4 discrete markers. Overall, there is hardly any noise and it feels really fast. But it is of course more susceptible to marker occlusion.

I made substantial progress on this today; here is my hacky code:

```html OPENCV TEST
```

Demonstration of the behavior, with detected positions controlling the sim:

![ezgif com-gif-maker](https://user-images.githubusercontent.com/6396244/157334659-09f899d0-711e-4e38-9bba-0268dd49ae4a.gif)

I don't have labelled vertices or something resilient to perspective figured out yet, but I think that seems relatively straightforward to work on next.

jessegreenberg commented 2 years ago

Discussed status with @BLFiedler; over Slack he mentioned two things that would be good to have next: 1) a shareable version so the team can try it and determine what should be worked on next, and 2) a way to flip the camera feed so that it will work if the camera is over the shoulder instead of pointing toward the user's face.

EDIT: For next time, cv has built-in functions to flip an image vertically: `cv.flip(image, image, 0)`; and horizontally: `cv.flip(image, image, +1)`.

brettfiedler commented 2 years ago

I think we'll need to support both horizontal and vertical flip? I made a quick video about possible detection window orientations with respect to the marker locations:

https://user-images.githubusercontent.com/47331395/157487817-60b42ae0-6a97-4400-8af6-fbbd729b3a4f.mp4

terracoda commented 2 years ago

These are really interesting perspectives on the device. I have questions about how to consistently start the description.

brettfiedler commented 2 years ago

When a shareable version is ready, let's keep a version with the non-identified vertices (which relabels vertices when the shape rotates).

The next step will be adding vertex identification to enable rotation of the quad while keeping the same vertex assignments from startup.

brettfiedler commented 2 years ago

Played around with small squares affixed to the TMQ as well as free-moving green blocks. I mounted my webcam on the ceiling above me (sloped ceiling). https://phet-dev.colorado.edu/html/jg-tests/opencv-test/

Setup notes:

1.) SUPER FUN. Amazing how much we can already do with it once it's set up. Quite smooth (with the exception of the notes below) when the parameters are dialed in.

2.) A few videos I took playing around with the current setup.

Video 1:

https://user-images.githubusercontent.com/47331395/164247166-e02854fc-27e8-42ff-a8c4-fba2d06d06ad.mp4

Video 2:

https://user-images.githubusercontent.com/47331395/164247172-7277c29e-2830-4bce-a7d4-edb79e41d4f4.mp4

emily-phet commented 2 years ago

@BLFiedler Sounds very cool! I can't seem to get the videos to load... is anyone else having this problem?

brettfiedler commented 2 years ago

I tried changing the formatting of the post above which made the embedding show up. Let me know if that fixes it. Otherwise, I've put the videos here, though they may take a bit to process: https://drive.google.com/drive/folders/1zwKRagycbptEeRXa3AhiuEQ0CeUCVsvh?usp=sharing

terracoda commented 2 years ago

The corner demo is so cool @BLFiedler!

Do you know what is happening in the TMQ demo? It jitters without movement?

brettfiedler commented 2 years ago

Repost of my above comments with some additional details, taken while chatting with @jessegreenberg. Includes plans for new issues to prioritize after Voicing:

(image: notes with plans for new issues)

emily-phet commented 2 years ago

Cool videos! I think this shows lots of potential, particularly with the four blocks...

brettfiedler commented 2 years ago

Updating needs for OpenCV issues from the notes above:

  • We might be able to autodetect the green based on HSV (and auto-set the ranges for each value), with a manual override or letting the user pick the color, to help with user setup (see the sketch after this list).

This was done as part of JG's tests - currently being further developed in: https://github.com/phetsims/quadrilateral/issues/141

  • [Lower priority NEW ISSUE] When markers are close to each other, they merge (when the red boxes touch) - we want to avoid this behavior if possible.

This shouldn't be an issue when using 4 distinctly colored markers: https://github.com/phetsims/quadrilateral/issues/141

  • [NEW ISSUE] How to elegantly handle loss of a marker or bad data

Also to be worked on in https://github.com/phetsims/quadrilateral/issues/141 as part of marker differentiation.

  • [NEW ISSUE] Might be nice to add a "Reset to Default" for the HSV filter values. I found myself just refreshing the page.

Creating a new issue for retaining the last-used values from the browser cache.

  • [NEW ISSUE] Testing setup: OpenCV lives just in the test environment, not in the sim at all. What do we want with regard to the controls and video feed being embedded directly into the simulation (preferences menu?)? This will impact what we do for RaP as well.

On hold for now - the current interface is usable with PhET-iO, and the sim can be made full screen to hide the interface. Hiding this in a menu would make setup difficult.
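
A sketch of the HSV auto-detection idea from the first bullet above, assuming the user clicks the marker color in the video feed (names, tolerances, and the clicked-pixel interface are hypothetical):

```js
// Sketch of auto-setting the HSV filter ranges from a user-picked color.
// Assumptions: `hsvFrame` is the current frame already converted to HSV
// (cv.CV_8UC3), and (x, y) is the pixel the user clicked on the marker.
function autoSetHsvRange( hsvFrame, x, y ) {
  const pixel = hsvFrame.ucharPtr( y, x ); // [h, s, v]

  // Illustrative tolerances around the sampled color - a manual override
  // would adjust these.
  const tolerance = { h: 15, s: 60, v: 60 };

  return {
    low: [
      Math.max( 0, pixel[ 0 ] - tolerance.h ),
      Math.max( 0, pixel[ 1 ] - tolerance.s ),
      Math.max( 0, pixel[ 2 ] - tolerance.v ),
      0
    ],
    high: [
      Math.min( 179, pixel[ 0 ] + tolerance.h ), // OpenCV 8-bit hue tops out at 179
      Math.min( 255, pixel[ 1 ] + tolerance.s ),
      Math.min( 255, pixel[ 2 ] + tolerance.v ),
      255
    ]
  };
}
```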

brettfiedler commented 2 years ago

For current needs, this is complete.