phetsims / quadrilateral

"Quadrilateral" is an educational simulation in HTML5, by PhET Interactive Simulations.
GNU General Public License v3.0

Computer Vision for TMQ global orientation: considerations for and possible implementation #20

Closed · brettfiedler closed this 2 years ago

brettfiedler commented 3 years ago

Snippets taken/edited from Slack convo between BF & JG

BF: There's interest in at least exploring what it might take to implement the fiducial markers as a means of providing device orientation while CHROME lab does the same with the embedded sensors (gyroscopes/accelerometers).

BF: I think the most important consideration is the lift we'd be asking for to pull in data from multiple sources, and what would need to be communicated across the three components: microcontroller/sim/marker detection.

Related repos and issues:

  • https://github.com/phetsims/tangible
  • Investigate other marker input strategies: https://github.com/phetsims/tangible/issues/7
  • Performance implications of user testing with mechamarkers: https://github.com/phetsims/ratio-and-proportion/issues/28
  • Will tangible input be included in the published simulation?: https://github.com/phetsims/ratio-and-proportion/issues/89

JG: My biggest question right off the bat is whether you have any thoughts about https://github.com/phetsims/tangible/issues/7? It sounds like the marker strategy that was tried in RaP may not have been performant enough. Should we start with what was in RaP or just try something else?

BF: Yeah, so it was not quite enough to allow smooth movement when moving multiple objects. I'll see if I can get the demo up and running for Monday. It's possible that with just one marker and no need for very rapid detection, it may not be so bad.

BF: If we come up with enough questions, we can reach out to Peter from Ellen's lab to check on the current state of things and see what we can pull in re: rotation and tilt. They also had different sets of markers that seemed to perform differently? And he had not updated things as of Spring 2021, but that might be different now?

JG: Just pondering internally... If we have a computer vision solution, will we still need the microcontroller?

BF: Yeah, I think this is only intended to partially solve the global orientation problem. I am absolutely positive using multiple markers will be nothing but trouble regarding constant detection/occlusion and resulting hitches in model updates for the existing Quad device from CHROME (based on the form factor and how a user moves their hands around the device). But, if we are only relying on ~1 to tell us if something is positioned upright or not, it may not matter too much. Of note, if we introduce any markers at all we will have to accept there will be some moments of desync between the visual display and what the learner is doing whenever detection is lost (be it occlusion, tilt, or goblins).

We should consider the implications of both scenarios:

  1. Using a marker to give global rotation reference and
  2. using a marker (or two..?) to give absolute position reference (e.g., telling the difference between pulling with the right/left hand or both hands), since those are both things people are interested in.

Tagging @zepumph, since he has worked with this most extensively, in case there are any nuggets of wisdom to share after talking to @jessegreenberg.

@BLFiedler & @jessegreenberg meeting on 11/1 to discuss.

jessegreenberg commented 3 years ago

@zepumph helped me set up tracking using MarkerInput.js and it worked really well out of the box. It seems like orientation information for a marker is readily available and the data comes in quickly. I could easily see adding support for 1) with this method. I imagine 2) could be done as well, but it may present challenges like the ones mentioned in phetsims/tangible#7.

brettfiedler commented 3 years ago

Performance concerns to be investigated before/during implementation (marker detection, not sim performance):

brettfiedler commented 3 years ago

Looping in @emily-phet as an FYI ahead of meeting on Tuesday

brettfiedler commented 2 years ago

Regarding color tracking:

JG: I was playing around with OpenCV and found a way to track a color with a webcam from the browser. I had good luck watching a red rectangle taped to the quad and then calculating its rotation. It seems less vulnerable to motion blurring since it is just watching colors. I don't know if this is something to actually employ, but it is in our back pocket. Here's a demo:

(gif: opencv-test demo)
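
For reference, a minimal sketch of this kind of color-based rotation tracking with opencv.js — this is not the actual demo code; the function name and HSV bounds are illustrative placeholders that would need tuning per lighting:

```js
// Minimal sketch of color-based rotation tracking with opencv.js.
// Assumptions: opencv.js is loaded as `cv` and `src` is an RGBA webcam frame.
function trackColoredRectangle( src ) {
  const rgb = new cv.Mat();
  const hsv = new cv.Mat();
  cv.cvtColor( src, rgb, cv.COLOR_RGBA2RGB );
  cv.cvtColor( rgb, hsv, cv.COLOR_RGB2HSV );

  // Keep only "red" pixels. Red is tricky because hue wraps around 0, which
  // is part of why skin had to be filtered out (see above).
  const low = new cv.Mat( hsv.rows, hsv.cols, hsv.type(), [ 0, 120, 120, 0 ] );
  const high = new cv.Mat( hsv.rows, hsv.cols, hsv.type(), [ 10, 255, 255, 255 ] );
  const mask = new cv.Mat();
  cv.inRange( hsv, low, high, mask );

  // Fit a rotated rectangle to the largest contour; its angle approximates
  // the rotation of the taped-on marker.
  const contours = new cv.MatVector();
  const hierarchy = new cv.Mat();
  cv.findContours( mask, contours, hierarchy, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE );

  let angle = null;
  let bestArea = 0;
  for ( let i = 0; i < contours.size(); i++ ) {
    const contour = contours.get( i );
    const area = cv.contourArea( contour );
    if ( area > bestArea ) {
      bestArea = area;
      angle = cv.minAreaRect( contour ).angle;
    }
    contour.delete();
  }

  rgb.delete(); hsv.delete(); low.delete(); high.delete(); mask.delete();
  contours.delete(); hierarchy.delete();
  return angle;
}
```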

[Brett Fiedler] Is the color choice arbitrary? I suspect that the bright green folks use for green screens is a rare enough color.

[Jesse Greenberg] Sounds good! Yes, color is arbitrary. Hah, that makes sense! To get red working I had to do a lot of filtering to ignore my skin... It looks like opencv provides a built-in way to get the perspective transform of an object. It also looks like there is a built-in way to detect lines in an image and extend them as if they were not occluded. Seems pretty strong!

brettfiedler commented 2 years ago

We'll move forward with OpenCV for marker tracking. Beholder is not intended for robust motion tracking (deblurring).

@jessegreenberg will implement, and we will figure out how far we can get with single-marker tracking (global rotation) and multiple-marker tracking (vertex tracking) in the context of the quadrilateral.

jessegreenberg commented 2 years ago

I got pretty consistent angle tracking working (much better than https://github.com/phetsims/quadrilateral/issues/20#issuecomment-1048043791) by watching two green rectangles, finding the centers of their contours, and then determining the angle of the line between them. This gets around the issue of not knowing the relative orientation of a single rectangle (whose angle wraps back to zero every 90 degrees). The green works better than red to filter out in the image.

![image](https://user-images.githubusercontent.com/6396244/155454485-9099a139-8b06-4da6-b295-3d06754c3800.png)
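
A sketch of the two-rectangle idea, assuming a binary mask like the one the green filter produces; the centroids come from image moments and the angle from atan2 (the function name is hypothetical):

```js
// Sketch of the two-marker angle computation. Assumption: `mask` is a binary
// image where the two green rectangles are white (e.g. from cv.inRange).
function angleFromTwoMarkers( mask ) {
  const contours = new cv.MatVector();
  const hierarchy = new cv.Mat();
  cv.findContours( mask, contours, hierarchy, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE );

  // Centroid of each rectangle from its image moments.
  const centers = [];
  for ( let i = 0; i < contours.size(); i++ ) {
    const contour = contours.get( i );
    const m = cv.moments( contour );
    if ( m.m00 > 0 ) {
      centers.push( { x: m.m10 / m.m00, y: m.m01 / m.m00 } );
    }
    contour.delete();
  }
  contours.delete();
  hierarchy.delete();

  // Angle of the line through the two centroids - unlike the angle of a
  // single rectangle, this does not wrap every 90 degrees.
  if ( centers.length === 2 ) {
    return Math.atan2( centers[ 1 ].y - centers[ 0 ].y, centers[ 1 ].x - centers[ 0 ].x );
  }
  return null; // a marker was occluded or lost this frame
}
```
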
jessegreenberg commented 2 years ago

I connected the above to the sim, it's not too bad at all!

(gif: the angle tracking connected to the sim)

EDIT: test code for this:

```html OPENCV TEST
```
jessegreenberg commented 2 years ago

Next, we should try tracking four markers that would control the four vertex positions defining the quadrilateral. Over Slack @BLFiedler suggested that they could be different colors so that we know how to identify them. We could probably get pretty far without distinguishing each marker by color, just reassigning the leftmost and rightmost vertices to the leftmost and rightmost markers (a sketch of that idea is below). Or we could have different sized markers to label them.
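
A sketch of the positional-assignment idea, assuming all four marker centers are detected each frame (names are hypothetical):

```js
// Sketch: assign detected markers to vertices purely by position. This
// relabels vertices whenever markers cross each other, which is the
// limitation that colored or sized markers would address.
// Assumption: `markerCenters` is an array of four {x, y} centers.
function assignVerticesByPosition( markerCenters ) {
  const byX = [ ...markerCenters ].sort( ( a, b ) => a.x - b.x );
  const left = byX.slice( 0, 2 ).sort( ( a, b ) => a.y - b.y );
  const right = byX.slice( 2 ).sort( ( a, b ) => a.y - b.y );
  return {
    topLeft: left[ 0 ],
    bottomLeft: left[ 1 ],
    topRight: right[ 0 ],
    bottomRight: right[ 1 ]
  };
}
```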

I also want to try a "line detection" approach that may work through any kind of hand occlusion. We could detect the lines of the TMQ, extend them all the way to the edge of the image, and find the line intersection points; those would be the locations of our vertices. If any portion of a side is visible, we can still recover the vertex positions. https://www.geeksforgeeks.org/line-detection-python-opencv-houghline-method

EDIT: Here is another document for Hough line detection: https://docs.opencv.org/3.4/d3/de6/tutorial_js_houghlines.html

jessegreenberg commented 2 years ago

Trying out Hough Line Transform approach:

Starting with this image:

![download](https://user-images.githubusercontent.com/6396244/155609802-31c67945-891d-4f87-9096-17f7fbbe9296.png)

Lines like this can be detected:

![download (1)](https://user-images.githubusercontent.com/6396244/155609841-c665051b-65ba-47b4-a5d4-ae82e5818d98.png)

With this opencv snippet:

```js
let src = cv.imread('canvasInput');
let dst = cv.Mat.zeros(src.rows, src.cols, cv.CV_8UC3);
let lines = new cv.Mat();
cv.cvtColor(src, src, cv.COLOR_RGBA2GRAY, 0);
cv.Canny(src, src, 50, 200, 3);

// You can try more different parameters
cv.HoughLines(src, lines, 1, Math.PI / 180, 50, 0, 0, 0, Math.PI);

// draw lines
for (let i = 0; i < lines.rows; ++i) {
  let rho = lines.data32F[i * 2];
  let theta = lines.data32F[i * 2 + 1];
  let a = Math.cos(theta);
  let b = Math.sin(theta);
  let x0 = a * rho;
  let y0 = b * rho;
  let startPoint = {x: x0 - 1000 * b, y: y0 + 1000 * a};
  let endPoint = {x: x0 + 1000 * b, y: y0 - 1000 * a};
  cv.line(dst, startPoint, endPoint, [255, 0, 0, 255]);
}
cv.imshow('canvasOutput', dst);
src.delete();
dst.delete();
lines.delete();
```

An example of how this could work with occlusion. My hands are covering two vertices entirely but it is able to find the sides.

![image](https://user-images.githubusercontent.com/6396244/155622842-93343763-1b5d-48bb-a91b-49d43ade1556.png)

Here I was able to find the intersection points of lines that are not of equivalent slope:

![image](https://user-images.githubusercontent.com/6396244/155753538-ec18673b-bffe-4cc6-9ffa-479d572a275b.png)
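
For reference, a small sketch of the intersection math, assuming lines in the (rho, theta) form that cv.HoughLines returns in the snippet above (the helper name is hypothetical):

```js
// Each Hough line satisfies x*cos(theta) + y*sin(theta) = rho, so the
// intersection of two lines is the solution of a 2x2 linear system.
function intersectHoughLines( rho1, theta1, rho2, theta2 ) {
  const det = Math.cos( theta1 ) * Math.sin( theta2 ) - Math.sin( theta1 ) * Math.cos( theta2 );
  if ( Math.abs( det ) < 1e-6 ) {
    return null; // nearly parallel ("equivalent slope"), no usable intersection
  }
  return {
    x: ( rho1 * Math.sin( theta2 ) - rho2 * Math.sin( theta1 ) ) / det,
    y: ( rho2 * Math.cos( theta1 ) - rho1 * Math.cos( theta2 ) ) / det
  };
}
```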

Maybe I can use k-means clustering to find the centers of each vertex; opencv has a function to do so. Or use morphological operations on that image to create blobs and then contours around clusters of points. Or maybe a different averaging solution.

I got close with kmeans I think but ran out of time. Here is code with a TODO for next time.

```js OPENCV TEST
```

kmeans seems overly complicated at this point; I am going to turn each of those blobs into a contour and find the center. I tried an "open" operation, but it seems to reduce the framerate substantially:

        cv.morphologyEx( tempMat, newMat, cv.MORPH_OPEN, Ma, anchor, 1, cv.BORDER_CONSTANT, cv.morphologyDefaultBorderValue() );

Instead, I am just going to create large circles at the intersection points so it looks like one big connected blob (a sketch of this is below).
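
A sketch of that blob idea, assuming `points` holds the line intersection points from the previous step (names and the circle radius are hypothetical):

```js
// Sketch of the blob approach. Assumption: `points` is an array of {x, y}
// intersection points and (rows, cols) matches the video frame size.
function clusterIntersections( points, rows, cols ) {
  // Draw a large filled circle at every intersection so that nearby points
  // merge into one connected blob per vertex.
  const mask = cv.Mat.zeros( rows, cols, cv.CV_8UC1 );
  for ( const p of points ) {
    cv.circle( mask, new cv.Point( Math.round( p.x ), Math.round( p.y ) ), 25, new cv.Scalar( 255 ), -1 );
  }

  // The centroid of each blob approximates one vertex position.
  const contours = new cv.MatVector();
  const hierarchy = new cv.Mat();
  cv.findContours( mask, contours, hierarchy, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE );

  const vertices = [];
  for ( let i = 0; i < contours.size(); i++ ) {
    const contour = contours.get( i );
    const m = cv.moments( contour );
    if ( m.m00 > 0 ) {
      vertices.push( { x: m.m10 / m.m00, y: m.m01 / m.m00 } );
    }
    contour.delete();
  }

  mask.delete(); contours.delete(); hierarchy.delete();
  return vertices;
}
```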

OK, here it is altogether:

![rts](https://user-images.githubusercontent.com/6396244/155804635-47f9d47b-fe9d-42d5-abc6-3d04bab3f2ac.gif)

There is a fair amount of jitter because the lines are unstable. I think a lot of it is coming from the Canny edge detection that happens first; look at all this noise:

![rts](https://user-images.githubusercontent.com/6396244/155809602-b57d306d-efbb-4ecf-9caf-6c007b55617b.gif)

It is coming from noise in the initial color filter.

![rts](https://user-images.githubusercontent.com/6396244/155810390-81c903bf-3c56-45f8-8819-3e1a443927bc.gif)

Hmm, "convex hull" may be what I want to get something more stable. It isn't really any better. I am trying to find a way to get the "skeleton" of the pixels displayed so there is only a single line but I am not having good luck.

Ooo, there is a fitLine function... `cv.fitLine(cnt, cv.DIST_L2, 0, 0.01, 0.01)`. But it would still require identifying regions of sides first.

approxPolyDP may be what we need:

![image](https://user-images.githubusercontent.com/6396244/155817760-9ece42ba-c963-4e63-8728-48959d4e6231.png)

approxPolyDP might give us access to straight lines without noise:

![image](https://user-images.githubusercontent.com/6396244/155865046-1e5f7e08-93f2-4768-bcfd-64589c023c9e.png)

```js
let src = cv.imread('canvasInput');
let dst = cv.Mat.zeros(src.rows, src.cols, cv.CV_8UC3);
cv.cvtColor(src, src, cv.COLOR_RGBA2GRAY, 0);
cv.threshold(src, src, 100, 200, cv.THRESH_BINARY);
let contours = new cv.MatVector();
let hierarchy = new cv.Mat();
let poly = new cv.MatVector();
cv.findContours(src, contours, hierarchy, cv.RETR_CCOMP, cv.CHAIN_APPROX_SIMPLE);

// approximates each contour to polygon
for (let i = 0; i < contours.size(); ++i) {
  let tmp = new cv.Mat();
  let cnt = contours.get(i);
  // You can try more different parameters
  cv.approxPolyDP(cnt, tmp, 15, true);
  poly.push_back(tmp);
  cnt.delete();
  tmp.delete();
}

// draw contours with random Scalar
for (let i = 0; i < contours.size(); ++i) {
  let color = new cv.Scalar(Math.round(Math.random() * 255), Math.round(Math.random() * 255),
    Math.round(Math.random() * 255));
  cv.drawContours(dst, poly, i, color, 1, 8, hierarchy, 0);
}
cv.imshow('canvasOutput', dst);
src.delete();
dst.delete();
hierarchy.delete();
contours.delete();
poly.delete();
```

Final code before switching to a four-marker solution:

```html OPENCV TEST
```
jessegreenberg commented 2 years ago

Discussed with @BLFiedler at a check-in meeting. We like the idea of line tracking, but let's put that on hold for now.

EDIT: I would like to first play with marker size to accomplish this because it would be easiest. Keep in mind that I think we can pretty quickly change things to support just about anything listed here. The idea is that we could have markers of varying length; the height of each marker could then still be used to determine perspective if we want. A sketch of this size-based labeling is below.
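
A sketch of what the size-based labeling might look like, assuming the marker contours are already detected (names are hypothetical):

```js
// Sketch: identify markers by their size, assuming four markers of distinct
// lengths. Assumption: `contours` is a cv.MatVector of marker contours.
function labelMarkersBySize( contours ) {
  const markers = [];
  for ( let i = 0; i < contours.size(); i++ ) {
    const contour = contours.get( i );
    const rect = cv.boundingRect( contour );
    markers.push( { rect: rect, length: Math.max( rect.width, rect.height ) } );
    contour.delete();
  }

  // Longest first - with four distinct marker lengths, the sort order tells
  // us which vertex each marker belongs to. The rect height remains
  // available for perspective estimation.
  markers.sort( ( a, b ) => b.length - a.length );
  return markers;
}
```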

jessegreenberg commented 2 years ago

Notes as I work on a solution that uses 4 discrete markers. Overall, there is hardly any noise and it feels really fast. But it is of course more susceptible to marker occlusion.

I made substantial progress on this today; here is my hacky code:

```html OPENCV TEST
```

Demonstration of the behavior, with detected positions controlling the sim:

![ezgif com-gif-maker](https://user-images.githubusercontent.com/6396244/157334659-09f899d0-711e-4e38-9bba-0268dd49ae4a.gif)

I don't have labelled vertices or something resilient to perspective figured out yet, but I think that seems relatively straightforward to work on next.

jessegreenberg commented 2 years ago

Discussed status with @BLFiedler; over Slack he mentioned two things that would be good to have next: 1) a shareable version so the team can try it and determine what should be worked on next, and 2) a way to flip the camera feed so that it will work if the camera is over the shoulder instead of pointing toward the user's face.

EDIT: For next time, cv has built-in functions to flip an image vertically: `cv.flip(image, image, 0)`; and horizontally: `cv.flip(image, image, +1)`.

brettfiedler commented 2 years ago

I think we'll need to support both horizontal and vertical flip? I made a quick video about possible detection window orientations with respect to the marker locations:

https://user-images.githubusercontent.com/47331395/157487817-60b42ae0-6a97-4400-8af6-fbbd729b3a4f.mp4

terracoda commented 2 years ago

These are really interesting perspectives on the device. I have questions about how to consistently start the description.

brettfiedler commented 2 years ago

When a shareable version is ready, let's keep a version with the non-identified vertices (which relabels vertices when the shape rotates).

The next step will be adding vertex identification to enable rotation of the quad while keeping the same vertex assignments from startup.

brettfiedler commented 2 years ago

Played around with small squares affixed to the TMQ as well as free-moving green blocks. I mounted my webcam on the ceiling above me (sloped ceiling). https://phet-dev.colorado.edu/html/jg-tests/opencv-test/

Setup notes:

1.) SUPER FUN. Amazing how much we can already do with it once it's set up. Quite smooth (with the exception of the notes below) when the parameters are dialed in.

2.) A few videos I took playing around with the current setup.

Video 1:

https://user-images.githubusercontent.com/47331395/164247166-e02854fc-27e8-42ff-a8c4-fba2d06d06ad.mp4

Video 2:

https://user-images.githubusercontent.com/47331395/164247172-7277c29e-2830-4bce-a7d4-edb79e41d4f4.mp4

emily-phet commented 2 years ago

@BLFiedler Sounds very cool! I can't seem to get the videos to load... is anyone else having this problem?

brettfiedler commented 2 years ago

I tried changing the formatting of the post above which made the embedding show up. Let me know if that fixes it. Otherwise, I've put the videos here, though they may take a bit to process: https://drive.google.com/drive/folders/1zwKRagycbptEeRXa3AhiuEQ0CeUCVsvh?usp=sharing

terracoda commented 2 years ago

The corner demo is so cool @BLFiedler!

Do you know what is happening in the TMQ demo? It jitters without movement?

brettfiedler commented 2 years ago

Repost of my above comments with some additional details, taken while chatting with @jessegreenberg. Includes plans for new issues to prioritize after Voicing:

(image: notes with plans for new issues)

emily-phet commented 2 years ago

Cool videos! I think this shows lots of potential, particularly with the four blocks...

brettfiedler commented 2 years ago

Updating needs for OpenCV issues from the notes above:

  • We might be able to autodetect the green based on HSV (and auto-set the ranges for each value), with a manual override or letting the user pick the color, to help with user setup (see the sketch after this list).

This was done as part of JG's tests - currently being further developed in: https://github.com/phetsims/quadrilateral/issues/141

  • [Lower priority NEW ISSUE] When markers are close to each other, they merge (when the red boxes touch) - we want to avoid this behavior if possible.

This shouldn't be an issue when using 4 distinctly colored markers: https://github.com/phetsims/quadrilateral/issues/141

  • [NEW ISSUE] How to elegantly handle loss of a marker or bad data

Also to be worked on in https://github.com/phetsims/quadrilateral/issues/141 as part of marker differentiation.

  • [NEW ISSUE] Might be nice to add a "Reset to Default" for the HSV filter values. I found myself just refreshing the page.

Creating a new issue for retaining the last-used values from the browser cache.

  • [NEW ISSUE] Testing setup: OpenCV lives just in the test environment, not in the sim at all. What do we want with regard to the controls and video feed being embedded directly into the simulation (preferences menu?)? This will impact what we do for RaP as well.

On hold for now - the current interface is usable with PhET-iO, and the sim can be made full screen to hide the interface. Hiding this in a menu would make setup difficult.
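
A sketch of the HSV auto-detection idea from the first bullet above, assuming the user clicks the marker color in the video feed (names, tolerances, and the clicked-pixel interface are hypothetical):

```js
// Sketch of auto-setting the HSV filter ranges from a user-picked color.
// Assumptions: `hsvFrame` is the current frame already converted to HSV
// (cv.CV_8UC3), and (x, y) is the pixel the user clicked on the marker.
function autoSetHsvRange( hsvFrame, x, y ) {
  const pixel = hsvFrame.ucharPtr( y, x ); // [h, s, v]

  // Illustrative tolerances around the sampled color - a manual override
  // would adjust these.
  const tolerance = { h: 15, s: 60, v: 60 };

  return {
    low: [
      Math.max( 0, pixel[ 0 ] - tolerance.h ),
      Math.max( 0, pixel[ 1 ] - tolerance.s ),
      Math.max( 0, pixel[ 2 ] - tolerance.v ),
      0
    ],
    high: [
      Math.min( 179, pixel[ 0 ] + tolerance.h ), // OpenCV 8-bit hue tops out at 179
      Math.min( 255, pixel[ 1 ] + tolerance.s ),
      Math.min( 255, pixel[ 2 ] + tolerance.v ),
      255
    ]
  };
}
```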

brettfiedler commented 2 years ago

For current needs, this is complete.