@zepumph helped me set up tracking using MarkerInput.js and it worked really well out of the box. It seems like orientation information for a marker is readily available and the data comes in quick. I could easily see adding support for 1) with this method. I imagine 2) could be done as well, but maybe it will present challenges like the ones mentioned in phetsims/tangible#7
Performance concerns to be investigated before/during implementation (marker detection, not sim performance):
Looping in @emily-phet as an FYI ahead of meeting on Tuesday
Regarding Color tracking:
JG: I was playing around with OpenCV and found a way to track a color with a webcam from the browser. I had good luck watching a red rectangle taped to the quad and then calculating its rotation. It seems less vulnerable to motion blurring since it is just watching colors. I don't know if this is something to actually employ, but it is in our back pocket. Here's a demo:
[Brett Fiedler] Is the color choice arbitrary? I suspect the bright green that folks use for green screens is a rare enough color.
[Jesse Greenberg] Sounds good! Yes, color is arbitrary. Hah, that makes sense! To get red working I had to do a lot of filtering to ignore my skin... It looks like opencv provides a built-in way to get the perspective transform of an object. It also looks like there is a built-in way to detect lines in an image and extend them as if they were not occluded. Seems pretty strong!
We'll move forward with OpenCV for marker tracking. Beholder is not intended for robust motion tracking (deblurring).
@jessegreenberg will implement and we will figure out how far we can get with single marker (global rotation) and multiple marker tracking (vertex tracking) in the context of the quadrilateral
I got a pretty consistent (much better than https://github.com/phetsims/quadrilateral/issues/20#issuecomment-1048043791) angle tracking working by watching two green rectangles, finding the centers of their contours, and then determining the angle of the line between them. This gets around the issue of not knowing the relative orientation of a single rectangle (which could go back to zero degrees every 90 degrees). Green works better than red to filter out in the image.
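For reference, a minimal sketch of that two-green-rectangle approach (not the actual test code), assuming OpenCV.js, an RGBA cv.Mat for the current video frame, and placeholder HSV bounds that would need tuning:

```js
// Minimal sketch of the two-green-rectangle tracking described above (not the
// actual test code). Assumes OpenCV.js, an RGBA cv.Mat 'frame' for the current
// video frame, and placeholder HSV bounds.
function getMarkerAngle( frame ) {
  const hsv = new cv.Mat();
  cv.cvtColor( frame, hsv, cv.COLOR_RGBA2RGB, 0 );
  cv.cvtColor( hsv, hsv, cv.COLOR_RGB2HSV, 0 );

  // Keep only "green" pixels - these bounds are guesses.
  const low = new cv.Mat( hsv.rows, hsv.cols, hsv.type(), [ 40, 80, 80, 0 ] );
  const high = new cv.Mat( hsv.rows, hsv.cols, hsv.type(), [ 80, 255, 255, 255 ] );
  const mask = new cv.Mat();
  cv.inRange( hsv, low, high, mask );

  // Find contours of the green regions and compute their centers of mass.
  const contours = new cv.MatVector();
  const hierarchy = new cv.Mat();
  cv.findContours( mask, contours, hierarchy, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE );

  const centers = [];
  for ( let i = 0; i < contours.size(); i++ ) {
    const moments = cv.moments( contours.get( i ), false );
    if ( moments.m00 > 0 ) {
      centers.push( { x: moments.m10 / moments.m00, y: moments.m01 / moments.m00, area: moments.m00 } );
    }
  }

  // The two largest regions should be the two rectangles - the marker angle is
  // the angle of the line between their centers.
  centers.sort( ( a, b ) => b.area - a.area );
  const angle = centers.length >= 2 ?
    Math.atan2( centers[ 1 ].y - centers[ 0 ].y, centers[ 1 ].x - centers[ 0 ].x ) :
    null;

  [ hsv, low, high, mask, contours, hierarchy ].forEach( mat => mat.delete() );
  return angle;
}
```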
I connected the two-rectangle tracking above to the sim; it's not too bad at all!
EDIT: test code for this:
Next, we should try tracking four markers that would control the four vertex positions defining the quadrilateral. Over Slack @BLFiedler suggested that they could be different colors so that we know how to identify them. We could probably get pretty far without distinguishing each marker with coloring, just reassigning the leftmost and rightmost vertices to the leftmost and rightmost markers. Or we could have different-sized markers to label them.
I also want to try a "line detection" approach that may work even with hand occlusion. We could detect the lines of the TMQ, extend them all the way to the edge of the image, find the line intersection points, and those would be the locations of our vertices. If any portion of a side is visible we will still get vertex positions. https://www.geeksforgeeks.org/line-detection-python-opencv-houghline-method
EDIT: Here is another document for hough line detection: https://docs.opencv.org/3.4/d3/de6/tutorial_js_houghlines.html
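For reference, a rough sketch of what that Hough step could look like in OpenCV.js, following the tutorials linked above; 'mask' is assumed to be the binary image from the color filter, and the Canny/Hough thresholds are placeholder values:

```js
// Rough sketch of the Hough line detection step, assuming OpenCV.js.
// 'mask' is the binary image from the green color filter.
const edges = new cv.Mat();
const lines = new cv.Mat();
cv.Canny( mask, edges, 50, 200, 3, false );

// Probabilistic Hough transform returns line segments as [ x1, y1, x2, y2 ].
cv.HoughLinesP( edges, lines, 1, Math.PI / 180, 50, 30, 10 );

const segments = [];
for ( let i = 0; i < lines.rows; i++ ) {
  segments.push( {
    x1: lines.data32S[ i * 4 ],
    y1: lines.data32S[ i * 4 + 1 ],
    x2: lines.data32S[ i * 4 + 2 ],
    y2: lines.data32S[ i * 4 + 3 ]
  } );
}

edges.delete();
lines.delete();
```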
Trying out Hough Line Transform approach:
Starting with this image:
Lines like this can be detected:
With this opencv snippet:
An example of how this could work with occlusion. My hands are covering two vertices entirely but it is able to find the sides.
Here I was able to find the intersection points of lines that are not of equivalent slope:
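The intersection computation itself is just line math; here is a sketch assuming the 'segments' array from the Hough sketch above, treating each segment as an infinite line and skipping near-parallel pairs:

```js
// Intersect two segments extended to infinite lines; returns null for
// (nearly) parallel pairs whose intersection would be unstable.
function intersect( a, b ) {
  const d = ( a.x1 - a.x2 ) * ( b.y1 - b.y2 ) - ( a.y1 - a.y2 ) * ( b.x1 - b.x2 );
  if ( Math.abs( d ) < 1e-6 ) {
    return null;
  }
  const t = ( ( a.x1 - b.x1 ) * ( b.y1 - b.y2 ) - ( a.y1 - b.y1 ) * ( b.x1 - b.x2 ) ) / d;
  return { x: a.x1 + t * ( a.x2 - a.x1 ), y: a.y1 + t * ( a.y2 - a.y1 ) };
}

const intersections = [];
for ( let i = 0; i < segments.length; i++ ) {
  for ( let j = i + 1; j < segments.length; j++ ) {
    const p = intersect( segments[ i ], segments[ j ] );
    if ( p ) { intersections.push( p ); }
  }
}
```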
Maybe I can use k-means clustering to find the centers of each vertex, opencv has a function to do so. Or use morphological operations on that image to create blobs and then contours around clusters of points. Or maybe a different averaging solution.
I got close with kmeans I think but ran out of time. Here is code with a TODO for next time.
kmeans seems overly complicated at this point, I am going to turn each of those blobs into a contour and find the center. I tried an "open" operation but it seems to reduce the framerate substantially:
cv.morphologyEx( tempMat, newMat, cv.MORPH_OPEN, Ma, anchor, 1, cv.BORDER_CONSTANT, cv.morphologyDefaultBorderValue() );
Instead, I am just going to create large circles at the intersection points so it looks like a big connected blob.
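Roughly, the idea is something like this (a sketch assuming OpenCV.js, the 'intersections' array from the sketch above, and frame dimensions rows/cols):

```js
// Stamp a filled circle at every intersection point so nearby points merge
// into one blob, then take each blob's contour center as a vertex position.
const blobMat = cv.Mat.zeros( rows, cols, cv.CV_8UC1 );
intersections.forEach( p => {
  cv.circle( blobMat, new cv.Point( Math.round( p.x ), Math.round( p.y ) ), 15, new cv.Scalar( 255 ), -1 );
} );

const contours = new cv.MatVector();
const hierarchy = new cv.Mat();
cv.findContours( blobMat, contours, hierarchy, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE );

const vertexCenters = [];
for ( let i = 0; i < contours.size(); i++ ) {
  const moments = cv.moments( contours.get( i ), false );
  if ( moments.m00 > 0 ) {
    vertexCenters.push( { x: moments.m10 / moments.m00, y: moments.m01 / moments.m00 } );
  }
}

blobMat.delete();
contours.delete();
hierarchy.delete();
```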
OK, here it is altogether:
There is a fair amount of jitter because the lines are unstable. I think a lot of it is coming from the Canny edge detection that happens first; look at all this noise:
It is coming from noise in the initial color filter.
Hmm, "convex hull" may be what I want to get something more stable. It isn't really any better. I am trying to find a way to get the "skeleton" of the pixels displayed so there is only a single line but I am not having good luck.
Ooo, there is a fitLine function... cv.fitLine( cnt, line, cv.DIST_L2, 0, 0.01, 0.01 ), where line is an output Mat. But it would still require identifying regions of the sides.
approxPolyDP may be what we need:
approxPolyDP might give us access to straight lines without noise:
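A sketch of what that could look like, assuming OpenCV.js and a contour 'cnt' from findContours (the epsilon fraction is a guess that would need tuning):

```js
// approxPolyDP reduces a noisy contour to a few straight sides.
const approx = new cv.Mat();
const epsilon = 0.02 * cv.arcLength( cnt, true ); // tolerance ~2% of the perimeter
cv.approxPolyDP( cnt, approx, epsilon, true );

// For a clean quadrilateral we would hope for exactly 4 points here.
const corners = [];
for ( let i = 0; i < approx.rows; i++ ) {
  corners.push( { x: approx.data32S[ i * 2 ], y: approx.data32S[ i * 2 + 1 ] } );
}
approx.delete();
```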
Final code before switching to a four marker solution
Discussed with @BLFiedler at a check-in meeting. We like the idea of line tracking, but let's put that on hold for now.
EDIT: I would like to first play with marker size to accomplish this because it would be easiest. Keeping in mind I think we can pretty quickly change things to support just about anything listed here. The idea is that we could have markers of varying length. Then the height of each marker could still be used to determine perspective if we want.
Notes as I work on a solution that uses 4 discrete markers. Overall, there is hardly any noise and it feels really fast. But it is of course more susceptible to marker occlusion.
I made substantial progress on this today, here is my hacky code:
Demonstration of the behavior, with detected positions controlling the sim:
Don't have labelled vertices or something resilient to perspective figured out yet, but I think that seems relatively straightforward to work on next.
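For reference, one simple way to order four detected marker centers into quadrilateral vertices without true marker identity is to sort them by angle around their centroid; this gives the "relabels when the shape rotates" behavior mentioned below. A sketch, assuming a 'centers' array of { x, y } points:

```js
// Map four detected marker centers to quadrilateral vertices by sorting them
// by angle around their centroid. Labels are consistent frame to frame, but
// they will shift as the shape rotates past each marker position.
function assignVertices( centers ) {
  if ( centers.length !== 4 ) { return null; } // need all four markers visible

  const centroid = {
    x: centers.reduce( ( sum, p ) => sum + p.x, 0 ) / 4,
    y: centers.reduce( ( sum, p ) => sum + p.y, 0 ) / 4
  };

  return centers
    .slice()
    .sort( ( a, b ) =>
      Math.atan2( a.y - centroid.y, a.x - centroid.x ) -
      Math.atan2( b.y - centroid.y, b.x - centroid.x ) );
}
```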
Discussed status with @BLFiedler. Over Slack he mentioned two things that would be good to have next: 1) A shareable version so the team can try it and determine what should be worked on next. 2) A way to flip the camera feed so that it will work if the camera is over the shoulder instead of pointing toward the user's face.
EDIT: For next time, cv has built-in functions to flip an image vertically: cv.flip(image, image, 0); and horizontally: cv.flip(image, image, +1);
I think we'll need to support both horizontal and vertical flip? I made a quick video about possible detection window orientations with respect to the marker locations.
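Something like this should cover both cases, assuming OpenCV.js and hypothetical flipVertically/flipHorizontally flags backed by UI controls:

```js
// Apply the flips from the EDIT above based on the camera orientation.
// 'frame' is the current OpenCV.js Mat for the video frame.
if ( flipVertically ) {
  cv.flip( frame, frame, 0 ); // 0: flip vertically (around the x-axis)
}
if ( flipHorizontally ) {
  cv.flip( frame, frame, 1 ); // +1: flip horizontally (around the y-axis)
}
```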
These are really interesting perspectives on the device. I have questions about how to consistently start the description.
When a shareable version is ready, let's keep a version with the non-identified vertices (it relabels them when the shape rotates).
Next step will be adding vertex identification to enable rotation of the quad while keeping the same vertex labels from startup.
Played around with small squares affixed to the TMQ as well as free moving green blocks. I mounted my webcam on the ceiling above me (sloped ceiling). https://phet-dev.colorado.edu/html/jg-tests/opencv-test/
Setup notes:
1.) SUPER FUN. Amazing how much we can already do with it once it's set up. Quite smooth (with the exception of the below notes) when the parameters are dialed in.
2.) A few videos I took playing around with the current setup.
Video 1:
Video 2:
@BLFiedler Sounds very cool! I can't seem to get the videos to load... is anyone else having this problem?
I tried changing the formatting of the post above which made the embedding show up. Let me know if that fixes it. Otherwise, I've put the videos here, though they may take a bit to process: https://drive.google.com/drive/folders/1zwKRagycbptEeRXa3AhiuEQ0CeUCVsvh?usp=sharing
The corner demo is so cool @BLFiedler!
Do you know what is happening in the TMQ demo? It jitters even when nothing is moving?
Repost of my above comments with some additional details taken while chatting with @jessegreenberg . Includes plans for new issues to prioritize after Voicing:
[x] [NEW ISSUE] Detection jitter around all-side-equal and all-angle-equal is a bit frenzied as it enters and leaves the tolerance interval.
[x] [EXISTING ISSUE #116] A small smoothing algorithm over a limited number of prior data points will help (a minimal sketch follows this list), but we will also need to implement #116, likely for all tolerance intervals?
[x] [NEW ISSUE] Lack of marker-to-vertex identity adds some funny behavior when rotating.
Will need to make sure the case of a concave shape or swapping vertices is handled correctly (and not mistaken for each other).
[ ] [Lower priority NEW ISSUE] When markers are close to each other, they merge (when red boxes touch) - want to avoid this behavior if possible
[ ] [NEW ISSUE] How to elegantly handle loss of a marker or bad data
[ ] [NEW ISSUE] Might be nice to add a "Reset to Default" for the HSV filter values. I found myself just refreshing the page.
[ ] [NEW ISSUE] Testing setup: OpenCV just in the environment, not in the sim at all. What do we want with regards to the controls and video feed embedded directly into the simulation (pref menu?) - This will impact what we do for RaP as well.
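A minimal sketch of the smoothing idea from the second item above: a simple moving average over the last N detected positions for one vertex. The window size and the class structure are assumptions, not an existing implementation:

```js
// Simple moving average over a limited number of prior data points.
class PositionSmoother {
  constructor( windowSize = 5 ) {
    this.windowSize = windowSize;
    this.samples = [];
  }

  // Add a new detected { x, y } sample and return the smoothed position.
  addSample( point ) {
    this.samples.push( point );
    if ( this.samples.length > this.windowSize ) {
      this.samples.shift();
    }
    const n = this.samples.length;
    return {
      x: this.samples.reduce( ( sum, p ) => sum + p.x, 0 ) / n,
      y: this.samples.reduce( ( sum, p ) => sum + p.y, 0 ) / n
    };
  }
}
```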
Cool videos! I think this shows lots of potential, particularly with the four blocks...
Updating needs for OpenCV issues from the list above:
- We might be able to autodetect the green based on HSV (and auto-set the ranges for each value), with the possibility of a manual override, or let a user pick the color to help with setup.
This was done as part of JG's tests - currently being further developed in: https://github.com/phetsims/quadrilateral/issues/141
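For reference, a sketch of one way the color-picking could work, assuming OpenCV.js, a frame already converted to HSV, and placeholder tolerance margins:

```js
// "Let a user pick the color": sample the clicked pixel in an HSV frame and
// build a tolerance window around it. The +/- margins are placeholder values.
function rangeFromClick( hsvFrame, x, y ) {
  const pixel = hsvFrame.ucharPtr( y, x ); // [ h, s, v ]
  return {
    low: [ Math.max( pixel[ 0 ] - 10, 0 ), Math.max( pixel[ 1 ] - 60, 0 ), Math.max( pixel[ 2 ] - 60, 0 ), 0 ],
    high: [ Math.min( pixel[ 0 ] + 10, 179 ), Math.min( pixel[ 1 ] + 60, 255 ), Math.min( pixel[ 2 ] + 60, 255 ), 255 ]
  };
}
```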
- [Lower priority NEW ISSUE] When markers are close to each other, they merge (when red boxes touch) - want to avoid this behavior if possible
This shouldn't be an issue when using 4 distinctly colored markers: https://github.com/phetsims/quadrilateral/issues/141
- [NEW ISSUE] How to elegantly handle loss of a marker or bad data
Also to be worked on in https://github.com/phetsims/quadrilateral/issues/141 as part of marker differentiation.
- [NEW ISSUE] Might be nice to add a "Reset to Default" for the HSV filter values. I found myself just refreshing the page.
Creating a new issue for retaining the last used values from the browser cache.
- [NEW ISSUE] Testing setup: OpenCV just in the environment, not in the sim at all. What do we want with regards to the controls and video feed embedded directly into the simulation (pref menu?) - This will impact what we do for RaP as well.
On hold for now - current interface usable with PhET-iO and the sim can be full screen to hide the interface. Hiding this in a menu will make setup difficult.
For current needs, this is complete.
Snippets taken/edited from Slack convo between BF & JG
BF: There's interest in at least exploring what it might take to implement the fiducial markers as a means of providing device orientation while CHROME lab does the same with the embedded sensors (gyroscopes/accelerometers).
BF: I think of greatest importance is considering the lift required to pull in data from multiple sources and what would need to be communicated across the three: microcontroller/sim/marker detection.
Related repos and issues:
- https://github.com/phetsims/tangible
- Investigate other marker input strategies: https://github.com/phetsims/tangible/issues/7
- Performance implications of user testing with mechamarkers: https://github.com/phetsims/ratio-and-proportion/issues/28
- Will tangible input be included in the published simulation?: https://github.com/phetsims/ratio-and-proportion/issues/89
JG: My biggest question right off the bat is if you have any thoughts about https://github.com/phetsims/tangible/issues/7? It sounds like the marker strategy that was tried in RaP may not have been performant enough. Should we start with what was in RaP or just try something else?
BF: Yeah, so it was not quite enough to allow smooth movement when moving multiple objects. I'll see if I can get the demo up and running for Monday. With just one marker and no need for very rapid detection, it may not be so bad.
BF: If we come up with enough questions, we can reach out to Peter from Ellen's lab to check on the current state of things and see what we can pull in re: rotation and tilt. They also had different sets of markers that seemed to perform differently? And it had not been updated as of Spring 2021, but things might be different now?
JG: Just pondering internally... If we have a computer vision solution, will we still need the microcontroller?
BF: Yeah, I think this is only intended to partially solve the global orientation problem. I am absolutely positive using multiple markers will be nothing but trouble regarding constant detection/occlusion and resulting hitches in model updates for the existing Quad device from CHROME (based on the form factor and how a user moves their hands around the device). But, if we are only relying on ~1 marker to tell us if something is positioned upright or not, it may not matter too much. Of note, if we introduce any markers at all we will have to accept there will be some moments of desync between the visual display and what the learner is doing whenever detection is lost (be it occlusion, tilt, or goblins).
We should consider the implications of both scenarios:
Tagging @zepumph, since he has worked with this most extensively, in case there are any nuggets of wisdom to share after talking to @jessegreenberg.
@BLFiedler & @jessegreenberg meeting on 11/1 to discuss.