amalnanavati commented 6 months ago

Description

In service of #43 , this PR implements automatic food-on-fork detection.

The core approach is to store a representative set of 3D points that correspond to the fork shape when there is no food on it. Then, given a new depth image, compute the average distance between the points in that image and the nearest point on the stored image. When there is no food-on-fork, that distances are on the order of 1e-4 m, whereas when there is food on the fork, they are on the order of 1e-3 m. Train a classifier on those distances to predict the probability that there is food on the fork.

As a concrete example, the specific model pushed in this PR stores 2426 points which look like the below image: Screenshot 2024-03-08 at 6 59 12 PM

This approach achieves ~96% accuracy on bagged data of 8 real bites (acquisition and transfer) when we use an upper and lower threshold of 0.5. Inference takes a median of 0.016s (25th percentile 0.007s, 75th percentile 0.022s).

Testing procedure

[x] Offline data:
- [x] Download all the data from here. Place it in ~/colcon_ws/src/ada_feeding/ada_feeding_perception/data/food_on_fork/.
- [x] Download this branch, re-build and source your workspace.
- [x] Go into the directory of the test script: cd ~/colcon_ws/src/ada_feeding/ada_feeding_perception/ada_feeding_perception
- [x] Run the test script (may take several minutes): python3 food_on_fork_train_test.py --model-classes '{"distance_no_fof_detector_with_filters": "ada_feeding_perception.food_on_fork_detectors.FoodOnForkDistanceToNoFOFDetector"}' --model-kwargs '{"distance_no_fof_detector_with_filters": {"camera_matrix": [614.5933227539062, 0.0, 312.1358947753906, 0.0, 614.6914672851562, 223.70831298828125, 0.0, 0.0, 1.0], "min_distance": 0.001, "verbose": true}}' --lower-thresh 0.5 --upper-thresh 0.5 --train-set-size 0.5 --crop-top-left 308 248 --crop-bottom-right 436 332 --depth-min-mm 310 --depth-max-mm 340 --rosbags-select 2024_03_01_two_bites 2024_03_01_two_bites_2 2024_03_01_two_bites_3 2024_02_29_two_bites --seed 42 --temporal-window-size 5 --spatial-num-pixels 10 --no-train
- [x] Verify that the accuracy is ~96%.
[x] Played back ROS2 bag:
- [x] Launch perception code: ros2 launch ada_feeding_perception ada_feeding_perception.launch.py
- [x] Toggle food-on-fork detection on: ros2 service call /toggle_food_on_fork_detection std_srvs/srv/SetBool "{data: true}"
- [x] Echo food-on-fork detection: ros2 topic echo /food_on_fork_detection
- [x] Launch RVIZ: ros2 run rviz2 rviz2 --ros-args -p use_sim_time:=true. Add an image for /local/camera/aligned_depth_to_color/image_raw and /food_on_fork_detection_img
- [x] Go to the directory where you store the ROS2 bags and run one of them: ros2 bag play 2024_03_01_two_bites --clock. Note that this particular rosbag doesn't start streaming data until 16 secs in.
- [x] Monitor the RVIZ color image and the ROS2 topic echo and verify it makes sense. (NOTE: for visualization purposes, the node has a lower threshold of 0.25 and an upper threshold of 0.75, so will ask the user if the probability is between those.)
[x] Real Robot:
- [x] Launch the robot code as usual.
- [x] As documented above, toggle food-on-fork detection on, echo the output, and launch RVIZ with the visual output.
- [x] Move the robot through the motions of multiple bites, adversarially manipulate the fork (e.g., moving the plate or your hand near the forktip), verify the results make sense.
- [x] Rotate the fork handle in the gripper (yaw) until it predicts FoF even when there is none. In RVIZ, visualize the /food_on_fork_detection_img. Verify that that view helps you re-align the fork.

Documented Commands for Training

Training: python3 food_on_fork_train_test.py --model-classes '{"distance_no_fof_detector_with_filters": "ada_feeding_perception.food_on_fork_detectors.FoodOnForkDistanceToNoFOFDetector"}' --model-kwargs '{"distance_no_fof_detector_with_filters": {"camera_matrix": [614.5933227539062, 0.0, 312.1358947753906, 0.0, 614.6914672851562, 223.70831298828125, 0.0, 0.0, 1.0], "min_distance": 0.001}}' --lower-thresh 0.25 --upper-thresh 0.75 --train-set-size 0.5 --crop-top-left 344 272 --crop-bottom-right 408 336 --depth-min-mm 310 --depth-max-mm 340 --rosbags-select 2024_03_01_no_fof 2024_03_01_no_fof_1 2024_03_01_no_fof_2 2024_03_01_no_fof_3 2024_03_01_no_fof_4 2024_03_01_fof_cantaloupe_1 2024_03_01_fof_cantaloupe_2 2024_03_01_fof_cantaloupe_3 2024_03_01_fof_strawberry_1 2024_03_01_fof_strawberry_2 2024_03_01_fof_strawberry_3 2024_02_29_no_fof 2024_02_29_fof_cantaloupe 2024_02_29_fof_strawberry --seed 42 --temporal-window-size 5 --spatial-num-pixels 10
- Note: Add "aggregator_name": null to the kwargs and --viz-fit-save-dir "../model/results" to the command line arguments to graph out the decision boundary for different aggregator functions.

Ideas for Future Improvements

Add a filter to the FoodOnForkDetection node for more stability.
Instead of storing points, consider using urdfpy to get the actual fork mesh and get the closest point to that. (Although we know that at least as of now, our fork URDF is slightly off from the real fork, both because of some movement of the fork handle (#170 ) and because of the fork tines bending.)

Before opening a pull request

[x] Format your code using black formatter python3 -m black .
[x] Run your code through pylint and address all warnings/errors. The only warnings that are acceptable to not address is TODOs that should be addressed in a future PR. From the top-level ada_feeding directory, run: pylint --recursive=y --rcfile=.pylintrc ..

Before Merging

[ ] Squash & Merge

amalnanavati commented 6 months ago

TODOs:

[x] Convert the pointcloud from the camera frame to the forktip frame, so this approach is more robust to movement of the camera.
[x] Investigate the stability of this approach: how much does the decision boundary vary depending on the random subset? Investigate alternate classifier models, such as SVM which will likely learn a better decision boundary.

amalnanavati commented 6 months ago

Mostly done. When playing back the above ROSbag, the predictions were quite unstable in the in-front-of-face configuration. I tried retraining with mean, but it didn't fix that. Worth looking into this more. But first test it in-person / play back the other ROSbags to not overfit to one case. Also because as of now, we won't be checking food-on-fork in the in-front-of-face configuration.

amalnanavati commented 5 months ago

Did the in-person tests. It works...except that it is very sensitive to the fork being slightly rotated in the gripper. The thing is, this is a problem we have had before that causes the fork to not skewer items centered. So while there are some FoF changes I can do to make it less sensitive (e.g., only focus on deviation in the z direction, or maybe weight deviations in each direction differently), I think we should first try to come up with a proper fix to ensuring the forktip is actually aligned with the frame.

personalrobotics / ada_feeding

Food-on-Fork Detection #169

Description

Testing procedure

Documented Commands for Training

Ideas for Future Improvements

Before opening a pull request

Before Merging