Description

This PR makes two changes to SegmentFromPoint:

1. Speeds it up: As documented in #134, due to GPU throttling, SegmentAnything runs unreasonably slowly on wheelchair power. This PR improves that by implementing EfficientSAM, which gives a ~3x speedup relative to SAM (a rough sketch of the call is included after this list).
2. Returns feedback: When nothing changes in the web app, it is hard for users to know whether something has crashed or is working but still thinking. Thus, this PR returns feedback (elapsed time) from SegmentFromPoint, which the web app already supports (see the feedback sketch below).
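For context, here is a minimal sketch of what point-prompted EfficientSAM inference looks like, assuming the API of the upstream EfficientSAM repository (build_efficient_sam_vits and a forward pass over batched images, points, and point labels). The checkpoint, tensor shapes, and mask selection are illustrative, not the exact code in this PR:

```python
# Minimal sketch only: point-prompted segmentation with EfficientSAM, assuming
# the upstream EfficientSAM repo's API. The checkpoint, shapes, and mask
# selection are illustrative, not the exact integration in this PR.
import numpy as np
import torch
from torchvision.transforms import ToTensor

from efficient_sam.build_efficient_sam import build_efficient_sam_vits

model = build_efficient_sam_vits()  # expects the ViT-S weights to be available locally
model.eval()


def segment_from_point(image_rgb: np.ndarray, x: int, y: int) -> np.ndarray:
    """Return a boolean mask for the object under the seed point (x, y)."""
    batched_images = ToTensor()(image_rgb)[None, ...]                 # [1, 3, H, W]
    batched_points = torch.tensor([[[[x, y]]]], dtype=torch.float32)  # [1, 1, 1, 2]
    batched_labels = torch.ones(1, 1, 1)                              # 1 = foreground
    with torch.no_grad():
        logits, iou = model(batched_images, batched_points, batched_labels)
    best = torch.argmax(iou[0, 0])                 # highest-confidence candidate mask
    return (logits[0, 0, best] > 0).cpu().numpy()  # threshold logits at 0 for the mask
```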
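And here is a rough sketch of the feedback side, assuming a standard rclpy action server whose feedback message exposes an elapsed-time duration field. The import path, action name, field name, and run_segmentation helper are placeholders, not necessarily what this PR uses:

```python
# Illustrative only: stream elapsed-time feedback from a SegmentFromPoint
# action server while segmentation runs in a worker thread. The import path,
# feedback field name, and run_segmentation helper are assumptions.
import threading

from rclpy.action import ActionServer
from rclpy.node import Node

from ada_feeding_msgs.action import SegmentFromPoint  # import path assumed


class SegmentFromPointServer(Node):
    def __init__(self):
        super().__init__("segment_from_point_sketch")
        self._server = ActionServer(
            self, SegmentFromPoint, "SegmentFromPoint", self.execute_callback
        )

    def execute_callback(self, goal_handle):
        start = self.get_clock().now()
        done = threading.Event()
        result = {}

        def _worker():
            # Stand-in for the actual EfficientSAM/SAM segmentation call.
            result["value"] = self.run_segmentation(goal_handle.request)
            done.set()

        threading.Thread(target=_worker, daemon=True).start()

        feedback = SegmentFromPoint.Feedback()
        while not done.wait(timeout=0.1):
            # Publish how long segmentation has been running so the web app
            # can show that the node is alive rather than hung.
            feedback.elapsed_time = (self.get_clock().now() - start).to_msg()
            goal_handle.publish_feedback(feedback)

        goal_handle.succeed()
        return result["value"]
```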
With this combination, I feel bite selection is usable on wheelchair power (although of course further speedups would be welcome).
Note that I also tried FastSAM on the images in ada_feeding_perception/text/food_img, but got very inaccurate results. This might be because FastSAM segments the entire image and only then filters the returned masks using the seed point; it does not account for the seed point during segmentation. I also tried FastSAM's text prompting with prompts like "the bite of watermelon," "the apple," etc. on the test images and likewise got inaccurate results.
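For reference, the FastSAM experiments looked roughly like the sketch below, assuming the API of the upstream FastSAM repository (FastSAM, FastSAMPrompt, point_prompt, text_prompt). The checkpoint name, thresholds, and example coordinates are illustrative:

```python
# Rough reconstruction of the FastSAM comparison, assuming the upstream FastSAM
# repo's API. Checkpoint, thresholds, and coordinates are illustrative.
from fastsam import FastSAM, FastSAMPrompt

model = FastSAM("FastSAM-x.pt")  # checkpoint name assumed
image_path = "food_image.jpg"    # stand-in for one of the food test images
device = "cuda"

# Stage 1: FastSAM segments *everything* in the image up front.
everything = model(
    image_path, device=device, retina_masks=True, imgsz=1024, conf=0.4, iou=0.9
)
prompt = FastSAMPrompt(image_path, everything, device=device)

# Stage 2: the seed point is only used afterwards, to filter the returned masks.
# This is likely why point prompting was inaccurate on the food images.
point_masks = prompt.point_prompt(points=[[320, 240]], pointlabel=[1])

# Text prompting ("the bite of watermelon", "the apple", ...) was also tried
# and was similarly inaccurate on these images.
text_masks = prompt.text_prompt(text="the bite of watermelon")
```

Unlike this two-stage flow, SAM and EfficientSAM condition the mask decoder on the seed point itself, which matches the SegmentFromPoint use case better.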
Testing procedure

On the real robot:
[x] With the web app, do bite selection on several different objects. Document the time here:
EfficientSAM First Segmentation: 2.2s
EfficientSAM Other Segmentations: 1s
[x] Change the use_efficient_sam parameter to false and repeat the above:
SAM First Segmentation: 4s
SAM Other Segmentations: 3s
[x] Verify that the app renders feedback
Before opening a pull request
[x] Format your code using black formatter: python3 -m black .
[x] Run your code through pylint and address all warnings/errors. The only warnings that are acceptable to leave unaddressed are TODOs that should be addressed in a future PR. From the top-level ada_feeding directory, run: pylint --recursive=y --rcfile=.pylintrc .
Before Merging
Squash & Merge