Open retazo0018 opened 1 month ago
Hi,
Thanks for reaching out. Honestly, the hardware spec does not matter as long as you have an RGB-D camera 😂, because the hardware experiment is just a pick-and-place task. Let me explain a little more.
First, we set up an empty space with a clean background and use SAM to automatically extract the mask (by clicking certain fixed locations). Then we extract prototypes based on that mask and run detection on a new scene. All objects of that class are picked sequentially. The pick-and-place procedure only requires a proper grasp pose, which you can generate from the object point cloud (cropped by the instance segmentation mask) with GPG (https://github.com/atenpas/gpg) or more advanced tools. The grasp pose does not depend on the robot arm. The prototype extraction and detection part is the same as in the YCB demo.
Unfortunately, I don't have plans to organize and release that part of the code.
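For readers who want to try the point cloud cropping step, here is a minimal sketch of it. It is not the code used in the paper. It assumes a pinhole camera model, a depth image in meters that is already aligned to the RGB image, and a boolean instance mask of the same size. The function name `crop_point_cloud` and the toy intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical; use the calibration values of your own RGB-D camera.

```python
import numpy as np


def crop_point_cloud(depth_m, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into an (N, 3) point cloud in the camera frame.

    Assumes a pinhole model; depth_m is in meters and aligned with the mask.
    """
    # Keep only pixels inside the instance mask that have a usable depth reading.
    valid = mask.astype(bool) & np.isfinite(depth_m) & (depth_m > 0)
    v, u = np.nonzero(valid)  # v: pixel rows, u: pixel columns
    z = depth_m[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Toy data (hypothetical values): a flat surface 2 m away with one missing depth pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0  # zero depth is treated as "no reading"
mask = np.zeros((4, 4), dtype=bool)
mask[0:2, 0:2] = True  # pretend the detector returned this instance mask

points = crop_point_cloud(depth, mask, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
print(points.shape)  # (3, 3): four masked pixels, one dropped for invalid depth
```

The resulting array can then be written to a point cloud file and handed to a grasp generator such as GPG.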
Best, Xinyu
On Oct 9, 2024 at 8:57:55 AM, Ashwin Murali @.***> wrote:
Hi @mlzxy,
Could you please share the hardware spec of the system used for the "Real Robot Experiment" in the paper? Since it produced near-instant results, it would be helpful to know so that we can replicate it in our use case.
Many Thanks,
Best,
Thanks @mlzxy for your reply,
I tried the ViT-L open-vocabulary model on my custom dataset, and it produced good results at around 0.8 FPS on a Jetson Orin device.
I'm looking for 5-10 FPS for my application. Is it possible to reach this rate? If not, could you point me in the right direction? I will experiment a bit on my own time.
Thanks,
Best,
To improve speed, I suggest these things:
Best, Xinyu