incognite-lab / myGym

myGym enables fast prototyping of RL in the area of robotic manipulation and navigation. You can train different robots in several environments on various tasks. There is an automatic evaluation and benchmarking tool. From version 2.1 there is support for multi-step tasks, multi-reward training, and multi-network architectures.
https://mygym.readthedocs.io/en/latest/
MIT License

Question about the vision module #40

Open zichunxx opened 10 months ago

zichunxx commented 10 months ago

Hi! Dear developer team!

I'm very interested in the repo after reading the introduction. However, I'm confused about whether I can achieve my goal with this repo and would like to get an answer.

I want to construct a pick-and-place task, in which the vision module can recognize the pose (position and rotation) information of some objects. For now, I just want to realize this with only one object on the table.

I saw that a vision module is integrated into this repo. But can the vision module only perform object segmentation?

I'm new to this area, so please forgive my ignorance.

Thanks in advance!

michalvavrecka commented 10 months ago

Hi Stay,

you are in the right place. Our simulator is capable of pick-and-place tasks. The implementation of 6DOF vision (position and rotation) is a work in progress; we currently have semantic segmentation and an unsupervised VAE. You can extend either of them (multi-camera segmentation or latent-space interpolation) towards 3D vision. If you want to train the PnP task with 6DOF vision, you will face timing problems, as even the most advanced 3D bounding-box vision algorithms can process only about 2 frames per second. That means your training would take 500,000 seconds per 1 million steps, i.e. roughly 6 days. I would recommend training without vision, using objects recognizable by 6DOF vision, and then testing with data from the camera (speed during testing is not crucial).
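The back-of-the-envelope estimate above can be reproduced with a short calculation. Note this is only a sketch: the 2 fps throughput and the 1 million training steps are the figures quoted in the comment (assuming one vision inference per environment step), not measured benchmarks.

```python
SECONDS_PER_DAY = 86_400

def vision_training_days(total_steps: int, frames_per_second: float) -> float:
    """Estimate wall-clock training time in days, assuming one vision
    inference per environment step is the bottleneck."""
    seconds = total_steps / frames_per_second
    return seconds / SECONDS_PER_DAY

# Figures from the comment: 2 fps 3D bounding-box vision, 1M training steps.
days = vision_training_days(1_000_000, 2.0)
print(f"{days:.1f} days")  # → 5.8 days, i.e. roughly 6 days
```

This also shows why training on ground-truth object poses and only using the camera at test time is attractive: evaluation runs far fewer steps, so the 2 fps limit stops mattering.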