wenbowen123 / iros20-6d-pose-tracking

[IROS 2020] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains

Performance on textureless objects #58

Closed guzhouyi closed 1 year ago

guzhouyi commented 1 year ago

Hi Dr. Wen, thanks for your work. I trained the network for texture-less object tracking but the performance is quite poor. The rotation estimate is weird even though the rotation loss during training was very small.

For synthetic data generation, a texture-less 3D model was loaded and only one image was used as the background. The network was trained for 300 epochs on 5k pairs of training images and 1.7k pairs of validation images. All config parameters were unchanged except the camera position (0.0, 0.0, 0.2) and `ob.active_material.texture_slots[0].scale = 1.6` in Blender, the camera resolution (1280×720) in dataset_info.yml, and the loss weights (trans: 1, rot: 5) in config.yml.
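For reference, the changes described above would look roughly like the following excerpts. Only the values actually mentioned in this comment are real; the surrounding key names are assumptions about the file layout, not the repo's exact schema:

```yaml
# dataset_info.yml (excerpt, key names assumed)
camera:
  width: 1280    # changed from the default resolution
  height: 720

# config.yml (excerpt, key names assumed)
loss_weights:
  trans: 1
  rot: 5         # raised to emphasize rotation
```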

I wonder whether the network can track texture-less objects the way DeepIM does. If so, I would appreciate any advice you could give me.

wenbowen123 commented 1 year ago

Hi, textureless objects haven't been an issue in our experience. A number of objects in YCB-Video are textureless. We also successfully performed high-precision insertion tasks with textureless objects (see the video).

I'd suggest keeping the same config settings as ours, only making sure the camera intrinsics in the config match your actual camera. Also, 5k training pairs may be too few; we were using 200k images.
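A quick sanity check along these lines is to compare the intrinsics used for synthetic rendering against the real camera's calibration before training. This is a minimal sketch; the numbers and the idea of storing intrinsics as 3×3 matrices are illustrative assumptions, not the repo's actual format:

```python
import numpy as np

def intrinsics_match(K_render, K_camera, tol=1e-3):
    """Return True if two 3x3 intrinsic matrices agree within tol."""
    return np.allclose(np.asarray(K_render), np.asarray(K_camera), atol=tol)

# Illustrative fx, fy, cx, cy for a 1280x720 camera (made-up values)
K_render = [[615.0, 0.0, 640.0], [0.0, 615.0, 360.0], [0.0, 0.0, 1.0]]
K_camera = [[615.0, 0.0, 640.0], [0.0, 615.0, 360.0], [0.0, 0.0, 1.0]]
assert intrinsics_match(K_render, K_camera)
```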

guzhouyi commented 1 year ago

Thanks for your reply. It took me almost a week to generate 30k images, and I only got 5k image pairs for training because in most images there are no visible pixels of the object. At that rate, generating 200k images would take a month. Is that normal? Also, would a larger rotation loss weight help force the network to produce better rotation estimates?
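The "no visible pixels" filtering mentioned above can be sketched as a simple mask check per rendered frame, so unusable frames are discarded before pairing. The threshold and the convention that nonzero mask pixels mark the object are assumptions for illustration:

```python
import numpy as np

def is_usable(mask, min_visible_px=100):
    """Keep a frame only if the object's segmentation mask has
    at least min_visible_px visible pixels."""
    return int(np.count_nonzero(mask)) >= min_visible_px

# Toy example: a 720x1280 mask with a 20x20 visible patch (400 pixels)
mask = np.zeros((720, 1280), dtype=np.uint8)
mask[100:120, 200:220] = 1
print(is_usable(mask))  # 400 >= 100 -> True
```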

wenbowen123 commented 1 year ago

Previously we used multiple computers (on the server) to generate the data in parallel and then merged it together.
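Merging outputs from several machines mainly requires re-indexing file names so they don't collide. A minimal sketch, assuming a flat directory layout and an `<index>_rgb.png` naming scheme (both hypothetical, not the repo's actual layout):

```python
import shutil
from pathlib import Path

def merge_datasets(src_dirs, dst_dir):
    """Copy RGB frames from each source dir into dst_dir,
    re-prefixing file names with a running global index."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    idx = 0
    for src in src_dirs:
        for f in sorted(Path(src).glob("*_rgb.png")):
            suffix = f.name.split("_", 1)[1]  # e.g. "rgb.png"
            shutil.copy(f, dst / f"{idx:07d}_{suffix}")
            idx += 1
    return idx  # total number of merged frames
```

The same re-prefixing would need to be applied to the matching depth/mask/pose files so each frame's assets stay aligned.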

I think the main issue may still be that the dataset is too small for the network to see enough different rotations during training. I didn't observe the loss weight to be an issue.