wenbowen123 / iros20-6d-pose-tracking

[IROS 2020] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains

About synthetic data #15

Closed. johnbhlm closed this issue 3 years ago.

johnbhlm commented 3 years ago

First of all, thank you very much for sharing. I saw that you released the synthetic dataset used for training. I have some questions and need your help:

  1. Was the synthetic data generated in Blender using the parameters you gave in the paper?
  2. The paper says: "For both training and inference, rendering of I_{t-1} is implemented in C++ OpenGL." But in the code, OpenGL rendering only appears in predict.py (which calls vispy_renderer.py). During training the input A is an already-synthesized image, so are A in training and I_{t-1} in inference generated by the same tool?

Looking forward to your answer, thank you!

wenbowen123 commented 3 years ago
  1. Yes.
  2. In all training data, OpenGL is used to generate rgbA.png (I_{t-1}) and Blender is used to generate rgbB.png (I_t). During testing, OpenGL is still used to generate I_{t-1}, while rgbB is replaced by real data from the camera.
johnbhlm commented 3 years ago

Thank you for your answer. Generating A.png with OpenGL is implemented in the vispy_renderer.py file. I know that during inference A_in_cam is initialized from the ground truth, but when generating the A.png training data, how do you set the value of A_in_cam? It has to ensure that A_in_cam and B_in_cam are paired, just like in the meta.npz file.

I want to generate my own dataset in the same data format as yours, and I look forward to your help. Thank you!

wenbowen123 commented 3 years ago

The original code uses OpenGL in C++. Here, for easier portability, we provide two Python implementations: vispy and pyrender (used for YCBInEOAT). You can use either of them.
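
For reference, a minimal pyrender offscreen-rendering sketch (not the repo's exact renderer; the mesh path, intrinsics, and poses below are placeholders):

# Minimal offscreen rendering sketch with pyrender (illustration only).
import numpy as np
import trimesh
import pyrender

mesh = pyrender.Mesh.from_trimesh(trimesh.load('textured.obj', force='mesh'))
scene = pyrender.Scene(ambient_light=[1.0, 1.0, 1.0])  # ambient light so the object is visible

obj_pose = np.eye(4)
obj_pose[2, 3] = 1.0               # place the object 1 m in front of the camera (CV convention)
scene.add(mesh, pose=obj_pose)

# Pinhole intrinsics (fx, fy, cx, cy), e.g. the YCB-Video camera.
cam = pyrender.IntrinsicsCamera(fx=1066.778, fy=1067.487, cx=312.9869, cy=241.3109)
# pyrender uses the OpenGL camera convention (looking down -Z), so a camera pose in the
# usual CV convention needs a 180-degree rotation about X before being passed in.
cv_to_gl = np.diag([1.0, -1.0, -1.0, 1.0])
cam_in_world = np.eye(4)           # placeholder camera pose
scene.add(cam, pose=cam_in_world @ cv_to_gl)

r = pyrender.OffscreenRenderer(viewport_width=640, viewport_height=480)
color, depth = r.render(scene)     # (H, W, 3) uint8 color and (H, W) float depth in meters
r.delete()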

A_in_cam is randomly sampled (a random rotation and translation). It can be fairly arbitrary as long as the object is visible in the image. Then a relative transform is randomly generated as introduced in the paper (page 4, bottom left) and applied to A_in_cam to get B_in_cam.

johnbhlm commented 3 years ago

Okay, I see. Thank you again for your answers. As a student who has just entered this field, I hope to get more help from you, and I hope my questions are not too much of a bother.

A_in_cam is randomly sampled as described in the paper: "I^{t-1}_{syn,train} is obtained by randomly sampling a perturbated pose T^t_{t-1} where its translation’s direction is uniformly sampled and its norm follows a Gaussian distribution...", but you said the relative transform is "applied to A_in_cam to get B_in_cam", so I have some confusion:

  1. Isn't B_in_cam the ground-truth pose of the object? Isn't it already produced when generating B with Blender?
  2. How do you get B_in_cam by augmenting A_in_cam?

In other words, I have used Blender to generate the B data, including depth, mask, rgb, and object pose. I want to generate a training set (pairs of A and B) in the same format as your data. What do I need to do next?

If possible, can you share your code? I would be very grateful.

wenbowen123 commented 3 years ago
  1. Yes, B_in_cam is the ground-truth pose of the object in image B. But when generating synthetic training data, we want a pair of two different images A and B, so we need to make A_in_cam and B_in_cam different.
  2. Following the above, to make A_in_cam and B_in_cam different, what I did is first randomly generate B_in_cam (I just realized I flipped A and B earlier; this is the correct order). You are doing it the right way. Then randomly sample a perturbed relative pose delta_T and apply it to B_in_cam to get A_in_cam (see the sketch at the end of this comment).

I'm planning to add the code for that part when I get a chance to look into my old code. Hopefully soon.
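
In the meantime, a rough sketch of the sampling described above (an illustration only, not the repo's actual produce_train_pair_data.py code; the distributions, bounds, and composition order are assumptions):

# Sampling a (B_in_cam, A_in_cam) pair: random object pose, then a small perturbation.
# max_translation / max_rotation mirror the dataset_info.yml fields; exact distributions may differ.
import numpy as np
from scipy.spatial.transform import Rotation

def random_pose_in_front_of_camera(z_range=(0.5, 1.5)):
    """Random B_in_cam: arbitrary rotation, translation somewhere in front of the camera."""
    T = np.eye(4)
    T[:3, :3] = Rotation.random().as_matrix()
    T[:3, 3] = [np.random.uniform(-0.2, 0.2),
                np.random.uniform(-0.2, 0.2),
                np.random.uniform(*z_range)]
    return T

def random_perturbation(max_translation=0.02, max_rotation_deg=15.0):
    """delta_T: translation direction uniform on the sphere, norm Gaussian;
    rotation axis uniform, angle Gaussian; both clipped to the max values."""
    direction = np.random.randn(3)
    direction /= np.linalg.norm(direction)
    trans = direction * np.clip(abs(np.random.normal(0, max_translation / 2)),
                                0, max_translation)
    axis = np.random.randn(3)
    axis /= np.linalg.norm(axis)
    angle = np.clip(abs(np.random.normal(0, max_rotation_deg / 2)),
                    0, max_rotation_deg)
    delta = np.eye(4)
    delta[:3, :3] = Rotation.from_rotvec(np.deg2rad(angle) * axis).as_matrix()
    delta[:3, 3] = trans
    return delta

B_in_cam = random_pose_in_front_of_camera()
# Composing on the left applies the perturbation in the camera frame; the repo's convention may differ.
A_in_cam = random_perturbation() @ B_in_cam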

johnbhlm commented 3 years ago

Thank you very much for your answer. I understand, and I look forward to your sharing the synthetic-data code.

wenbowen123 commented 3 years ago

Added, see this. It's unpolished code, so it's not guaranteed to work, but it should give you some ideas about how to handle your own data.

johnbhlm commented 3 years ago

Thanks for sharing. There are a few problems in the produce_train_pair_data.py file (some functions it calls cannot be found):

  1. Line 45: self.renderer = ModelRendererOffscreen([obj_path], self.cam_K, dataset_info['camera']['height'], dataset_info['camera']['width']). Is ModelRendererOffscreen() the renderer from offscreen_renderer.py? If so, can I also use the renderer in vispy_renderer.py?

  2. In the generate() function, line 88: bb_ortho = compute_2Dboundingbox(A_in_cam, self.cam_K, self.object_width, scale=(1000, -1000, 1000)), and line 92: bb = compute_2Dboundingbox(A_in_cam, self.cam_K, self.object_width, scale=(1000, 1000, 1000)). The compute_2Dboundingbox() function cannot be found. Is it the same as the compute_bbox() function (line 302) in Utils.py?

  3. Lines 100, 104, and 106 call the normalize_scale() function, but I cannot tell which function it refers to.

Additional questions about each dataset_info.yml file:

  1. boundingbox: 10: what does the 10 mean?
  2. object_width: does it mean the width of the object?
  3. What is the relationship between resolution, camera:height, and camera:width?
  4. What are the values of max_trans = self.dataset_info['max_translation'] and max_rot = self.dataset_info['max_rotation']?

Since I don't have a dataset_info.yml file yet, I need to construct one myself.

Thanks again, I look forward to your answers.

jingweirobot commented 3 years ago

@wenbowen123 I have the same questions as @johnbhlm about the ModelRendererOffscreen and compute_2Dboundingbox functions. Could you provide some information? We are trying to re-implement this paper. Thanks in advance.

johnbhlm commented 3 years ago

@wenbowen123 I have the same questions as @johnbhlm about the ModelRendererOffscreen and compute_2Dboundingbox functions. Could you provide some information? We are trying to re-implement this paper. Thanks in advance.

@jingweirobot I found that compute_2Dboundingbox() may be compute_bbox() in the Utils.py file, and normalize_scale() may be crop_bbox() in the same file.

But I do not know what values dataset_info['max_translation'] and dataset_info['max_rotation'] in the dataset_info.yml file should take. Do you know their values?

wenbowen123 commented 3 years ago

Sorry for the late reply; I have been relatively busy recently.

@jingweirobot @johnbhlm @zhuyazhi I have updated the code. For now I don't have the synthetic data on hand to test this part of the code. Let me know if there is any other issue.

In the latest produce_train_pair_data.py, I'm assuming you will be using pyrenderer. So in your own dataset_info.yml you need to add one line, and make sure you also use pyrenderer at test time:

renderer: pyrenderer

Your understanding is correct. For your questions:

  1. max_translation and max_rotation are set empirically. They are the maximum possible movement between neighboring frames. For YCB-Video, I remember setting them to something like 15 degrees and 0.02 m.
  2. object_width is the width of the object.
  3. boundingbox: 10 means the cropped window should be (100+10)% of the size of the tight 2D bounding box enclosing the object.
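
For illustration, a rough sketch of this kind of padded square crop (not the repo's compute_2Dboundingbox or crop code; the intrinsics, pose, and object_width values are placeholders):

# Rough illustration of the crop logic described above.
import numpy as np

def padded_square_crop(pose_in_cam, cam_K, object_width, pad_percent=10):
    """Return (umin, vmin, umax, vmax) of a square crop around the object.

    The window edge is object_width projected to pixels at the object's depth,
    enlarged to (100 + pad_percent)% of the tight size.
    """
    center = cam_K @ pose_in_cam[:3, 3]          # project the object center
    u, v = center[:2] / center[2]
    fx = cam_K[0, 0]
    z = pose_in_cam[2, 3]
    half = 0.5 * object_width * fx / z           # half edge of the tight window (px)
    half *= (100 + pad_percent) / 100.0          # apply the boundingbox padding
    return (int(u - half), int(v - half), int(u + half), int(v + half))

cam_K = np.array([[1066.778, 0, 312.9869],
                  [0, 1067.487, 241.3109],
                  [0, 0, 1]])
pose = np.eye(4)
pose[:3, 3] = [0.0, 0.0, 1.0]                    # object 1 m in front of the camera
print(padded_square_crop(pose, cam_K, object_width=0.2))
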
johnbhlm commented 3 years ago

@wenbowen123 Thank you for your reply, I see. I found your parameter settings in the upper right corner of the fifth page of the paper, but I still don't know how to set the values of max_translation and max_rotation.

Can you share the YCB and YCBInEOAT dataset_info.yml files separately? I mean the dataset_info.yml files used when synthesizing the data. I want to use them as templates to build my own dataset_info.yml.

Thanks in advance.

zhuyazhi commented 3 years ago
  1. rgb_files = sorted(glob.glob('/media/bowen/56c8da60-8656-47c3-b940-e928a3d4ec3b/blender_syn_sequence/mydataset_DR/*rgb.png'.format(class_id)))

  2. meta = np.load(rgb_file.replace('rgb.png','poses_in_world.npz'))
     class_ids = meta['class_ids']
     poses_in_world = meta['poses_in_world']
     blendercam_in_world = meta['blendercam_in_world']

Should we prepare the rgb_files before running produce_train_pair_data.py? Could anyone tell me more details about rgb_files and meta? Thank you~

johnbhlm commented 3 years ago

@zhuyazhi The purpose of produce_train_pair_data.py is to generate A according to B, so B should be prepared first. My B data are generated using Blender.

Is your dataset_info.yml file ready? How did you set the max_translation and max_rotation values? Would you like to share it?

zhuyazhi commented 3 years ago

@johnbhlm I also have a set of synthetic data generated by Blender, but what about the meta? I only have rgb, depth, and pose (0.jpg, 0_depth.png, 0_RT.pkl, ...), so how do you obtain meta['poses_in_world'] and meta['blendercam_in_world']?

meta = np.load(rgb_file.replace('rgb.png','poses_in_world.npz'))
class_ids = meta['class_ids']
poses_in_world = meta['poses_in_world']
blendercam_in_world = meta['blendercam_in_world']

  1. For now I have only changed model_path and object_width according to my model.
johnbhlm commented 3 years ago

@zhuyazhi Yes, I extracted the pose of the current class from the pkl file as B_in_cam. I think this should be correct. @wenbowen123

Isn't object_width calculated in the code? How do you get its value?

johnbhlm commented 3 years ago

It doesn't matter; you just change it to your own corresponding class_id.

wenbowen123 commented 3 years ago

@wenbowen123 Thank you for your reply, I see. I found your parameter settings in the upper right corner of the fifth page of the paper, but I still don't know how to set the values of max_translation and max_rotation.

Can you share the YCB and YCBInEOAT dataset_info.yml files separately? I mean the dataset_info.yml files used when synthesizing the data. I want to use them as templates to build my own dataset_info.yml.

Thanks in advance.

A template looks like this (object_width can be left out; the program will compute it automatically):

boundingbox: 10
camera:
  centerX: 312.9869
  centerY: 241.3109
  focalX: 1066.778
  focalY: 1067.487
  height: 480
  width: 640
distribution: gauss
models:
- 0: null
  model_path: /media/bowen/e25c9489-2f57-42dd-b076-021c59369fec/DATASET/YCB_Video_Dataset/CADmodels/011_banana/textured.ply
  obj_path: /media/bowen/e25c9489-2f57-42dd-b076-021c59369fec/DATASET/YCB_Video_Dataset/CADmodels/011_banana/textured.obj

resolution: 176
train_samples: 200000
val_samples: 2000
renderer: pyrenderer
max_translation: 0.02   # meters
max_rotation: 15        # degrees
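
A minimal sketch of reading such a dataset_info.yml (field names follow the template above; the file path is a placeholder and this is not the repo's actual loading code):

# Read the dataset_info.yml template above and build the intrinsics matrix.
import numpy as np
import trimesh
import yaml

with open('dataset_info.yml', 'r') as f:
    dataset_info = yaml.safe_load(f)

cam = dataset_info['camera']
cam_K = np.array([[cam['focalX'], 0, cam['centerX']],
                  [0, cam['focalY'], cam['centerY']],
                  [0, 0, 1]])

max_trans = dataset_info['max_translation']   # meters, e.g. 0.02
max_rot = dataset_info['max_rotation']        # degrees, e.g. 15
resolution = dataset_info['resolution']       # crop resolution from the template, e.g. 176

# object_width can be left out of the yml; one rough way to estimate it is the
# largest extent of the CAD model's bounding box (assuming the model is in meters).
obj_path = dataset_info['models'][0]['obj_path']
mesh = trimesh.load(obj_path, force='mesh')
object_width = dataset_info.get('object_width', float(mesh.extents.max()))
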
jingweirobot commented 3 years ago

Hi Wenbo @wenbowen123, I have the same questions as @zhuyazhi.

rgb_files = sorted(glob.glob('/media/bowen/56c8da60-8656-47c3-b940-e928a3d4ec3b/blender_syn_sequence/mydataset_DR/*rgb.png'.format(class_id)))

In terms of rgb_files, how should we prepare these files? Where is the corresponding code in this repo? And what about the following lines?

meta = np.load(rgb_file.replace('rgb.png','poses_in_world.npz'))
class_ids = meta['class_ids']
poses_in_world = meta['poses_in_world']
blendercam_in_world = meta['blendercam_in_world']

Should we prepare the rgb_files before running produce_train_pair_data.py? Please kindly tell me more details about rgb_files and meta.

Thanks in advance.

wenbowen123 commented 3 years ago

@jingweirobot

rgb_files are the so-called 'rgbB' files discussed above with @johnbhlm. meta stores the class_ids, poses_in_world, and blendercam_in_world information from the Blender setup.

These are ordinary 640x480 rendered images and data generated in Blender. You can refer to projects such as https://github.com/DLR-RM/BlenderProc for how to generate the synthetic image B data. My code will then generate their paired A images.
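
For example, a minimal sketch of writing such a meta file next to each rendered rgbB frame (the key names come from this thread; the class ids, array shapes, and filename below are placeholders):

# Write the poses_in_world.npz expected next to each *rgb.png (illustration only).
import numpy as np

class_ids = np.array([11])                       # placeholder, e.g. 011_banana
poses_in_world = np.stack([np.eye(4)])           # (N, 4, 4) object-to-world poses
blendercam_in_world = np.eye(4)                  # (4, 4) camera-to-world pose

np.savez('0000rgb.png'.replace('rgb.png', 'poses_in_world.npz'),
         class_ids=class_ids,
         poses_in_world=poses_in_world,
         blendercam_in_world=blendercam_in_world)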

yangshunDragon commented 2 years ago

BlenderProc for how to generate the synthetic image B data

Hello Bowen, thanks for sharing. Would you please share the code you use with BlenderProc to generate the synthetic data? Thanks in advance!

wenbowen123 commented 2 years ago

@yangshunDragon @jingweirobot @johnbhlm @zhuyazhi The training data generation steps have now been provided in the README.

ZhenruiJI commented 2 years ago

@yangshunDragon @jingweirobot @johnbhlm @zhuyazhi The training data generation steps have now been provided in the README.

Hi Bowen, thanks for sharing your nice work. I have run blender_main.py and found that data generation is quite slow (about 8 seconds per sample). Is there any trick to speed up the generator? Looking forward to your reply!