mateoKutnjak opened 4 years ago
Hi, interesting question. Maybe the 0.0001 learning rate is a little bit too big for the refinement model. Could you take a try whether smaller lr can avoid this rotation problem?
I am running training right now and will post result as soon as it finishes.
EDIT 1: I changed the amount of x-axis rotation of the object in my dataset from [-15, 15] degrees to [-5, 5] degrees, and evaluation results are enormously better. In my dataset the Z axis has its origin at the center of the object and points toward the camera, the Y axis has its origin at the center of the object and points up, and the X axis has its origin at the center of the object and points to the left when viewed from the camera frame.
Hi, sorry, I didn't get it. So you are talking about your own dataset, not YCB or LineMOD? What do you mean by changing the amount of rotation? Do you mean you are changing the sampling range of the rotation of the data?
I am performing training on my own dataset. It seems that every iteration of refinement rotates the 3D point representation of the object by 180 degrees around SOME axis. Earlier it happened around the X axis; now it is being flipped around the Z axis. My only guess is that the learning rate is too big and it switches the orientation back and forth.
The error appeared in the first epoch. The network gets stuck in a local minimum where the rotation around one of the axes is wrong by 180 degrees. I re-ran training several times until, after 1000 iterations, the angle differences were at most 50 degrees. After that, the refinement process is faster and correct.
refine_margin=0.013, lr=0.0001
@mateoKutnjak I have the same issue can you please tell me how you fixed it?
Hi. I haven't found the source of the problem. :/ I stopped the network when this situation happened. You could probably detect this behavior programmatically and stop training.
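For the "detect this behavior programmatically" idea: one way is to compare the rotation predicted at consecutive refinement iterations and flag a near-half-turn change. This is just a sketch (function names and the 30-degree tolerance are my own assumptions, not DenseFusion's API):

```python
import numpy as np

def relative_rotation_deg(R_prev, R_curr):
    """Angle (in degrees) of the rotation taking R_prev to R_curr."""
    R_rel = R_curr @ R_prev.T
    # Clamp to guard against numerical drift outside [-1, 1].
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def is_flip(R_prev, R_curr, tol_deg=30.0):
    """True if the refinement step rotated the object by roughly 180 degrees."""
    return abs(relative_rotation_deg(R_prev, R_curr) - 180.0) < tol_deg

# Example: a pure 180-degree flip around the Z axis.
R_identity = np.eye(3)
R_z_flip = np.diag([-1.0, -1.0, 1.0])
print(is_flip(R_identity, R_z_flip))  # True
```

You could call such a check inside the refinement loop and abort (or restart with a different seed) when it fires repeatedly.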
I switched to another project shortly after, so maybe a grid search for better parameters would be best (if you have the resources for that).
Also, I could not afford lengthy training, so I kept the learning rate small. I suggest lowering the learning rate.
Every second iteration of the refinement network I get jumps in the average distance metric. It looks like the object is rotated by 180 degrees every iteration:
Pose model distance: 0.00859
Refinement model distance, iteration 0: 0.0443
Refinement model distance, iteration 1: 0.0034
Refinement model distance, iteration 2: 0.0444

Pose model distance: 0.03744
Refinement model distance, iteration 0: 0.0516
Refinement model distance, iteration 1: 0.0345
Refinement model distance, iteration 2: 0.050
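The oscillating numbers are consistent with a 180-degree flip: a metric like ADD (average distance of transformed model points) blows up to roughly the object's size under a half-turn even when the translation is perfect. A minimal sketch of such a metric (the function name and signature are my assumptions, not the repo's actual evaluation code):

```python
import numpy as np

def add_metric(model_points, R_pred, t_pred, R_gt, t_gt):
    """Mean distance between model points transformed by the predicted
    and ground-truth poses (ADD-style metric)."""
    pred = model_points @ R_pred.T + t_pred
    gt = model_points @ R_gt.T + t_gt
    return np.mean(np.linalg.norm(pred - gt, axis=1))

# A 180-degree flip around Z gives a large ADD despite perfect translation.
pts = np.random.default_rng(0).uniform(-0.05, 0.05, size=(500, 3))
R_gt, t = np.eye(3), np.zeros(3)
R_flip = np.diag([-1.0, -1.0, 1.0])
print(add_metric(pts, R_gt, t, R_gt, t))    # 0.0
print(add_metric(pts, R_flip, t, R_gt, t))  # large, on the order of the object size
```

That matches the pattern above, where alternate refinement iterations jump back to roughly the unrefined error.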
I am training DenseFusion with two objects and the refine margin set to 0.006, but I keep lowering it because the refinement network shows this rotation behavior. Is the learning rate from the GitHub repo too large for refinement (0.0001 with an lr decay rate of 0.3)?
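For anyone experimenting with these numbers: I'm not certain exactly when the repo applies its decay, but assuming lr_rate is a multiplicative factor applied at each decay event, the effective learning rates would be:

```python
# Illustrative only: how a base lr of 0.0001 shrinks under repeated
# multiplicative decay by 0.3 (the trigger condition is an assumption).
base_lr = 0.0001
lr_rate = 0.3

def decayed_lr(base_lr, lr_rate, num_decays):
    """Learning rate after `num_decays` decay events."""
    return base_lr * (lr_rate ** num_decays)

for n in range(3):
    print(f"after {n} decay(s): {decayed_lr(base_lr, lr_rate, n):.2e}")
```

So even one decay event already brings the lr down to 3e-05, which may be closer to what the earlier comments suggest trying from the start.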