ybkscht / EfficientPose


Question about own dataset #8

Open datta-TG opened 3 years ago

datta-TG commented 3 years ago

Hello, congratulations on your work, it's amazing!

I want to detect the orientation of chairs. I tried some of the provided weights but the results were not good. Is it possible to do the detection as-is, or should I train the model on my own images?

How many images are necessary for good training? Must the images have the same layout and files as linemod_preprocessed?

Thanks in advance!

maurocaio commented 3 years ago

Hello, I too am interested in training your model on a dataset other than LineMod. Each image in my dataset is annotated with: 1) a quaternion, 2) a translation, 3) a bbox.

Could you tell me if it is possible to train your model with my dataset?

PS: I don't have the 3D model of the object.

Thanks, your work is fantastic!

ybkscht commented 3 years ago

Hi @datta-TG,

thanks for your nice words!

You need to train EfficientPose on your own dataset to be able to estimate the poses of the chairs. I can't really give you an exact number of images because it depends on a lot of things, e.g. the number of chairs you have, whether they are always in the same room or not, the min and max distance of the chairs, the lighting conditions, ... The more general your use case is, the more data you need.

To train on your own dataset, there are two possibilities:

1. Convert your dataset into the same format as the Linemod_preprocessed dataset so that the existing Linemod generator can load it.
2. Write your own generator (or adjust generators/linemod.py) to load your dataset directly.

Sincerely, Yannick

ybkscht commented 3 years ago

Hi @maurocaio,

thank you very much!

The fact that you don't have the 3D model makes things more difficult, because EfficientPose utilizes the object's 3D model in the transformation loss. You could try to use the 3D bounding box of the object (just the eight corner points) instead of the full 3D model; I think this should also work. But keep in mind that you can possibly run into problems with symmetric objects using this method, due to ambiguities (multiple poses of the object can have the same appearance). In this case the network is penalized unnecessarily during training, which could worsen the performance. Our method tackles these symmetry problems using the more detailed shape information of the object's 3D model (described in section 3.4), which is not available when using only the 3D cuboids.
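For illustration, a minimal sketch of how the eight corner points could be computed from the object's min/max extents (the function name and the example extents are placeholders, not code from the repository):

```python
import numpy as np

def cuboid_corners(min_xyz, max_xyz):
    """Eight corner points of the object's axis-aligned 3D bounding box,
    usable as a sparse stand-in for the full 3D model point cloud."""
    x0, y0, z0 = min_xyz
    x1, y1, z1 = max_xyz
    return np.array([[x, y, z]
                     for x in (x0, x1)
                     for y in (y0, y1)
                     for z in (z0, z1)], dtype=np.float32)

# example: an object roughly 20 x 10 x 30 cm, centered at the origin
corners = cuboid_corners([-0.10, -0.05, -0.15], [0.10, 0.05, 0.15])
print(corners.shape)  # (8, 3)
```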

Apart from this, you should be fine. You just need to convert the quaternions into the axis-angle representation, as this is currently the only supported rotation representation.
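A minimal conversion sketch, assuming unit quaternions in (w, x, y, z) order (swap the indexing if your annotations store (x, y, z, w)):

```python
import numpy as np

def quaternion_to_axis_angle(q, eps=1e-8):
    """Convert a unit quaternion (w, x, y, z) to an axis-angle
    (Rodrigues) vector: rotation axis scaled by the angle in radians."""
    q = np.asarray(q, dtype=np.float64)
    q = q / np.linalg.norm(q)                 # guard against non-unit input
    w, v = q[0], q[1:]
    angle = 2.0 * np.arccos(np.clip(w, -1.0, 1.0))
    s = np.sqrt(max(1.0 - w * w, 0.0))        # sin(angle / 2)
    if s < eps:                               # near-identity: axis undefined
        return np.zeros(3)
    return (v / s) * angle

# 90 degree rotation around the z-axis:
q = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
print(quaternion_to_axis_angle(q))  # ~[0. 0. 1.5708]
```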

Sincerely, Yannick

geoffhorowitz commented 3 years ago

Hi Yannick,

Thanks for the work and the comments above. I have a custom dataset in the linemod format (it works with other models that consume linemod data directly). My understanding is that I should be able to run the custom data through generators/linemod.py to get it into the linemod_preprocessed format needed for your model, correct?

I'm trying to run through this with the original linemod data to make sure I have the process right.

so to do this, I should just need to change these lines:

```python
train_gen = LineModGenerator("/Datasets/Linemod_preprocessed/", object_id = 1)
test_gen = LineModGenerator("/Datasets/Linemod_preprocessed/", object_id = 1, train = False)
```

to point to the linemod dataset (and pick an object_id number arbitrarily?), e.g.:

```python
train_gen = LineModGenerator("/LINEMOD/ape", object_id = 1)
test_gen = LineModGenerator("/LINEMOD/ape", object_id = 1, train = False)
```

However, if I try the above, I get the error:

```
Error: Invalid given rotation representation None. Choose one of the following: dict_keys(['axis_angle', 'rotation_matrix', 'quaternion']). Continuing using 'axis_angle' representation
Error: path LINEMOD/ape/models does not exist!
```

this appears to be looking for the models folder (present in your Linemod_preprocessed, but not in the original linemod dataset)...

Any advice/guidance on the proper conversion process?

Thank you!

ybkscht commented 3 years ago

Hi @geoffhorowitz,

The Linemod generator (generators/linemod.py) only loads the data for EfficientPose; it does not convert the dataset from the original Linemod format into the "linemod_preprocessed" format. Unfortunately, the Linemod generator assumes that the dataset is already in the needed "linemod_preprocessed" format. I got the preprocessed dataset from here and did not convert it myself, so I don't have the conversion scripts.

This means you have to either convert your dataset into the same format as the Linemod_preprocessed dataset (the segnet_results folder is not needed) or adjust the linemod generator to load your dataset.
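For reference, the Linemod_preprocessed layout the generator expects looks roughly like this (reconstructed from the publicly available preprocessed dataset; double-check the details against your copy):

```
Linemod_preprocessed/
├── data/
│   └── 01/               # one folder per object id
│       ├── rgb/          # 0000.png, 0001.png, ...
│       ├── depth/
│       ├── mask/         # per-image object masks
│       ├── gt.yml        # per image: cam_R_m2c, cam_t_m2c, obj_bb, obj_id
│       ├── info.yml      # per image: cam_K (intrinsics), depth_scale
│       ├── train.txt     # image ids used for training
│       └── test.txt
├── models/
│   ├── obj_01.ply        # 3D model used in the transformation loss
│   └── models_info.yml   # diameter and extents per model
└── segnet_results/       # not needed, see above
```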

Sincerely, Yannick

maurocaio commented 3 years ago

> (quoting @ybkscht's reply above)

Thanks so much @ybkscht, this is what I was thinking of doing and you confirmed it.

maurocaio commented 3 years ago

> (quoting the exchange above)

Sorry @ybkscht, I tried to launch a training run after checking my generator with debug.py, and all arguments seem to work... But after the first epoch I get this result:

```
C:\Users\Mauro\anaconda3\envs\EfficientPose\lib\site-packages\numpy\core\fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
C:\Users\Mauro\anaconda3\envs\EfficientPose\lib\site-packages\numpy\core\_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
C:\Users\Mauro\anaconda3\envs\EfficientPose\lib\site-packages\numpy\core\_methods.py:234: RuntimeWarning: Degrees of freedom <= 0 for slice
  keepdims=keepdims)
C:\Users\Mauro\anaconda3\envs\EfficientPose\lib\site-packages\numpy\core\_methods.py:195: RuntimeWarning: invalid value encountered in true_divide
  arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:\Users\Mauro\anaconda3\envs\EfficientPose\lib\site-packages\numpy\core\_methods.py:226: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
WARNING:tensorflow:From C:\Users\Mauro\Desktop\Tesi\EfficientPose-main\eval\eval_callback.py:240: The name tf.Summary is deprecated. Please use tf.compat.v1.Summary instead.

11 instances of class object with average precision: 0.0000
11 instances of class object with ADD accuracy: 0.0000
11 instances of class object with ADD-S-Accuracy: 0.0000
11 instances of class object with 5cm-5degree-Accuracy: 0.0000
class object with Translation Differences in mm: Mean: nan and Std: nan
class object with Rotation Differences in degree: Mean: nan and Std: nan
11 instances of class object with 2d-projection-Accuracy: 0.0000
11 instances of class object with ADD(-S)-Accuracy: 0.0000
class object with Transformed Point Distances in mm: Mean: nan and Std: nan
class object with Transformed Symmetric Point Distances in mm: Mean: nan and Std: nan
class object with Mixed Transformed Point Distances in mm: Mean: nan and Std: nan
mAP: 0.0000
ADD: 0.0000
ADD-S: 0.0000
5cm_5degree: 0.0000
TranslationErrorMean_in_mm: nan
TranslationErrorStd_in_mm: nan
RotationErrorMean_in_degree: nan
RotationErrorStd_in_degree: nan
2D-Projection: 0.0000
Summed_Translation_Rotation_Error: nan
ADD(-S): 0.0000
AveragePointDistanceMean_in_mm: nan
AveragePointDistanceStd_in_mm: nan
AverageSymmetricPointDistanceMean_in_mm: nan
AverageSymmetricPointDistanceStd_in_mm: nan
MixedAveragePointDistanceMean_in_mm: nan
MixedAveragePointDistanceStd_in_mm: nan

Epoch 00001: ADD improved from -inf to 0.00000, saving model to D:\EfficientPose\models\Speed\object_1\phi_0_speed_best_ADD.h5
```

The problem starts in eval/common.py in the function _get_detections at line 111, where model.predict_on_batch returns arrays with all values equal to -1. Where is the problem? I ignored the masks because I saw that they are only used for augmentation... is that correct?

Thanks a lot, and sorry for the mess!

geoffhorowitz commented 3 years ago

> (quoting @ybkscht's reply above)

Gotcha, thank you for the info!

ybkscht commented 3 years ago

Hi @maurocaio,

The problem could be related to the missing masks. I use them in the augmentation to get a tight 2D bbox after rotating the image (get_bbox_from_mask in generators/common.py). Sorry, I forgot to mention this in my previous answer. I'm not sure how you handle this, but maybe it results in all annotations being filtered out because no valid bbox can be found after augmentation. In that case your model learns to detect nothing, which would match your description that the model didn't detect any objects during evaluation.
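The idea behind get_bbox_from_mask is essentially the following (a simplified sketch, not the repository's exact code):

```python
import numpy as np

def bbox_from_mask(mask):
    """Tight 2D box (x1, y1, x2, y2) around the non-zero pixels of a mask.
    Returns None when the mask is empty, e.g. when an augmentation moved
    the object completely out of the image; such annotations have to be
    filtered out, and if that happens for every image, the model only
    ever sees background."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])
```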

You can try to disable the 6D augmentation (colorspace augmentation should be fine) with --no-6dof-augmentation in train.py and see if it fixes your problem.

But if this is the problem, you should already have noticed it when you tried debug.py. If you use --draw_2d-bboxes with debug.py, can you actually see the additional 2D bboxes (with 6D augmentation enabled)?
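For example (a sketch only: --no-6dof-augmentation is the flag mentioned above, while the positional dataset arguments follow the repository's README-style invocation and may need adjusting to your setup):

```
python train.py --phi 0 --no-6dof-augmentation linemod /path/to/Linemod_preprocessed/ --object-id 1
```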

Sincerely, Yannick

maurocaio commented 3 years ago

> (quoting @ybkscht's reply above)

Thanks for the answer. The problem appeared in the first epochs but, continuing the training, the network gave me some results like this:

```
1790/1790 [==============================] - 634s 354ms/step - loss: 8.3660 - classification_loss: 0.3138 - regression_loss: 0.2021 - transformation_loss: 392.5070
Epoch 19/500
Running network: 100% (2155 of 2155) | Elapsed Time: 0:01:43
Parsing annotations: 100% (2155 of 2155) | Elapsed Time: 0:00:00
2155 instances of class object with average precision: 0.5341
2155 instances of class object with ADD accuracy: 0.0000
2155 instances of class object with ADD-S-Accuracy: 0.0000
2155 instances of class object with 5cm-5degree-Accuracy: 0.0000
class object with Translation Differences in mm: Mean: 430.8826 and Std: 582.5865
class object with Rotation Differences in degree: Mean: 115.7376 and Std: 42.0521
2155 instances of class object with 2d-projection-Accuracy: 0.0000
2155 instances of class object with ADD(-S)-Accuracy: 0.0000
class object with Transformed Point Distances in mm: Mean: 904.5729 and Std: 498.1902
class object with Transformed Symmetric Point Distances in mm: Mean: 485.8314 and Std: 450.1264
class object with Mixed Transformed Point Distances in mm: Mean: 904.5729 and Std: 498.1902
mAP: 0.5341
ADD: 0.0000
ADD-S: 0.0000
5cm_5degree: 0.0000
TranslationErrorMean_in_mm: 430.8826
TranslationErrorStd_in_mm: 582.5865
RotationErrorMean_in_degree: 115.7376
RotationErrorStd_in_degree: 42.0521
2D-Projection: 0.0000
Summed_Translation_Rotation_Error: 1171.2589
ADD(-S): 0.0000
AveragePointDistanceMean_in_mm: 904.5729
AveragePointDistanceStd_in_mm: 498.1902
AverageSymmetricPointDistanceMean_in_mm: 485.8314
AverageSymmetricPointDistanceStd_in_mm: 450.1264
MixedAveragePointDistanceMean_in_mm: 904.5729
MixedAveragePointDistanceStd_in_mm: 498.1902

Epoch 00019: ADD did not improve from 0.00046
```

The values seem very high and the ADD is not improving at the moment. I hope I didn't do anything wrong and that these decrease in the next epochs.

maurocaio commented 3 years ago

Hi @ybkscht, I've finally got good results with my dataset: ADD 62% without 6D augmentation and 78% with it. So now I'm thinking of training your network with an increased phi hyperparameter (phi 0 --> phi 3) to improve the ADD, but I get a new problem; when the network loads the model it returns these warnings:

```
Done! Loading model, this may take a second...
WARNING: Layer efficientnet-b3 could not be found!
WARNING: Layer resample_p6 could not be found!
WARNING: Layer resample_p7 could not be found!
WARNING: Layer fpn_cells could not be found!
WARNING: Layer class_net could not be found!
WARNING: Layer box_net could not be found!
```

The training started, but the loss seems to decrease very slowly. My question is whether these warnings are normal or not. Thanks a lot!

Jorisdehoog commented 3 years ago

Hey @maurocaio, I have the same issue as you described, with model.predict_on_batch returning only arrays of -1. Do you remember what you did to solve this?

ame5r commented 3 years ago

> (quoting @Jorisdehoog's comment above)

Hi, can you please tell me how I can build my own custom dataset to fit the model? Thanks.

finnweiler commented 3 years ago

> (quoting @ybkscht's reply above)

Hello, has anyone implemented an alternative loss function that only uses the eight corner points instead of the object's full 3D model? Kind regards

teigl commented 2 years ago

> (quoting @maurocaio's comment about the layer warnings above)

For me this happened because the layer names returned from load_attributes_from_hdf5_group were in encoded format (bytes instead of str). The warning is "real" in the sense that the weights are not being loaded, at least for me. I was also not getting any detections, which caused the -1/nan issue. I fixed it by decoding the names.
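A minimal sketch of that decoding step; where exactly to apply it depends on your Keras/h5py versions, and the helper name here is made up:

```python
def decode_layer_names(names):
    """Normalize layer names read from an HDF5 weights file: some
    h5py/Keras combinations return bytes, which never compare equal
    to the (str) layer names of the built model, so every layer is
    reported as 'could not be found'."""
    return [n.decode('utf-8') if isinstance(n, bytes) else n for n in names]

print(decode_layer_names([b'efficientnet-b3', 'class_net']))
# ['efficientnet-b3', 'class_net']
```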