Thanks for your attention! For your first question, you can try the following method to solve the problem:
For your second question, G is a set of rotation matrices. When you apply one of the rotation matrices in G to an object, the space occupied by the object does not change. Intuitively, you can imagine what happens when you rotate a rectangular cuboid 180 degrees around one of its principal axes. Additionally, the G values of Pepper and tless20 can be found in the Sileane dataset; the download link is in the README, and G is listed in poseutil.json.
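To make the "occupied space is unchanged" idea concrete, here is a minimal NumPy sketch (not taken from this repository) using the identity and a 180-degree rotation about the z-axis, the same two matrices quoted for tless20 later in this thread:

```python
import numpy as np

# Symmetry group G for a 2-fold symmetric object: the identity and a
# 180-degree rotation about the z-axis.
G = [
    np.eye(3),
    np.array([[-1.0,  0.0, 0.0],
              [ 0.0, -1.0, 0.0],
              [ 0.0,  0.0, 1.0]]),
]

# A centered, axis-aligned box sampled as a point set (stand-in for an object model).
points = np.array([[x, y, z]
                   for x in (-1.0, 1.0)
                   for y in (-0.5, 0.5)
                   for z in (-0.2, 0.2)])

for R in G:
    rotated = points @ R.T
    # The rotated point set is a permutation of the original, i.e. the space
    # occupied by the object is unchanged by any element of G.
    same = ({tuple(np.round(p, 6)) for p in rotated}
            == {tuple(np.round(p, 6)) for p in points})
    print(same)  # True for every element of G
```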
Thank you for your thoughtful response. I have a few more questions to ask:
I am using the Sileane Dataset tless20 for training, and the object type is defined as follows:

type_tless20 = ObjectType(type_name='tless', class_idx=0, symmetry_type='finite',
    lambda_p=[[0.0155485, 0.0, 0.0], [0.0, 0.0248085, 0.0], [-0.0, 0.0, 0.0171969]],
    G=[[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
       [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]])
The camera intrinsics follow the specifications provided by the Sileane Dataset, the model configuration is taken from the source you provided on GitHub, and the poseutil.json used for pose-recovery evaluation is also the one provided by the Sileane Dataset. During training, the data consists of cycles 0 to 250 with object counts ranging from 0 to 10, and the same range is used for testing. However, the average precision (AP) of the trained model is only about 0.3. Is there something I might have missed or not modified correctly? I have noticed that the training results for symmetric objects are not good, but I am unsure of the underlying reason. I would appreciate further clarification when you have the time. Thank you.
For your first question: generally, the distance threshold is one tenth of the diameter of the object's bounding sphere. For your second question: you can try to visualize the prediction results to determine whether the problem lies in the prediction stage or the evaluation stage, and then carefully check the code to locate the bug. The visualization code is in evaluate.py.
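A minimal sketch of that rule of thumb, assuming the object model is available as an (N, 3) point array; the function name and the dummy data are illustrative, not part of the repository:

```python
import numpy as np

def distance_threshold(model_points: np.ndarray, fraction: float = 0.1) -> float:
    """Return fraction * (bounding-sphere diameter) of the object model."""
    # Approximate the bounding sphere by centering it on the axis-aligned
    # bounding box and taking the largest distance to any model point.
    center = (model_points.max(axis=0) + model_points.min(axis=0)) / 2.0
    radius = np.linalg.norm(model_points - center, axis=1).max()
    return fraction * 2.0 * radius

# Illustrative usage with a dummy ~5 cm point cloud (replace with the real model).
model = np.random.rand(1000, 3) * 0.05
print(distance_threshold(model))
```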
Thank you for your previous response. I have successfully conducted pose estimation using a self-generated dataset, achieving high accuracy. However, I have encountered an issue: while training and testing with 1-10 objects yields excellent results, the prediction of center positions deteriorates significantly once the number of objects goes beyond 10, which leads to poor clustering and weak pose estimation. Could you please explain the possible reasons behind this? I have also trained and tested with 1-30 objects using the open datasets from your paper, and the results were quite good.
You can check the following aspects:
Hello, I have the following two questions that I would like to ask:
In the generate_train_dataset.py script provided by you for converting depth images to point clouds, the formula used is Xcs = -(us - cx) * Zcs / fx, Ycs = -(vs - cy) * Zcs / fy. Could you please explain the reason for the additional negative sign?

1. If the training loss is small but the eval loss is large, it may indicate that the coordinate systems of the training set and the test set are different. Try visualizing them for inspection.
2. If the training loss and the eval loss are both large, it indicates that the labels still have problems.
3. Checking whether the translation labels and the point clouds are both converted to mm during training may also help.
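Regarding the back-projection formula quoted above: a minimal sketch of how such a conversion typically looks, assuming a pinhole camera with intrinsics fx, fy, cx, cy. The extra minus signs usually reflect the camera-axis convention used by the dataset rather than anything specific to this code; the function name and the dummy intrinsics below are illustrative, not the repository's:

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project a depth image (H, W) into an (N, 3) camera-frame point cloud
    using the formula quoted above, negative signs included."""
    h, w = depth.shape
    vs, us = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")  # pixel rows/cols
    zcs = depth
    xcs = -(us - cx) * zcs / fx   # extra minus sign, as in the quoted formula
    ycs = -(vs - cy) * zcs / fy
    points = np.stack([xcs, ycs, zcs], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels with no depth

# Illustrative usage with dummy intrinsics and a random depth map.
depth = np.random.rand(480, 640).astype(np.float32)
cloud = depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(cloud.shape)
```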
Hello, I am currently using pybullet to generate a dataset and train a model. The model's accuracy on stacked synthetic images is quite high. However, on real images, if the objects are even slightly stacked, the results are not good: when visualizing the movement of each point to its predicted center position, I notice that the point sets cannot be completely separated. If the objects are not stacked, the recognition is somewhat acceptable; at least the points for each object cluster in their respective regions. What could be the reasons for the model's inability to handle two stacked objects accurately?
You can try to adjust the bandwidth and min_bin_freq parameters in meanshift.
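As a generic illustration of that advice (scikit-learn's MeanShift, not necessarily the exact clustering code in this repository): bandwidth controls how far apart two center votes can be and still merge into one object, while min_bin_freq (used together with bin_seeding) discards seed bins supported by too few votes, which suppresses spurious clusters from noisy points.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Dummy per-point votes for object centers (in mm): three objects, 200 votes each.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0, 0.0], [80.0, 0.0, 0.0], [0.0, 80.0, 0.0]])
predicted_centers = np.concatenate(
    [c + rng.normal(0.0, 3.0, size=(200, 3)) for c in true_centers])

# Larger bandwidth merges nearby objects; larger min_bin_freq drops weak seeds.
ms = MeanShift(bandwidth=15.0, bin_seeding=True, min_bin_freq=10, cluster_all=False)
labels = ms.fit_predict(predicted_centers)

print("clusters found:", len(set(labels) - {-1}))  # -1 marks unassigned votes
```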
What I mean is that the predicted centroids in the virtual data are well separated, but for stacked objects in the real data the predicted centroids cannot be completely separated. Do you perform any preprocessing on the real images after capturing them, before testing?
1. You can check whether there is a significant domain gap between the real and synthetic datasets; such a gap will diminish the network's performance on the real dataset.
2. You can use domain randomization on the simulated dataset, e.g., adding noise, to improve the network's performance on the real dataset.
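A minimal sketch of the kind of domain randomization meant in point 2, assuming the synthetic data is an (N, 3) point cloud in mm; Gaussian jitter plus random point dropout are just two simple options, and the function below is illustrative rather than the repository's augmentation code:

```python
import numpy as np

def randomize_point_cloud(points, sigma_mm=1.0, dropout=0.05, rng=None):
    """Apply per-point Gaussian jitter and random point dropout to an
    (N, 3) synthetic point cloud given in mm."""
    rng = rng or np.random.default_rng()
    keep = rng.random(len(points)) > dropout                      # drop a few points
    noisy = points[keep] + rng.normal(0.0, sigma_mm, size=(keep.sum(), 3))
    return noisy

# Illustrative usage on a dummy synthetic cloud.
cloud = np.random.rand(2048, 3) * 100.0
augmented = randomize_point_cloud(cloud, sigma_mm=0.5, dropout=0.1)
print(cloud.shape, "->", augmented.shape)
```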
Hello, dear author. Currently, I have encountered the following issues during my training process, and I am unable to resolve them:
When I train using the ringscrew data from the IPA Bin Picking Dataset, I achieve an AP (average precision) of 0.8. However, when I generate ringscrew data using Blender, the AP drops to only 0.2. On the other hand, if I train using TLESS22 data generated from Blender, the AP can reach as high as 1. Additionally, when I train on generated data for other, asymmetric objects such as a doorknob, the AP is 0.9. I would like to understand why I am encountering this issue.
If I use symmetric objects like ringscrews or candlesticks as models for training data, how should I set the value of G (the set of rigid transformations that have no effect on the static state of the object)? Alternatively, could you please provide me with the G values you used for training Pepper and TLESS-20 for reference? Thank you.
I am seeking your guidance or insights to help me address these problems. Thank you.
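Regarding the second question above (how to write down G for a symmetric object): as a purely illustrative sketch, not the author's answer, a finite n-fold rotational symmetry about an axis can be written as the identity plus the n-1 non-trivial rotations about that axis. Whether a revolution-symmetric object such as a candlestick should be handled with a discretized G of this kind, or with a dedicated symmetry type as in the Sileane annotations, is an assumption to be checked against the poseutil.json mentioned earlier in this thread.

```python
import numpy as np

def rotation_about_z(theta: float) -> np.ndarray:
    """3x3 rotation matrix about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def finite_G(n_fold: int) -> list:
    """G for an object with n-fold rotational symmetry about its z-axis:
    the identity plus the n-1 non-trivial rotations."""
    return [rotation_about_z(2.0 * np.pi * k / n_fold) for k in range(n_fold)]

# 2-fold example; matches the two matrices quoted for tless20 earlier in the thread.
print(np.round(finite_G(2)[1], 3))

# A revolution-symmetric object has infinitely many such rotations; one common
# workaround is to sample the circle, e.g. finite_G(36).
```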