Closed jgcbrouns closed 5 years ago
I managed to get my "Acc using vx 3D transformation" up to 20-30% by using a more appropriate diameter in the .data file. I calculated this diameter with a piece of code that pairwise compares all vertices and takes the pair with the greatest distance.
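The diameter computation I used looks roughly like this (a minimal numpy sketch; the function name is just illustrative, and the O(N²) broadcast is fine for LINEMOD-sized vertex counts):

```python
import numpy as np

def model_diameter(vertices):
    """Largest pairwise Euclidean distance between mesh vertices."""
    v = np.asarray(vertices, dtype=np.float64)
    # Broadcast to an N x N matrix of pairwise distances (O(N^2) memory).
    diffs = v[:, None, :] - v[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())
```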
Unfortunately, I am not able to get any accuracy on the other two metrics (Acc using 5px 2D projection and Acc using the 5 cm 5 degree metric). It seems that the model is unable to learn from my data. Might the data be too monotonous? If so, I would expect at least some accuracy metrics to go up (because the model would simply overfit).
Too bad that nobody seems to be able (or willing) to help. I have a little update:
My accuracy after 1000 epochs is:
What I find weird is that LINEMOD has only ±200 images per class for training (and ±1000 for testing). Hence, their training set is very small, yet @btekin was able to reach high accuracy (>90%). @btekin, I would love to hear your input on this. Is this due to pre-training? Can you tell us a bit more about how long your pre-training took and what parameters you used for your training/test set?
Sidenote: I do believe that the training images of LINEMOD are selected in such a way that together they represent a good 3D representation of the object. In other words, every image in the training set features the object from a unique point of view (unique camera angle, position, etc.). Could this influence why my model cannot seem to learn with only 240 images? On the other hand, I would expect the model to at least learn something with a small dataset when increasing the number of epochs: the model would overfit, but it should still learn and return high accuracies on the training set.
Any thoughts?
Hello, thanks for your interest in our code, and sorry for the late reply; I didn't have time to respond as I had to deal with my other work-related projects.
We follow the same training/test splits with earlier work, e.g. that of the BB8 paper by Rad & Lepetit, ICCV'17. Indeed, the training examples are sampled such that they cover a wide variety of viewpoints around the object. Having a more representative training set should, in principle, increase the accuracy on test examples.
Instead of using initialization weights for another object, you could also pretrain the network on the same object by setting the regularization parameter for the confidence loss to zero, as explained in the readme file. See also the discussion in the paper and in #79 why such pretraining could be useful.
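The idea can be sketched as follows (the names here are illustrative, not the actual variables in the repository):

```python
def total_loss(coord_loss, conf_loss, conf_weight):
    """Illustrative combined objective: with conf_weight = 0.0 the network
    pretrains on corner-coordinate regression only; restore the weight
    (e.g. 1.0) for the main training run."""
    return coord_loss + conf_weight * conf_loss
```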
About your findings: we already describe the order that should be used for the keypoints in this link, along with a step-by-step guide, and this was also discussed in the duplicate issue #68. Custom datasets have different camera intrinsics matrices and might have different object models/scales. The scale of the object model should certainly be consistent and set appropriately for a new dataset.
Hello @btekin. Thank you for your answer, yet I would like to ask you to be more specific.
The LINEMOD dataset uses ±200 images for training (while having 1200 images per class; 1000 remain for testing). How can ANY model learn from only 200 images? Aren't NNs like YOLO supposed to need thousands of images per class?
Sidenotes:
You mention that pre-training is necessary. Can you elaborate on this? How long did you train for? How many images per class? 200 again? How many epochs?
At this point I am distraught and about to give up...
Same problem here. I raised issue #85 mentioning that my model also is not learning. And the thing is that, while validating, I can only use the first trained model, which is saved after epoch 11. Even though my model keeps training, it never updates the weights, meaning it is never getting any better. It just saves the summary in the costs.npz file.
I used a very good source for generating my synthetic data: I have masked images corresponding to my RGB images, correct intrinsics, precise label files, an exact diam value, and a correctly scaled .ply file. I am using 1170 images with 65 different orientations of the object and different backgrounds, and just one class (object). I also previously tried the 15% training / 85% testing split, but it didn't learn either.
Note: the maximum number of epochs I reached was 177, but none of those weights were saved; only the one at epoch 11 was saved and never updated.
After comparing my inputs with those of ape.data and finding that everything matches, yet the model is still not learning on my custom data, I am also about to give up...
Hi @MohamadJaber1
Your green ground-truth box does seem to be an incorrect bounding box, though. Maybe there is still something wrong with the labeling in your case? If you look at my green ground-truth bounding boxes, they match the object exactly. It is indeed the case that the code is written such that it will not save weights when there is no increase in accuracy. You can add a line of code to make it save weights after every 10 epochs or so. I tried this as well, but it is not helpful; if the accuracy does not increase, you can save weights all you want, but they are useless.

[edit]: @MohamadJaber1, what batch size are you using? According to your post: (.cfg file changed to batch size = 4 and subdivision = 4 as it was showing that CUDA is out of memory)
I have just now tried to train a model on 16,000+ images. I stopped the training at around 310 epochs because I don't have the time to continue training this model, since I am only using an Nvidia 1080 Ti. Interestingly enough, the loss is pretty low, yet accuracy does not rise.
At this point I think that, in general, 1000 images for training should do the trick. Because I can only use a batch size of at most 8, I think I have to train for far more epochs than proposed in the code (700 epochs). It would be nice to hear @btekin's input 😄
Hi @jgcbrouns, good to hear from you and hope to hear also from @btekin soon 😄
That's true; even my ground-truth labels aren't matching, which is quite strange. I forked @juanmed's singleshotpose, as he used the same source for generating the data I used; he also created a script called _ndds_datasetcreator.py that takes your 3D bounding-box configuration and outputs a label text file compatible with singleshotpose. You can also visualize the points (labels) on each of your images. Could you describe how you created your label files?
As for saving the model, I know I can modify the script to save whatever I want, but as you said, it is useless if the model isn't getting any better. My question was: why isn't my model getting any better in the first place? Why is the accuracy 0?
By CUDA out of memory, I meant that I reduced the batch size and subdivision to 4, as yolo-pose-pre.cfg had a batch size of 32.
I think that the original singleshotpose on the LINEMOD dataset was trained for more than 700 epochs, in the sense that they kept updating the initialization weights with the trained model and trained all over again to improve accuracy, or changed the 700 epochs to some other value in the thousands.
I also think that the number of epochs was well into the thousands.
"But my question was: why isn't my model getting any better in the first place? Why is the accuracy 0?"
For your case it is pretty straightforward, I think: fix the bounding-box corners (check the ground-truth box for correctness) and your model will learn at least something (like mine). Your other settings seem correct: .ply file, diameter of the ply vertices, camera intrinsics. Another tip: check whether your .ply file has enough vertex points. I see that your object is a Lego block. A cube in general can be modeled as a parametric 3D model with just a few vertices. The LINEMOD objects all have many vertices and edges in their models. @btekin uses the individual vertices to calculate accuracies against. My hypothesis is that more vertices add more chance for higher accuracy. What you could try is to add more vertices to your model via Blender:
open the 3D .ply model -> go into edit mode -> select all vertices and edges -> press 'w' -> click 'subdivide'
But again, before that, fix the ground-truth bounding-box label coordinates :P
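If you would rather script the subdivision than click through Blender, one midpoint-subdivision round looks roughly like this (a numpy sketch, not the repository's code; each triangle becomes four, and shared edge midpoints are reused):

```python
import numpy as np

def subdivide(vertices, faces):
    """One round of midpoint subdivision: every triangle is split into four
    by inserting a vertex at the midpoint of each edge."""
    verts = [np.asarray(v, dtype=float) for v in vertices]
    midpoint_cache = {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_cache:
            verts.append((verts[i] + verts[j]) / 2.0)
            midpoint_cache[key] = len(verts) - 1
        return midpoint_cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(verts), np.array(new_faces)
```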
I looked at the Nvidia data generator but decided to create my own tool to label data in a Unity environment. It's more straightforward than the Nvidia tool. I can rule out any mistakes in the way I generate the data and the labels, since the ground truths are correct. If you want, you can try my tool as well.
@jgcbrouns @MohamadJaber1 Thank you for your kind feedback 😄
Hello, a maximum of 700 epochs was set for training, but the model always converged much earlier than that.
The model was trained and validated on the LINEMOD dataset. Depending on your custom data, training might proceed differently and might require a different number of epochs, learning rate, batch size, learning schedule, etc. I would be interested in looking at your custom synthetic datasets, if you could share them, to better understand what problems you are having.
For LINEMOD, we use the standard training/test splits and apply extensive data augmentation by changing the background using segmentation masks, random scaling/translation, etc. Without segmentation masks, the accuracy might not be good enough because of the lack of generalization ability. To increase generalization, you could change the background of your images using segmentation masks or increase the number of training samples.
Please also check #84 to see if your problem comes from this. @jgcbrouns, your corner predictions seem accurate; however, the solvePnP method of the OpenCV version you are using might be returning inaccurate results. Could you visualize your predictions before PnP and let me know if they are accurate?
@MohamadJaber1 Symmetric objects with uniform texture generally bring additional challenges because of pose ambiguity. Could you also try with non-symmetric or well-textured objects and let me know how it performs?
Hi @btekin, thanks for your response!
I think that the images I posted above visualize the individual corners before PnP, straight from the predictions (red) and ground truths (green):
```python
ax.scatter(corners3D[0], corners3D[1], -corners3D[2], zdir='z', c='red')
ax.scatter(vertices[0], vertices[1], -vertices[2], zdir='z', c='red')
```

where `vertices` and `corners3D` are:

```python
vertices = np.c_[np.array(mesh.vertices), np.ones((len(mesh.vertices), 1))].transpose()
corners3D = get_3D_corners(vertices)
```
It would be AWESOME if you could take a look at my dataset!! I am breaking my head about this every day 😅 🔫
[edit] I just tried the RANSAC replacement algorithm for OpenCV in the validation procedure. Unfortunately, there is no difference. I will now attempt to train a small model with it.
Thank you both for your replies, @jgcbrouns and @btekin. I will follow your comments and try them on Tuesday. Wishing you both a Happy Easter 😄
@jgcbrouns The reason I am using NDDS is that I later want to validate my model with an image taken from a robot software environment, and NDDS provided me with all the necessities.
@btekin It would be really great if you can take a look at our dataset (I will provide a sample)
@jgcbrouns Thank you for providing your dataset. After inspecting examples from your dataset, I would suggest you do the following two things and see if they help:
Reduce color data augmentation. This could be useful because your object does not have any texture, and the only cue useful for predicting the pose is color. When you apply color data augmentation (changing hue, saturation, etc.) during training, the model has difficulty distinguishing between different colors and hence estimating the pose. The current setting for color data augmentation could be too high for your data. You can change the values for color data augmentation at the following lines: https://github.com/Microsoft/singleshotpose/blob/master/dataset.py#L67:L69
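For example (a config fragment; the exact variable names at those lines may differ slightly between versions, and the values below are suggestions, not the repository defaults):

```python
# Smaller jitter magnitudes keep the object's color cue intact.
hue        = 0.05   # range of random hue shifts
saturation = 1.2    # max random saturation scaling
exposure   = 1.2    # max random exposure (brightness) scaling
```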
Randomly change the background. This could be useful because you have a small number of training examples; although you use different backgrounds for different training images, having static backgrounds for the same examples might result in overfitting.
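A minimal sketch of the mask-based background swap (assuming 8-bit HxWx3 images and an HxW mask that is nonzero on the object):

```python
import numpy as np

def replace_background(image, mask, background):
    """Paste the object pixels (mask > 0) onto a new background image.
    All arrays must share height/width; image and background are HxWx3."""
    out = background.copy()
    out[mask > 0] = image[mask > 0]
    return out
```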
Hope these pointers help with your problem. Please let me know how it goes.
@MohamadJaber1 As @jgcbrouns pointed out, I think you would need to fix the bounding box label coordinates in order for the network to start learning. If you provide a sample, I could also take a look at your data.
Hi @btekin and @MohamadJaber1
Thank you, @btekin, for looking at my dataset. Coincidentally, I managed to get some results this morning by converting my images (which are .png) to .jpg. I suspected that this could matter, since .png images can have transparent pixels. Moreover, I extended my tool to render a mask over the objects; every object now has a mask as well. One of these changes fixed the problem for me.
@jgcbrouns Good to hear that; I will also convert my images to .jpg later.
I tried your dataset and checked the ground-truth labels; they matched fine!
At the cmd window, I am running:

```
python train.py cfg/cube.data cfg/yolo-pose.cfg backup/benchvise/init.weights
```

But the problem is that after epoch 11, model.weights is saved once and never updated. I trained the model for more than 140 epochs, but it was still never updated.
@btekin Thank you so much for offering this help.
Google Drive link: 400 RGB images, 400 masked images, 400 label files, Lego.ply, test.txt, train.txt, training_range.txt (.json files from NDDS). So originally the images are 18 same-pose images × 65 different orientations = 1170 total images.
Please let me know what you observe and how I can solve it.

Quick update: I think that the labels are right, but the problem might be their order. From label_file_creation.md, point 2, we know the order, but it depends on the coordinate-system convention singleshotpose expects. I suspected (x: right, y: up, z: front).
Hi @MohamadJaber1
0% accuracies after 100 epochs implies that something is wrong for sure. The model is supposed to converge rather quickly (on average, around epoch 30 the model starts showing accuracy increases; before that, it stays at 0%).
Here is my final dataset including masks and .jpg files (converted from .png): Google Drive link. You can try training with this dataset. It should work 😄
I'll take a look at your dataset now. P.S. @MohamadJaber1 - If you want, we can stay in touch. You can hit me up at jeroen.brouns@philips.com
About your quick update: labels are indeed ordered with a specific coordinate system in mind. Unity (where I create my dataset) has a different coordinate system than Blender, for example. The order is important because in @btekin's code this order gets interpreted and adhered to: link
You can validate if your order is correct:
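One way to check the order is to project the 3D bounding-box corners into the image yourself and compare them, index by index, with the coordinates in your label file. A sketch of the projection (R and t must be the ground-truth pose from your generator, K your intrinsics):

```python
import numpy as np

def project_points(points_3d, R, t, K):
    """Project Nx3 object-space points to pixel coordinates with an ideal
    pinhole camera: object frame -> camera frame -> perspective divide."""
    cam = points_3d @ R.T + t   # rotate/translate into the camera frame
    uv = cam @ K.T              # apply the intrinsics
    return uv[:, :2] / uv[:, 2:3]
```

If projected corner i does not land on labeled corner i for every i, the ordering (or the coordinate-system handedness) is off.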
Could you upload your dataset as one zip file that includes everything (.PLY model, generated (normalized) labels, images, masks, etc.)? That makes it easier for me to test.
Hi @jgcbrouns, thank you for this clarification. I am so glad to know that your dataset works with the model. This gives me hope and motivation to dig deeper and fix mine too :smile:
Yea sure, it would be nice to contact you. Expect a mail soon :sunglasses:
I am currently training on your dataset to see if it converges for me. Also, I will go a few steps back to check all my own custom data (mostly the labels). For the zipped version of my data, please find it here: Google Drive link for my 1170 images. Thank you very much.
(My .PLY apparently needed to be scaled)
Hi @jgcbrouns, your dataset is learning with the model pretty well :smile: This was trained for almost 60 epochs and it's already showing good results :+1:
@jgcbrouns Thank you for all the info you've shared in this thread. I am currently trying to figure out what's wrong with my dataset (or dataset-generating tool). I am training on the dataset you've provided in this thread; after that, I will try to generate the same dataset with the tool I am using to see if it works as expected.
Could you please provide the texture that you used for your object and, if possible, your data-generating tool?
Hi everyone!
I was already discussing my issues in issue-68 but decided to open a separate ticket anyway for completeness towards other people. As of now, I am clueless about what is wrong with my model. My workflow and solved issues are as follows:
I came across multiple issues regarding the following:
I set my camera calibration intrinsic parameters as follows:
Annotated labels are created automatically in Unity3D, for which I expect there to be no camera distortion (for the intrinsic camera calibration). The original authors of the LINEMOD dataset use a Kinect camera that does have such distortion. This internal camera calibration is necessary for, among other things, the PnP algorithm.
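For a distortion-free synthetic camera, the intrinsics reduce to an ideal pinhole matrix. A sketch of how they can be derived from the render settings (my own helper; square pixels and a centered principal point assumed, focal length computed from the horizontal field of view):

```python
import numpy as np

def pinhole_intrinsics(width, height, fov_x_deg):
    """Ideal pinhole intrinsics for a synthetic (e.g. Unity) camera
    with no lens distortion."""
    fx = (width / 2.0) / np.tan(np.radians(fov_x_deg) / 2.0)
    return np.array([[fx, 0.0, width / 2.0],
                     [0.0, fx, height / 2.0],
                     [0.0, 0.0, 1.0]])
```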
Initially, I had many problems with creating my .PLY files:
Bounding-box coordinates need to be in a specific order in their respective annotated label file. If this is not the case, the bounding box that gets generated from the .PLY file and the predictions from singleshotpose will be completely distorted. There are multiple issues on GitHub about this (among which issue-49).
Find here an example of an image and a label file from my training set.
HOWEVER.
I am still not obtaining correct results, and I am unsure how long to train my models for. The implementation states 700 epochs, yet if I train for 4000 epochs, my results are still not good:
How many epochs should a new object be trained for? NOTE: I am using benchvise/init.weights as the initial weights for my new model on the custom dataset. Meanwhile, my loss goes down properly, but my accuracy measurements stay at 0%:
Could there still be a problem with how I created the annotation files, the camera intrinsic parameters, or the .PLY model? Or could there be another problem that I am not considering?
@btekin Would it be an idea to add an F.A.Q. section to the README using my findings? I think the section about training on a custom dataset could use a lot more elaboration.
Moreover, I am curious as to what people are doing with singleshotpose. Anyone experimenting with some interesting use-cases?
Many thanks for anyone that can help!