The first change I made was to increase the number of conv layers.
This improved the performance a lot while keeping the parameter count low and the network reasonably fast, so I will be using this architecture from now on:
macc3d_0.25_r2_x4
r2 c0.25
conv k3 o64
conv k3 o64
pool
conv k3 o128
conv k3 d2 o128
conv k3 o128
pool
conv k3 o256
conv k3 d1 o256
conv k3 d3 o256
conv k3 d5 o256
conv k3 o256
macc x4
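For reference, a rough estimate of this network's size (a quick sketch only: it assumes a 3-channel input and ignores the `macc x4` output head and the biases):

```python
# Rough weight count for the conv layers of macc3d_0.25_r2_x4.
# Assumption: 3-channel input; the "macc x4" head and biases are ignored.
layers = [  # (in_channels, out_channels), all 3x3 kernels; dilation does not change the count
    (3, 64), (64, 64),
    (64, 128), (128, 128), (128, 128),
    (128, 256), (256, 256), (256, 256), (256, 256), (256, 256),
]
params = sum(3 * 3 * cin * cout for cin, cout in layers)
print(f"{params:,} weights ~= {params * 4 / 1e6:.1f} MB in float32")
# -> 3,061,440 weights ~= 12.2 MB
```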
This is the same network architecture, but with an increased number of channels (and therefore parameters) in each layer.
The performance seems superior, but training takes about 3x longer and detection runs about 3x slower. The number of parameters is also approximately 4x larger (about 50MB). I will not be using this network for now, but it is important to remember that adding more parameters improves the performance!
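The roughly 4x growth in parameters is what you would expect from widening the layers: a 3x3 conv has `3 * 3 * C_in * C_out` weights, so doubling both the input and output channel counts quadruples the layer. A tiny illustration (the widths here are hypothetical, just to show the scaling):

```python
# Doubling a conv layer's input and output channels quadruples its weight count.
# The widths below are hypothetical examples, not the exact widths of the larger net.
def conv3x3_params(cin, cout):
    return 3 * 3 * cin * cout

print(conv3x3_params(256, 256))  # 589,824
print(conv3x3_params(512, 512))  # 2,359,296 -> 4x as many
```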
Because the images I inspected gave me the feeling that the predicted coordinates were not very precise, I changed the diff weighting during training so that the coordinate diffs get the same weight as the positive probability diffs.
We see that we suddenly achieve performance similar to the network with more parameters, and visually the coordinates are much better.
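Just to illustrate what equal weighting of the two diff terms could look like (this is a sketch with assumed shapes and channel layout, not the actual loss code from the repo):

```python
import numpy as np

def weighted_l2_loss(pred, target, pos_mask, w_prob=1.0, w_coord=1.0):
    """Sketch of an equally weighted diff loss (assumed layout, not the repo's code).

    pred/target: (C, H, W) response maps, channel 0 = object probability,
    channels 1..C-1 = bounding box coordinates; pos_mask selects positive cells
    so coordinate diffs only count where an object is present.
    """
    prob_diff = pred[0] - target[0]
    coord_diff = (pred[1:] - target[1:]) * pos_mask
    return w_prob * np.sum(prob_diff ** 2) + w_coord * np.sum(coord_diff ** 2)
```

With `w_prob == w_coord` the coordinate errors contribute as strongly to the gradient as the probability errors, which is the change described above.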
I removed trucks from the car dataset and added flipped images to the training set. Very occluded and very truncated cars were also removed from the training set. This is the result.
The learning curve oscillates much less; however, there is a chance that the net is overfitting to the training set. Nevertheless, the detected bounding box coordinates are much more precise and consistent.
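A sketch of the kind of filtering and flip augmentation described above, assuming KITTI-style labels with `occluded` and `truncated` fields (the actual thresholds are not stated in this comment, so the values here are placeholders):

```python
# Placeholder thresholds - the exact cut-offs used are not given in the comment.
MAX_OCCLUSION = 1     # hypothetical: keep fully or only partly occluded cars
MAX_TRUNCATION = 0.5  # hypothetical truncation cut-off

def keep_label(label):
    # Drop trucks (and everything else that is not a car), and drop cars that
    # are too occluded or too truncated.
    return (label["type"] == "Car"
            and label["occluded"] <= MAX_OCCLUSION
            and label["truncated"] <= MAX_TRUNCATION)

def flip_example(image, boxes):
    # Horizontal flip augmentation: mirror the image (numpy array, H x W x C)
    # and mirror the x coordinates of the (x1, y1, x2, y2) boxes.
    h, w = image.shape[:2]
    flipped_boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in boxes]
    return image[:, ::-1], flipped_boxes
```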
hi, how long does it take to run 100000 iterations? @libornovax
@jeannotes It depends on your GPU. I think for me it was about 3 days on a Tesla K40.
On 20170318 I ran a test with the same network that was used for 2D bounding boxes, but the performance was poor, so I decided to make some changes and enlarge the network.
This is the second part of the training curve: