lucastabelini / PolyLaneNet

Code for the paper entitled "PolyLaneNet: Lane Estimation via Deep Polynomial Regression" (ICPR 2020)
https://arxiv.org/abs/2004.10924
MIT License

Questions about training time #20

Closed · haopo2005 closed this issue 4 years ago

haopo2005 commented 4 years ago

Hi, I'm curious why the number of training epochs is so high, e.g., 384 to 2000+. Does most of the training time go to fitting the polynomials?

lucastabelini commented 4 years ago

Yes, the polynomials are what take the longest to converge. The performance is not bad at earlier epochs; if I remember correctly, it is only a few accuracy points below the final accuracy. The training loss is around 0.05 in the final epoch. The exact values can be seen in the log.txt file inside each experiment's directory in the Google Drive link I provide in the README.md.

haopo2005 commented 4 years ago

I've checked the tusimple_full log. After 2695 epochs, the training loss is around 0.04-0.07 and the poly loss is around 0.02-1.4988. I'd like to know:

Also, my dataset has about 10 times more samples than TuSimple, so training takes a very long time, and I'm not sure whether I'm on the right track. The predicted lane lines look almost entirely wrong at first glance, but I can see they get better with more training iterations.

Besides, I also have questions about transfer learning. I fine-tuned the pre-trained tusimple_full model (changing the maximum lane number and adding a classification layer). Do I need to freeze the parameters of the earlier layers and fine-tune only the last layer? Currently, no parameters are frozen, and the training loss seems no different from training from scratch (EfficientNet-B0 pretrained on ImageNet).

lucastabelini commented 4 years ago
  1. Since "big visual difference" is quite subjective, I think it would be easier if you visualize both results and compare them yourself.
  2. In our paper, we simply chose the last one. Without a validation set there's not much you can do.
  3. Unfortunately, no.

In our case, we found that even after all those epochs the model had not stopped learning, i.e., the accuracy was still increasing. As for transfer learning, we did not freeze any parameters; we trained the whole model. I would say that at first the loss will look no different from training from scratch, but the model should converge faster.
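For reference, if you do want to try freezing the earlier layers, a minimal PyTorch sketch looks like this. It assumes a torchvision EfficientNet-B0 backbone with a `features`/`classifier` split; PolyLaneNet's actual module names may differ:

```python
import torch
import torchvision

# Assumed backbone for illustration: torchvision's EfficientNet-B0
# pretrained on ImageNet (not PolyLaneNet's actual model class).
model = torchvision.models.efficientnet_b0(weights="IMAGENET1K_V1")

# Freeze the convolutional backbone so only the head is fine-tuned.
for param in model.features.parameters():
    param.requires_grad = False

# Pass only the still-trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=3e-4
)
```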

haopo2005 commented 4 years ago

https://github.com/lucastabelini/PolyLaneNet/blob/master/lib/datasets/lane_dataset.py#L71 This line may be buggy. Since you can't guarantee that the point y-coordinates are sorted, the line's endpoints will not necessarily be the upper and lower bounds, especially in turnaround and sharp-curve cases.

lucastabelini commented 4 years ago

In the three datasets used in that work, that assumption was valid; for other datasets, it may not be. That problem should be easy to fix, though: if you change that line to use min/max instead, you should be fine, I think.
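A minimal sketch of that fix, assuming each lane is a list of (x, y) points as in lane_dataset.py (the variable names here are illustrative):

```python
# Lane points whose y-coordinates are not guaranteed to be sorted,
# e.g. from a sharp curve or turnaround.
lane = [(410.0, 350.0), (402.0, 340.0), (395.0, 360.0)]  # (x, y) points

# Instead of reading the bounds off the first/last point (valid only when
# the points are sorted by y), take the min/max over all y-coordinates:
ys = [y for _, y in lane]
upper = min(ys)  # topmost lane point (smallest y in image coordinates)
lower = max(ys)  # bottommost lane point (largest y in image coordinates)
```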

haopo2005 commented 4 years ago

Hi, I've tried running inference with the TuSimple model (epoch 2695) on the TuSimple dataset, and I'm confused about why the upper bound goes beyond the horizon line. Is there any post-processing step to handle this?

lucastabelini commented 4 years ago

Did you run the inference without calculating the loss? The model only predicts an upper limit (the horizon line) for the first lane, and when the loss is calculated, that limit is copied to the other lanes. This happens here. That copy should be made in the model, so that calculating the loss is not necessary for inference, but it is not a major issue, as it can be easily fixed: you just need to do the same thing that line does in the loss function.
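If you want inference without the loss function, a sketch along these lines should work. It assumes the predicted upper limit sits at a fixed index of the model's per-lane output tensor; both that index and the shape are assumptions to check against the actual output layout:

```python
import torch

def copy_shared_upper_limit(outputs: torch.Tensor, upper_idx: int = 0) -> torch.Tensor:
    """Hypothetical post-processing sketch: broadcast the first lane's
    predicted upper limit (the horizon line) to every other lane, mirroring
    what the loss function does during training.

    Assumes `outputs` has shape (batch, num_lanes, num_params) and that
    `upper_idx` is the position of the upper limit in the last dimension;
    both are assumptions, not PolyLaneNet's confirmed layout.
    """
    outputs = outputs.clone()
    outputs[:, :, upper_idx] = outputs[:, :1, upper_idx]  # broadcast lane 0's value
    return outputs
```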

haopo2005 commented 4 years ago

ok, thank you so much...