reg_loss oscillation, not convergent

ilaij0810 commented 2 years ago

When training the LaneATT model on the CULane dataset, the reg_loss does not converge.

voldemortX commented 2 years ago

@ilaij0810 What is your final eval performance? LaneATT is an anchor-based method, so its reg loss does not change much over time.

ilaij0810 commented 2 years ago

Thank you! The reg_loss is from resnet18-laneatt. The final eval metrics are not finished yet, but I looked at the visualizations and compared them to the GT: the detected lanes fit the GT well, so I am confused. I then trained a resnet34-laneatt model for about 5 epochs, and its loss is clearly lower than that of resnet18-laneatt (red lines in the figure).

ilaij0810 commented 2 years ago

Adding the figure.

voldemortX commented 2 years ago

@ilaij0810 It is expected that resnet34 performs better. The regression loss barely changes for these reasons:

The LaneATT anchors are the 1000 most frequent lines selected on the training set.
The reg loss is only calculated on the matched GT-Pred pairs.
Lane lines tend to have similar positions.

So the loss will statistically be very small even at the beginning.

Let me know if the final evaluated F1 score doesn't match the reported one.
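The reasoning above can be illustrated with a minimal pure-Python sketch. It is not the LaneATT implementation: the matching rule (mean absolute distance below a threshold), the normalized x-coordinates, and the names `smooth_l1`, `mean_abs_dist`, and `matched_reg_loss` are all assumptions made for illustration. It only shows that a regression term restricted to matched pairs stays small when the matched predictions already lie close to the GT lanes.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 averaged over the points of one lane (hypothetical helper)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def mean_abs_dist(a, b):
    """Mean absolute x-distance between two lanes sampled at the same rows."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def matched_reg_loss(preds, gts, match_thresh):
    """Average smooth L1 over predictions matched to a GT lane.

    Assumption: a prediction counts as matched when its mean distance to the
    closest GT lane is below match_thresh. Unmatched predictions are ignored
    by the regression term.
    """
    losses = []
    for pred in preds:
        closest = min(gts, key=lambda gt: mean_abs_dist(pred, gt))
        if mean_abs_dist(pred, closest) < match_thresh:
            losses.append(smooth_l1(pred, closest))
    return sum(losses) / len(losses) if losses else 0.0


# Toy data: one GT lane, one prediction near it, one far away.
gts = [[0.30, 0.35, 0.40]]
preds = [[0.31, 0.36, 0.41], [0.90, 0.90, 0.90]]
loss = matched_reg_loss(preds, gts, match_thresh=0.05)
print(loss)  # only the near prediction contributes
```

The far-away prediction never enters the average, so the value is tiny from the first step. That matches the point that a flat reg loss does not by itself indicate a training problem.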

ilaij0810 commented 2 years ago

ok, thank you.

ilaij0810 commented 2 years ago

I did not use the official C++ implementation of the CULane metric because of problems building OpenCV 3.2. All results come from a Python implementation.

1. resnet18-laneATT test_split metrics:
test0_normal.txt: TP: 29203 FP: 2494 FN: 3574 Precision: 0.9213 Recall: 0.8910 F1: 0.9059
test1_crowd.txt: TP: 18838 FP: 5488 FN: 9165 Precision: 0.7744 Recall: 0.6727 F1: 0.7200
test2_hlight.txt: TP: 1017 FP: 373 FN: 668 Precision: 0.7317 Recall: 0.6036 F1: 0.6615
test3_shadow.txt: TP: 1893 FP: 875 FN: 983 Precision: 0.6839 Recall: 0.6582 F1: 0.6708
test4_noline.txt: TP: 5453 FP: 3692 FN: 8568 Precision: 0.5963 Recall: 0.3889 F1: 0.4708
test5_arrow.txt: TP: 2601 FP: 289 FN: 581 Precision: 0.9000 Recall: 0.8174 F1: 0.8567
test6_curve.txt: TP: 742 FP: 248 FN: 570 Precision: 0.7495 Recall: 0.5655 F1: 0.6447
test7_cross.txt: TP: 0 FP: 1091 FN: 0 Precision: 0 Recall: 0 F1: 0
test8_night.txt: TP: 13024 FP: 4117 FN: 8006 Precision: 0.7598 Recall: 0.6193 F1: 0.6824

2. resnet34-laneATT test_split metrics:
test0_normal.txt: TP: 29429 FP: 2429 FN: 3348 Precision: 0.9238 Recall: 0.8979 F1: 0.9106
test1_crowd.txt: TP: 19244 FP: 5488 FN: 8759 Precision: 0.7781 Recall: 0.6872 F1: 0.7298
test2_hlight.txt: TP: 1058 FP: 301 FN: 627 Precision: 0.7785 Recall: 0.6279 F1: 0.6951
test3_shadow.txt: TP: 2036 FP: 570 FN: 840 Precision: 0.7813 Recall: 0.7079 F1: 0.7428
test4_noline.txt: TP: 5472 FP: 3410 FN: 8549 Precision: 0.6161 Recall: 0.3903 F1: 0.4778
test5_arrow.txt: TP: 2658 FP: 289 FN: 524 Precision: 0.9019 Recall: 0.8353 F1: 0.8674
test6_curve.txt: TP: 765 FP: 258 FN: 547 Precision: 0.7478 Recall: 0.5831 F1: 0.6552
test7_cross.txt: TP: 0 FP: 1202 FN: 0 Precision: 0 Recall: 0 F1: 0
test8_night.txt: TP: 13474 FP: 4100 FN: 7556 Precision: 0.7667 Recall: 0.6407 F1: 0.6981
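For reference, the precision, recall, and F1 columns follow from the TP/FP/FN counts by the standard definitions. The sketch below is not the evaluation code used in this thread; `prf1` is a hypothetical helper, and returning 0 when a denominator is zero is an assumption chosen to agree with the zeros reported for test7_cross.txt.

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from raw counts.

    Assumption: a zero denominator yields 0.0, which reproduces the
    all-zero row reported for the cross category (TP = 0, FN = 0).
    """
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# resnet18-laneATT, test0_normal.txt counts from the comment above.
p, r, f1 = prf1(29203, 2494, 3574)
print(f"Precision: {p:.4f} Recall: {r:.4f} F1: {f1:.4f}")
# Precision: 0.9213 Recall: 0.8910 F1: 0.9059
```

Since TP and FN are both 0 for test7_cross.txt, F1 carries no information for that category, and the FP count is the number to compare.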

voldemortX commented 2 years ago

@ilaij0810 That seems within reason for a random run (slightly lower than our best run).

voldemortX commented 2 years ago

Did you also experience difficulty using the cpp test scripts? Maybe I should consider porting the Python code and making the backend switching easier (preferably with one config parameter backend=python).
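As a rough sketch of what such a switch could look like, the snippet below dispatches on a single `backend` config key. Everything here is hypothetical: the function names, the `cfg` dictionary, the default value, and the placeholder return strings are illustrative only and are not part of pytorch-auto-drive.

```python
def _eval_python(split):
    # Placeholder for a pure-Python metric implementation (hypothetical).
    return f"python evaluation of {split}"


def _eval_cpp(split):
    # Placeholder for a call into the compiled C++ scripts (hypothetical).
    return f"cpp evaluation of {split}"


# Registry mapping the config value to an evaluation function.
BACKENDS = {"python": _eval_python, "cpp": _eval_cpp}


def run_eval(cfg, split):
    """Select the evaluation backend from a single config parameter.

    Assumption: "cpp" is the default, since the C++ scripts are the
    existing path mentioned in this thread.
    """
    backend = cfg.get("backend", "cpp")
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return BACKENDS[backend](split)


result = run_eval({"backend": "python"}, "test0_normal")
print(result)
```

A registry keeps the switch to one lookup, so adding another backend would only require a new dictionary entry.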

ilaij0810 commented 2 years ago

I haven't tried again. I think having all these models use the same metrics code would be helpful for evaluation, too. Maybe I'll try the cpp test scripts again after a while. Thank you very much: your project has helped me a lot, and so have your prompt replies to my problems.

voldemortX commented 2 years ago

I think this issue is concluded. Feel free to ask for it to be reopened.

voldemortX / pytorch-auto-drive

reg_loss oscillation, not convergent #99