koyeongmin / PINet_new

MIT License
167 stars 44 forks

distributed training? #16

Closed Jiayi719 closed 3 years ago

Jiayi719 commented 3 years ago

It seems that your code doesn't support distributed training. I'm trying to port your code to mmdetection, and I have some questions:

  1. How much does the hard sampling improve the final performance?
  2. The scheduler in your code is quite strange. Can I use another one such as CosineAnnealing or MultiStep?

Looking forward to your reply. Thanks!

koyeongmin commented 3 years ago
  1. The hard sampling module is not a major factor, but it brings a larger improvement on the CULane dataset than on TuSimple.
  2. Yes, my scheduler is not great; you can use another one! Thank you!
Jiayi719 commented 3 years ago

It seems that the learning rate never changes during training in your code. I tried CosineAnnealingLR, but the performance degrades a lot.

[Screenshot: training loss curves, 2020-10-30]
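For reference, hooking a standard PyTorch scheduler into a training loop looks roughly like this; the model, optimizer, and hyperparameters here are placeholders, not PINet's actual settings:

```python
import torch

# Placeholder model and optimizer; PINet's real network and settings differ.
model = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100, eta_min=1e-5)

for epoch in range(100):
    # ... forward pass and loss.backward() would go here ...
    opt.step()
    sched.step()  # anneal the learning rate once per epoch
```

After `T_max` epochs the learning rate has decayed from `1e-3` down to `eta_min`; a constant learning rate simply corresponds to never calling `sched.step()`.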

The confidence loss and the attention loss look strange. Do you have any idea why? Thank you!

Jiayi719 commented 3 years ago

Can you show me your result on the TuSimple dataset without hard sampling? I got the following result with mmdetection:

Accuracy: 0.9388447417068856, FP: 0.09802043750641848, FN: 0.062065660196501254

koyeongmin commented 3 years ago

https://github.com/koyeongmin/PINet This is the previous version of PINet, and that model reached 96.64% accuracy without hard sampling. Unfortunately, I no longer have the result of the current version without hard sampling, but I remember the performance being similar to the previous version. Thank you!

Jiayi719 commented 3 years ago

Thank you so much. I will try to tune some other parameters. By the way, do you remember the result on the TuSimple dataset without data augmentation?

koyeongmin commented 3 years ago

Sorry, I don't remember, because I have always used data augmentation after the first few trials.

Jiayi719 commented 3 years ago

OK. Do you have any idea about the existence (confidence) loss? I see you comment out `confidences = torch.sigmoid(confidences)`. I tried applying sigmoid to the confidence; the loss curve looks perfect, but the accuracy drops a bit.
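For context, the two variants being compared can be sketched like this; the tensor shapes and names are illustrative, not the actual ones from PINet:

```python
import torch
import torch.nn.functional as F

# Hypothetical raw confidence logits and binary existence ground truth.
confidences = torch.randn(2, 1, 32, 64)               # raw network output
exist_gt = (torch.rand(2, 1, 32, 64) > 0.9).float()   # sparse positive cells

# Variant resembling the released code (sigmoid line commented out):
# squared error directly on the raw scores.
mse_loss = ((confidences - exist_gt) ** 2).mean()

# Variant discussed here: sigmoid + binary cross-entropy, computed in a
# numerically stable way via the logits form.
bce_loss = F.binary_cross_entropy_with_logits(confidences, exist_gt)
```

Whether the extra sigmoid helps depends on how the rest of the pipeline thresholds the confidence map, which may explain the mismatch between a "perfect" loss curve and a small accuracy drop.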

koyeongmin commented 3 years ago

I also tried other losses, such as sigmoid-based and cross-entropy losses, but the performance was not good...

Jiayi719 commented 3 years ago

Thank you so much for your patience and your nice work.

Jiayi719 commented 3 years ago

@koyeongmin Have you ever tried dice loss and Lovász hinge loss? I tried these losses on the confidence branch with `thresh_point=0.5`. The performance improved a little. I think dice loss suits the TuSimple evaluation and Lovász hinge loss suits the CULane evaluation (mIoU metric).

koyeongmin commented 3 years ago

Thank you! I have not tried these losses. I will try them in future work. Thank you!

Jiayi719 commented 3 years ago

Hello, have you ever studied self-attention distillation (https://github.com/cardwing/Codes-for-Lane-Detection)? I think your attention loss has the same intuition. But how does it work when there is only one hourglass?

mvish7 commented 3 years ago

@Jiayi719 Did you succeed in distributed training of PINet? And did you use PyTorch DDP for it?

Jiayi719 commented 3 years ago

@mvish7 Yes, I used DistributedDataParallel to implement it.
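A minimal single-machine DDP sketch, for anyone following along; the model and loss here are stand-ins, and the real PINet loop, dataset, and `DistributedSampler` are omitted:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def ddp_train_step(rank, world_size):
    """One DDP training step; a minimal sketch, not the actual PINet loop."""
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(8, 1)  # stand-in for the PINet network
    # find_unused_parameters=True avoids DDP errors when some model
    # outputs (e.g. auxiliary heads) do not contribute to the loss.
    ddp_model = DDP(model, find_unused_parameters=True)
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    x = torch.randn(4, 8)
    loss = ddp_model(x).pow(2).mean()  # dummy loss
    opt.zero_grad()
    loss.backward()  # gradients are all-reduced across ranks here
    opt.step()

    dist.destroy_process_group()
    return loss.item()
```

In practice this runs one process per GPU via `torchrun` or `torch.multiprocessing.spawn`, with `backend="nccl"` on GPUs and a `DistributedSampler` on the dataloader so each rank sees a disjoint shard of the dataset.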

Wolfwjs commented 2 years ago

Hello, could you please share your code, or at least the core code, of PINet based on mmdetection? Thanks!

mvish7 commented 2 years ago

Hi @Wolfwjs, unfortunately I can't share the code. I used torch DDP. The problem I had was related to not using all the outputs of the model, i.e. I used some `detach` calls in the wrong places.

Wolfwjs commented 2 years ago

Thank you for your reply! @mvish7

Wolfwjs commented 2 years ago

Hello, have you successfully run PINet based on mmdetection? @Jiayi719