yezhen17 / 3DIoUMatch

[CVPR 2021] PyTorch implementation of 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection.

Some details of the KITTI implementation #5

Closed AndyYuan96 closed 3 years ago

AndyYuan96 commented 3 years ago

Hi Yezhen, I'm reproducing the KITTI results and I have some questions about the details of PV-RCNN on the KITTI dataset.

  1. Do you use the RPN output or the ROI output as the input for pseudo-label filtering?
  2. In the paper you say you don't use the LHS module, so do you just use the class probability and the IoU prediction to filter pseudo labels?
AndyYuan96 commented 3 years ago

My understanding:

  1. Take the top 100 proposals from the RPN and pass them through the ROI head to get the pseudo labels.
  2. Filter the pseudo labels according to max(class probability) and the IoU prediction (see the sketch below).
  3. No LHS.
  4. What do you mean by "we therefore only additionally filter according to classification confidence with the threshold t_cls = 0.2"? Is that for selective supervision?
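To be concrete, here is the rough sketch I have in mind for steps 1-2 (every name here — rpn_head, roi_head, teacher_points, the thresholds — is a placeholder of mine, not the actual code):

    import torch

    with torch.no_grad():                             # teacher pass, no gradients
        proposals = rpn_head(teacher_points)          # RPN proposals, sorted by score
        top100 = proposals[:100]                      # keep the top-100
        boxes, cls_prob, iou_pred = roi_head(top100)  # refined boxes + two confidence heads

    # keep a box as a pseudo label only if both confidences pass their thresholds
    keep = (cls_prob.max(dim=-1).values > cls_thresh) & (iou_pred > iou_thresh)
    pseudo_labels = boxes[keep]
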
baraujo98 commented 3 years ago

> Hi Yezhen, I'm reproducing the KITTI results and I have some questions about the details of PV-RCNN on the KITTI dataset.
>
>   1. Do you use the RPN output or the ROI output as the input for pseudo-label filtering?
>   2. In the paper you say you don't use the LHS module, so do you just use the class probability and the IoU prediction to filter pseudo labels?

Hi @AndyYuan96! I'm also planning on reproducing the KITTI results and experimenting a bit very soon, as I'm studying label-efficient learning techniques for outdoor datasets. Are you doing this in a fork of yours?

AndyYuan96 commented 3 years ago

> Hi @AndyYuan96! I'm also planning on reproducing the KITTI results and experimenting a bit very soon, as I'm studying label-efficient learning techniques for outdoor datasets. Are you doing this in a fork of yours?

Sure, you can add me on WeChat or Telegram.

yezhen17 commented 3 years ago

> Hi Yezhen, I'm reproducing the KITTI results and I have some questions about the details of PV-RCNN on the KITTI dataset.
>
>   1. Do you use the RPN output or the ROI output as the input for pseudo-label filtering?
>   2. In the paper you say you don't use the LHS module, so do you just use the class probability and the IoU prediction to filter pseudo labels?

Hi Andy,

  1. The ROI output;
  2. Yes.
yezhen17 commented 3 years ago

> My understanding:
>
>   1. Take the top 100 proposals from the RPN and pass them through the ROI head to get the pseudo labels.
>   2. Filter the pseudo labels according to max(class probability) and the IoU prediction.
>   3. No LHS.
>   4. What do you mean by "we therefore only additionally filter according to classification confidence with the threshold t_cls = 0.2"? Is that for selective supervision?

It means that we do not use the foreground probability (since it is already used in the RPN to select proposals) to filter pseudo labels. In VoteNet, since it is single-stage, we use the objectness to filter pseudo labels.

yezhen17 commented 3 years ago

I'll release the code ASAP

AndyYuan96 commented 3 years ago

> I'll release the code ASAP

Thanks for the reply. I have now implemented most of it based on OpenPCDet.

  1. I use the 100 ROIs as the pseudo-label filtering input, and filter pseudo labels with IoU thresholds 0.8 / 0.4 / 0.4 for the different classes; the max probability over the 3 classes should also be bigger than 0.2 (see the sketch below).
  2. So how do I selectively supervise the model output with pseudo labels? Only supervise the ROI output and not the RPN, still using the 100 ROI outputs?
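Concretely, my filtering step looks roughly like this (the tensor names are mine, not from the codebase):

    import torch

    # per-class IoU-prediction thresholds: Car, Pedestrian, Cyclist
    iou_thresh = torch.tensor([0.8, 0.4, 0.4])

    cls_prob, cls_idx = roi_cls_prob.max(dim=-1)   # (N,) max semantic prob and its class
    keep = (iou_pred > iou_thresh[cls_idx]) & (cls_prob > 0.2)
    pseudo_boxes = roi_boxes[keep]
    pseudo_classes = cls_idx[keep]
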
yezhen17 commented 3 years ago

Supervise the box regression losses (including refinement) and the classification loss. Ignore the other losses on unlabeled data.

Regarding the number of ROI outputs, I just kept it the same as in PV-RCNN.
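In pseudocode, something like the following (the loss names and unlabeled_mask are illustrative; the released code may organize the losses differently):

    # terms supervised on both labeled and pseudo-labeled samples
    supervised_terms = {'rpn_cls_loss', 'rpn_box_loss', 'rcnn_cls_loss', 'rcnn_box_loss'}

    total_loss = 0.0
    for name, per_sample in loss_dict.items():        # each entry: (B,) loss per sample
        if name in supervised_terms:
            total_loss += per_sample.mean()           # whole batch, labeled + unlabeled
        else:
            # all other terms only count on the labeled half of the batch
            total_loss += per_sample[~unlabeled_mask].mean()
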

AndyYuan96 commented 3 years ago

> Supervise the box regression losses (including refinement) and the classification loss. Ignore the other losses on unlabeled data.
>
> Regarding the number of ROI outputs, I just kept it the same as in PV-RCNN.

Now I think I understand what you say: for the RPN output, we should only supervise the proposals that meet the foreground condition (IoU with a pseudo label > the matching threshold in PV-RCNN) with the regression and classification losses, and for the refinement we should supervise the 100 ROIs with the regression loss against the pseudo labels. Is that right?

What's more, did you still use a consistency loss when supervising unlabeled data, or just the original losses of PV-RCNN?

But you say "we do not use the foreground probability to filter pseudo labels", so what does the t_cls = 0.2 in "We therefore only additionally filter according to classification confidence with the threshold t_cls = 0.2" filter?

yezhen17 commented 3 years ago

> Now I think I understand what you say: for the RPN output, we should only supervise the proposals that meet the foreground condition (IoU with a pseudo label > the matching threshold in PV-RCNN) with the regression and classification losses, and for the refinement we should supervise the 100 ROIs with the regression loss against the pseudo labels. Is that right?
>
> What's more, did you still use a consistency loss when supervising unlabeled data, or just the original losses of PV-RCNN?
>
> But you say "we do not use the foreground probability to filter pseudo labels", so what does the t_cls = 0.2 in "We therefore only additionally filter according to classification confidence with the threshold t_cls = 0.2" filter?

What do you mean by a consistency loss? We do not use a consistency loss; all the losses are (in form) the same as for labeled data.

Regarding the RPN and refinement supervision you describe above: you're right.

t_cls = 0.2 is for filtering pseudo labels. We take the semantic classification probability of the pseudo labels from the RPN stage.
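The RPN-side assignment is then the usual IoU matching, just against pseudo labels instead of GT, roughly (boxes_iou3d stands in for whatever 3D IoU op the codebase uses, and fg_thresh for the PV-RCNN matching threshold):

    ious = boxes_iou3d(anchors, pseudo_boxes)   # (num_anchors, num_pseudo_labels)
    best_iou, best_gt = ious.max(dim=1)         # best-matching pseudo label per anchor
    fg_mask = best_iou > fg_thresh              # only matched anchors are supervised
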

AndyYuan96 commented 3 years ago

For the 100% unlabeled-data config, you use 100% of the train set as labeled data and 100% of the train set as unlabeled data. So the quality of the pseudo labels should be lower than the ground truth for the train set, right? Then the upper bound would be reached by using the ground-truth labels as pseudo labels, so it looks like there should be no difference from supervised training on the labeled dataset.

I ran an experiment on KITTI with PV-RCNN: I use the filtered output of the EMA model as pseudo labels, and I only apply selective supervision to the RPN output. The result is lower than supervised learning, and the RPN loss on the pseudo labels is not normal: it does not always decrease; the unlabeled-data loss curve first goes up and then goes down, an inverse-U shape. By the way, I use 8 V100s and a batch size of 48 (each GPU has 3 labeled and 3 unlabeled samples); I will try to figure out why. Training for 80 epochs, the result does not differ much from supervised learning, but at 160 epochs the AP of cyclist and pedestrian drops a lot.
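For reference, the EMA teacher update I use is the standard mean-teacher rule (the decay value here is just a typical choice):

    import torch

    @torch.no_grad()
    def update_ema(teacher, student, decay=0.999):
        # teacher weights drift slowly toward the student's after every step
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1.0 - decay)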

hughw19 commented 3 years ago

I see your point. It is obvious that the pseudo labels are more erroneous than GT labels; however, it is not always true that training with pseudo labels leads to inferior results. You can see the same effect in SESS, where using both GT and pseudo labels leads to better performance. We think this is similar to a knowledge distillation effect. Please take a look at this CVPR 2020 paper, Revisiting Knowledge Distillation via Label Smoothing Regularization: https://openaccess.thecvf.com/content_CVPR_2020/papers/Yuan_Revisiting_Knowledge_Distillation_via_Label_Smoothing_Regularization_CVPR_2020_paper.pdf

yezhen17 commented 3 years ago

> For the 100% unlabeled-data config, you use 100% of the train set as labeled data and 100% of the train set as unlabeled data. So the quality of the pseudo labels should be lower than the ground truth for the train set, right? Then the upper bound would be reached by using the ground-truth labels as pseudo labels, so it looks like there should be no difference from supervised training on the labeled dataset.
>
> I ran an experiment on KITTI with PV-RCNN: I use the filtered output of the EMA model as pseudo labels, and I only apply selective supervision to the RPN output. The result is lower than supervised learning, and the RPN loss on the pseudo labels is not normal: it does not always decrease; the unlabeled-data loss curve first goes up and then goes down, an inverse-U shape. By the way, I use 8 V100s and a batch size of 48 (each GPU has 3 labeled and 3 unlabeled samples); I will try to figure out why. Training for 80 epochs, the result does not differ much from supervised learning, but at 160 epochs the AP of cyclist and pedestrian drops a lot.

One of your questions should be answered by @hughw19. As for your other issue: do you use a model pre-trained on the 100% labeled data? We selectively supervise the semantic classification loss, the bounding box loss and the bounding box regression loss. We used 8 K80s with batch size 8*2 (each GPU has 1 labeled and 1 unlabeled sample), and we trained for 40 epochs.

Part of the config:

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 1  # means the labeled sample num; unlabeled sample num = labeled sample num
    EVAL_BATCH_SIZE_PER_GPU: 8
    NUM_EPOCHS: 40

    OPTIMIZER: adam_onecycle
    LR: 0.01
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [16, 24]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 10
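
So each GPU sees one labeled and one unlabeled sample per step; schematically (the loader and collate names are illustrative, not the actual training script):

    for labeled_batch, unlabeled_batch in zip(labeled_loader, unlabeled_loader):
        batch = collate_pair(labeled_batch, unlabeled_batch)  # effective size 2 per GPU
        loss = model(batch)   # losses on the unlabeled half use the teacher's pseudo labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
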
AndyYuan96 commented 3 years ago

> One of your questions should be answered by @hughw19. As for your other issue: do you use a model pre-trained on the 100% labeled data? We selectively supervise the semantic classification loss, the bounding box loss and the bounding box regression loss. We used 8 K80s with batch size 8*2 (each GPU has 1 labeled and 1 unlabeled sample), and we trained for 40 epochs.


Thank you for the reply. I do use a pre-trained model: I first trained PV-RCNN on the train set using the default config on 8 V100s (batch size = 8*8), and the result is normal. I will try this config when I return; I have something to do this week.

AndyYuan96 commented 3 years ago

> I see your point. It is obvious that the pseudo labels are more erroneous than GT labels; however, it is not always true that training with pseudo labels leads to inferior results. You can see the same effect in SESS, where using both GT and pseudo labels leads to better performance. We think this is similar to a knowledge distillation effect. Please take a look at this CVPR 2020 paper, Revisiting Knowledge Distillation via Label Smoothing Regularization: https://openaccess.thecvf.com/content_CVPR_2020/papers/Yuan_Revisiting_Knowledge_Distillation_via_Label_Smoothing_Regularization_CVPR_2020_paper.pdf

Thank you, I will try to understand the paper.

baraujo98 commented 3 years ago

@THU17cyz, can I ask if a code release this month is a possible scenario? Just so I know whether I should invest time now in adapting the code to KITTI myself. Thanks!

yezhen17 commented 3 years ago

> @THU17cyz, can I ask if a code release this month is a possible scenario? Just so I know whether I should invest time now in adapting the code to KITTI myself. Thanks!

Hi, I think it's possible : )

baraujo98 commented 3 years ago

That's great! :+1:

baraujo98 commented 3 years ago

Hi! Sorry to bother you, @THU17cyz, but just out of curiosity, do you have an ETA for releasing the code?

yezhen17 commented 3 years ago

> Hi! Sorry to bother you, @THU17cyz, but just out of curiosity, do you have an ETA for releasing the code?

Hi @baraujo98, can you send me an e-mail? I can give you an unofficial version.