zetyquickly / DensePoseFnL

Fast and Light DensePose implementation
MIT License

train_net.py issues #5

Closed QiyaoWei closed 2 years ago

QiyaoWei commented 4 years ago

Dear author,

I made sure detectron2==0.1.0 is installed properly (apply_net.py works with the pretrained models). However, when I run train_net.py I get the following error:

Command: `python train_net.py --config-file configs/densepose_parsing_rcnn_spnasnet_100_FPN_s3x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025`

```
Traceback (most recent call last):
  File "train_net.py", line 23, in <module>
    from densepose.modeling.quantize import quantize_decorate, quantize_prepare
  File "/root/qiyao/DensePoseFnL/densepose/modeling/quantize.py", line 14, in <module>
    from detectron2.layers import ShapeSpec, Conv2d, interpolate, NaiveSyncBatchNorm, FrozenBatchNorm2d, Linear
ImportError: cannot import name 'Linear'
```

This looks like a detectron2 version problem. Did I miss something? Thanks!
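A quick way to check which detectron2 build is active and whether it still exposes the symbol that quantize.py imports is a pair of one-liners (a minimal diagnostic sketch):

```bash
# Print the installed detectron2 version, then try the exact import that
# fails inside densepose/modeling/quantize.py.
python -c "import detectron2; print(detectron2.__version__)"
python -c "from detectron2.layers import Linear"
```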

zetyquickly commented 4 years ago

Hello, @QiyaoWei Please refer to my answer given previously https://github.com/zetyquickly/DensePoseFnL/issues/3#issuecomment-660464048

favorxin commented 3 years ago


Hello! Did you solve this problem? When I install detectron2-0.1, I get "cannot import Linear from detectron2.layers"; when I install detectron2-0.4.1, I get "DensePoseROIHeads object has no attribute 'feature_strides'". Were you able to solve this?

I tried densepose_rcnn_R_50_FPN_s1x.yaml and densepose_rcnn_mobilenetv3_rw_FPN_s1x.yaml.

zetyquickly commented 3 years ago

@favorxin, look at the README.

You need to install detectron2 from source, from a specific commit preceding 0.1.0. They introduced breaking changes in 0.1.0 that our project does not handle.
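A minimal sketch of such an install; `<COMMIT_HASH>` is a placeholder, take the exact commit hash from this repository's README:

```bash
# Remove the incompatible release, then build detectron2 from source at the
# pinned commit. <COMMIT_HASH> is a placeholder for the hash in the README.
pip uninstall -y detectron2
pip install 'git+https://github.com/facebookresearch/detectron2.git@<COMMIT_HASH>'
```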

favorxin commented 3 years ago


Thank you very much! I used s0_bv2_bifpn_f64.yaml to train the baseline successfully! But I'm confused about quantization training: I used s0_bv2_bifpn_f64.yaml with --qat to train from scratch and got the following problem. [screenshot]

Way 1: should I first train s0_bv2_bifpn_f64.yaml to get a baseline, and then use that baseline to train the quantized net?

Way 2: `python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64.yaml SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025`

zetyquickly commented 3 years ago

Hey @favorxin

It's definitely great that you succeeded with s0_bv2_bifpn_f64. Yes, for quantization-aware training you need to run the baseline training first, and your command in "way 2" looks correct. Don't forget to pass the trained weights via MODEL.WEIGHTS.
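A minimal sketch of that two-stage recipe (the output path is illustrative):

```bash
# 1) Train the float baseline.
python train_net.py --config-file ./configs/s0_bv2_bifpn_f64.yaml \
    SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025

# 2) Quantization-aware fine-tuning, starting from the baseline weights.
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64.yaml \
    SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 \
    MODEL.WEIGHTS ./output/model_final.pth
```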

favorxin commented 3 years ago


I set MODEL.WEIGHTS in the s0_bv2_bifpn_f64_s3x.yaml config file, and also tried setting it in the shell; neither trains successfully. `python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 MODEL.WEIGHTS ./output_new/model_final.pth` It produces this warning. [screenshot]

I am confused about how to train DensePose with --qat.

zetyquickly commented 3 years ago

Hey @favorxin

Let me try to help you. As I can see from the screenshot, the training starts and ends without errors. So:

  1. Maybe the MAX_ITER count is too small here (see the sketch after this list).
  2. Also set the evaluation period accordingly.
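A minimal sketch of both adjustments on the command line; the values are illustrative, and SOLVER.MAX_ITER / TEST.EVAL_PERIOD are detectron2's standard keys for these settings:

```bash
# Lengthen the QAT run and evaluate periodically instead of only at the end.
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml \
    SOLVER.IMS_PER_BATCH 1 MODEL.WEIGHTS ./output/model_final.pth \
    SOLVER.MAX_ITER 400000 TEST.EVAL_PERIOD 10000
```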
favorxin commented 3 years ago


Thank you for your prompt response! It works; I set MAX_ITER to 400000.

favorxin commented 3 years ago


Hello! I'm coming again! I have some doubts about the learning rate when training the QAT DensePose. In the baseline training (s0_bv2_bifpn_f64), the learning rate went from 0.0025 to 0.00025 at iteration 100000, and to 0.000025 at iteration 120000; that is, STEPS is set to (100000, 120000) and reduces the learning rate. [screenshot]

But when I train the QAT DensePose, the learning rate doesn't follow the STEPS schedule. The initial learning rate is 0.0025 and it changes during the warm-up phase, but it stays at 0.0025 afterwards. I set STEPS and MAX_ITER as follows. [screenshot]

How should I change the learning rate when fine-tuning the QAT DensePose to get good performance? The current baseline (s0_bv2_bifpn_f64) result is very good, but the QAT result (s0_bv2_bifpn_f64_s3x) shows essentially no performance. [screenshot]

zetyquickly commented 3 years ago

Hey @favorxin

It's an interesting question. I don't remember adjusting these parameters to achieve the results described in the paper. Everything described there was done by QAT fine-tuning with some additional training steps, without changing the learning rate.
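For reference, the behavior above is consistent with detectron2's default scheduler: the learning rate is multiplied by SOLVER.GAMMA (0.1 by default) at each iteration listed in SOLVER.STEPS, so milestones placed at or beyond SOLVER.MAX_ITER never fire. A minimal sketch with milestones inside the run (values are illustrative):

```bash
# Keep the decay milestones below MAX_ITER so the LR actually drops.
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml \
    SOLVER.IMS_PER_BATCH 1 MODEL.WEIGHTS ./output_new/model_final.pth \
    SOLVER.MAX_ITER 400000 SOLVER.STEPS "(300000,360000)" SOLVER.GAMMA 0.1
```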

favorxin commented 3 years ago


So loading the model trained with the s0_bv2_bifpn_f64 config and fine-tuning with the s0_bv2_bifpn_f64_s3x config can give the final result? Just adjusting STEPS to larger numbers like 400000. `python train_net.py --qat ./configs/s0_bv2_bifpn_f64_s3x.yaml SOLVER.IMS_PER_BATCH 1 MODEL.WEIGHTS ./output_new/model_final.pth`

The method above didn't give me good performance when I tried it. Can I try QAT from scratch (without loading a trained model)? `python train_net.py --qat ./configs/s0_bv2_bifpn_f64_s3x.yaml SOLVER.IMS_PER_BATCH 1`

zetyquickly commented 3 years ago

I don't remember the exact number of additional steps, but yes, fine-tune the float weights and you get the results. You should check the AP metrics rather than qualitative results on images.

And second: for me, only fine-tuning worked; QAT from scratch didn't give any good results.

favorxin commented 3 years ago


I trained ./configs/s0_bv2_bifpn_f64_s3x.yaml and got a model_final.pth with good performance. But when I load that final model and run --qat on it, the performance becomes bad.

```bash
# train the baseline s0_bv2_bifpn_f64_s3x.yaml
python train_net.py ./configs/s0_bv2_bifpn_f64_s3x.yaml SOLVER.IMS_PER_BATCH 2

# QAT fine-tune the model_final.pth obtained above
python train_net.py --qat ./configs/s0_bv2_bifpn_f64_s3x_ft.yaml SOLVER.IMS_PER_BATCH 1 MODEL.WEIGHTS ./output_s3x_1/model_final.pth

# run inference on one image
python apply_net.py show ./configs/s0_bv2_bifpn_f64_s3x_ft.yaml ./output_s3x_qat/model_final.pth ./people.jpg dp_contour --output ./saved_1.jpg
```

I ran the steps above one by one. Is that right?

zetyquickly commented 3 years ago

@favorxin

What you did seems legit. Could you please share the quantitative results of the evaluation step, with APs, after the QAT? Let's check whether the numbers match what I saw during training.

zetyquickly commented 3 years ago

And, if possible, show the APs table for the non-quantized model.

favorxin commented 3 years ago


Here is the non-quantized model's final eval performance. [screenshot]

The second picture shows the quantized (QAT) model's results. [screenshot]

favorxin commented 3 years ago

> Could you please share the quantitative results of the evaluation step, with APs, after the QAT? …

When I use --quant-eval, it fails. I ran the following command: `python train_net.py --quant-eval --config-file ./configs/s0_bv2_bifpn_f64.yaml MODEL.WEIGHTS ./output_f64_qat/model_final.pth SOLVER.IMS_PER_BATCH 1` [screenshot]

I guess my PyTorch install went wrong? But I can still train the model with s0_bv2_bifpn_f64_s3x.yaml.

Could I send my final trained and quantized model to you at the following email: emilzq@bk.ru?

zetyquickly commented 3 years ago

> Here is the non-quantized model's final eval performance. …

Regarding this, from my old experiments I found the following results:

Non-quantized inference on CPU

[03/27 13:57:02 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 56.826 | 84.683 | 59.557 | 25.569 | 53.329 | 73.181 |

[03/27 14:03:50 densepose.evaluator]: Evaluation results for densepose: 
|   AP   |  AP50  |  AP75  |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|
| 51.076 | 86.149 | 53.406 | 44.867 | 52.244 |

Total inference time: 0:29:03.904127 (1.160282 s / img per device, on 1 devices)
Total inference pure compute time: 0:17:38 (0.703991 s / img per device, on 1 devices)

and

Quantized model on CPU

[03/27 13:18:01 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 55.287 | 83.118 | 57.912 | 23.806 | 52.083 | 71.358 |

[03/27 13:24:43 densepose.evaluator]: Evaluation results for densepose: 
|   AP   |  AP50  |  AP75  |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|
| 47.026 | 82.772 | 48.332 | 43.046 | 47.944 |

Total inference time: 0:20:28.000668 (0.817033 s / img per device, on 1 devices)
Total inference pure compute time: 0:09:24 (0.375824 s / img per device, on 1 devices)

Seems like your model training hasn't converged. And it's no surprise that a quantized model with a DensePose AP of 0.020 shows such bad results.

zetyquickly commented 3 years ago

> When I use --quant-eval, it fails. …

Regarding this: I think the PyTorch version should be the same as we used during the research. Every new PyTorch version (1.5, 1.6, 1.7) introduced breaking changes; maybe that is also the case for you. I have never seen this RuntimeError.
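A minimal sketch of pinning an older build. The exact versions are an assumption here (the comment above implies the research predates PyTorch 1.5); check the README for the versions actually used:

```bash
# Assumption: a pre-1.5 PyTorch; torchvision 0.5.0 is the release matched to
# torch 1.4.0. Verify the exact pins against this repository's README.
pip install torch==1.4.0 torchvision==0.5.0
```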

To sum up, please try to fit your non-quantized model further.

favorxin commented 3 years ago


Thank you very much! I will fine-tune the model to reach the performance you showed.

favorxin commented 3 years ago

> Seems like your model training hasn't converged. …

Hello! I want to train the non-quantized model, but I am confused about the schedule. In the paper, training runs for 130K iterations with batch size 16 on 8 GPUs. Can I set MAX_ITER to 1200K with batch size 2 to train the final model? Here is my result.

[screenshot]

zetyquickly commented 2 years ago


Hello,

I think so, yes, you could. But you need to adjust the learning rate (decrease it by a factor of 8, or 4, in your case) and the learning-rate decay schedule (check that part yourself, I don't know). And you may also need more than 1200K iterations to converge.
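Under the usual linear scaling rule, cutting the batch from 16 to 2 divides the learning rate by 16/2 = 8; assuming a base LR of 0.01 at batch 16 (an assumption, check the config), that gives 0.00125. A minimal sketch with proportionally stretched decay milestones (values are illustrative):

```bash
# Linear LR scaling: 0.01 * (2 / 16) = 0.00125 (the 0.01 base LR is assumed).
python train_net.py --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml \
    SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.00125 \
    SOLVER.MAX_ITER 1200000 SOLVER.STEPS "(1000000,1100000)"
```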

favorxin commented 2 years ago


Hello! I am trying to reach your excellent performance, but I cannot get it when I use 4 Tesla T4 GPUs with the following command: `python train_net.py --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml --num-gpus 4 SOLVER.IMS_PER_BATCH 4`

The config file is as follows:

```yaml
_BASE_: "s0_bv2_bifpn.yaml"
MODEL:
  FPN:
    OUT_CHANNELS: 64
  ROI_BOX_HEAD:
    CONV_DIM: 64
  ROI_SHARED_BLOCK:
    ASPP_DIM: 64
    CONV_HEAD_DIM: 64
SOLVER:
  MAX_ITER: 390000
  STEPS: (330000, 370000)
TEST:
  EVAL_PERIOD: 390000
```

Do you have any advice for reaching that training result? Here are my results. Training: [screenshot] Validation: [screenshot]

zetyquickly commented 2 years ago

Hello @favorxin, I don't see any particular problem. Why did you choose SOLVER.IMS_PER_BATCH 4? Does a batch of 8 not fit?

@rakhimovv please have a look

rakhimovv commented 2 years ago

@favorxin, the main problem I see is the reduced total batch size (SOLVER.IMS_PER_BATCH 4). The smaller the batch, the worse the performance, due to the presence of BatchNorm layers. So I guess it would be impossible to reproduce exactly the same metrics in this scenario. I suggest checking the official detectron2 repo for advice on how to train two-stage models when you have fewer GPUs or less memory.

I guess the answer would be something like playing with the learning rate and the number of iterations, and trying to add PreciseBN. But I can't say for sure.
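A minimal sketch of the PreciseBN suggestion using detectron2's stock config keys (the NUM_ITER value is illustrative):

```bash
# PreciseBN recomputes BatchNorm statistics over a number of training batches
# before evaluation, compensating for noisy small-batch statistics.
python train_net.py --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml --num-gpus 4 \
    SOLVER.IMS_PER_BATCH 16 TEST.PRECISE_BN.ENABLED True TEST.PRECISE_BN.NUM_ITER 200
```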

favorxin commented 2 years ago

Thank you for replying! I used 4 Tesla T4 GPUs and set SOLVER.IMS_PER_BATCH to 16, but I could not get better performance: with IMS_PER_BATCH 16 I finally got 14.7 AP, which is too low. The total loss sits around 1.98 and does not decrease in the final iterations. Would you help me check the training error? If you need the trained model or the training log, I can send them to you.

zetyquickly commented 2 years ago

Hello @favorxin,

I really appreciate your willingness to reproduce our results; this is something we are mutually interested in.

As I see it, there are two options to resolve this issue: 1) continue investing effort on your end, or 2) redo the experiment on our end. For now, neither @rakhimovv nor I can realistically redo the experiment to provide more precise instructions for you, due to our huge backlog.

Please note that we do have trained weights that perform as well as the best s0_bv2_bifpn_f64_s3x model presented in our paper, so your effort is not meaningless, but we are unable to share them.

favorxin commented 2 years ago

Sorry for the late reply! Thank you for your advice! I am trying to make DensePose faster; I want to reach 30 FPS on a 3080 Ti or 3090, so I am very interested in reproducing your meaningful work. I can imagine how much you invested to get this excellent performance. I don't need the best performance, just a snapshot of intermediate results. I will continue trying to reproduce them!

zetyquickly commented 2 years ago

@favorxin please try the new version (new branch), so we can close this issue.