Closed · QiyaoWei closed this issue 2 years ago
Hello @QiyaoWei, please refer to the answer I gave previously: https://github.com/zetyquickly/DensePoseFnL/issues/3#issuecomment-660464048
Dear author,
I have made sure detectron2==0.1.0 is installed properly (apply_net works with the pretrained models). However, when I run train_net.py I get the following error.

Command:
```
python train_net.py --config-file configs/densepose_parsing_rcnn_spnasnet_100_FPN_s3x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025
```

```
Traceback (most recent call last):
  File "train_net.py", line 23, in <module>
    from densepose.modeling.quantize import quantize_decorate, quantize_prepare
  File "/root/qiyao/DensePoseFnL/densepose/modeling/quantize.py", line 14, in <module>
    from detectron2.layers import ShapeSpec, Conv2d, interpolate, NaiveSyncBatchNorm, FrozenBatchNorm2d, Linear
ImportError: cannot import name 'Linear'
```
This looks like a detectron2 version problem. Did I miss something? Thanks!
Hello! Did you solve this problem? When I installed detectron2 0.1, I got the error "cannot import name 'Linear' from detectron2.layers". When I installed detectron2 0.4.1, I got "DensePoseROIHeads object has no attribute 'feature_strides'". Were you able to solve this?
I tried densepose_rcnn_R_50_FPN_s1x.yaml and densepose_rcnn_mobilenetv3_rw_FPN_s1x.yaml.
@favorxin look at the README:
you need to install detectron2 from source at a specific commit preceding 0.1.0. They introduced breaking changes in 0.1.0 that affect our project.
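For reference, the install would look roughly like this (a sketch; `<commit-from-readme>` is a placeholder for the pre-0.1.0 commit hash pinned in the README, not a real hash):
```
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
# Check out the pre-0.1.0 commit specified in the DensePoseFnL README
git checkout <commit-from-readme>
pip install -e .
```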
Thank you very much! I used s0_bv2_bifpn_f64.yaml to train the baseline successfully! But I'm confused about quantization-aware training. I used s0_bv2_bifpn_f64.yaml with --qat to train from scratch and ran into the following problem.
Way 1: Should I train s0_bv2_bifpn_f64.yaml to get a baseline, and then use the baseline to train the net?
Way 2:
```
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64.yaml SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025
```
Hey @favorxin,
It's definitely great that you succeeded with s0_bv2_bifpn_f64. Yes, for quantization-aware training you need to run the baseline training first, and your command line in "way 2" looks correct. Don't forget to pass the trained weights in via MODEL.WEIGHTS.
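Putting the two steps together, the flow would be roughly as follows (a sketch; the output path is detectron2's default OUTPUT_DIR and may differ in your setup):
```
# Step 1: train the float baseline
python train_net.py --config-file ./configs/s0_bv2_bifpn_f64.yaml \
    SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025
# Step 2: QAT fine-tuning, initialized from the baseline weights
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64.yaml \
    SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 \
    MODEL.WEIGHTS ./output/model_final.pth
```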
I set MODEL.WEIGHTS in the s0_bv2_bifpn_f64_s3x.yaml config file and also tried setting it on the command line. Neither trains successfully.
```
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 MODEL.WEIGHTS ./output_new/model_final.pth
```
I get a warning (see screenshot).
I am confused about how to train DensePose with --qat.
Hey @favorxin,
Let me try to help you. As I can see from the screenshot, the training starts and ends without errors. So:
- Maybe the MAX_ITER count is too small here (example below)
- Also set the eval period accordingly
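Both can be overridden from the command line, for example (a sketch; the values are illustrative):
```
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml \
    SOLVER.IMS_PER_BATCH 1 MODEL.WEIGHTS ./output_new/model_final.pth \
    SOLVER.MAX_ITER 400000 TEST.EVAL_PERIOD 50000
```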
Thank you for your prompt response! It works. I set MAX_ITER to 400000.
Hello! I'm back again! I have some doubts about the learning rate when training the QAT DensePose. During baseline training (s0_bv2_bifpn_f64) the learning rate went from 0.0025 -> 0.00025 (iteration 100000) -> 0.000025 (iteration 120000), i.e. STEPS is set to (100000, 120000) and reduces the learning rate.
But when I train the QAT DensePose, the learning rate doesn't follow STEPS. The initial learning rate is 0.0025 and it changes during the warm-up phase, but after that it stays at 0.0025. I set STEPS and MAX_ITER as follows.
How can I fine-tune the QAT DensePose, changing the learning rate, to get good performance? The current baseline result (s0_bv2_bifpn_f64) is very good, but the QAT result (s0_bv2_bifpn_f64_s3x) shows essentially no performance.
Hey @favorxin,
It's an interesting question. I don't remember adjusting these parameters to achieve the results described in the paper. Everything described there is done by QAT fine-tuning with some additional training steps, without changing the learning rate.
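For what it's worth, one way to keep the learning rate constant after warm-up during the QAT fine-tuning (a sketch of one possible setup, not a confirmed detail of the _ft config) is to push the decay milestones past MAX_ITER so the multi-step schedule never fires:
```
# With STEPS beyond MAX_ITER, the LR stays at BASE_LR after warm-up
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml \
    SOLVER.IMS_PER_BATCH 1 MODEL.WEIGHTS ./output_new/model_final.pth \
    SOLVER.MAX_ITER 400000 SOLVER.STEPS "(500000,)"
```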
So loading the trained model (trained with the s0_bv2_bifpn_f64 config file) and using the s0_bv2_bifpn_f64_s3x config file can get the final result? Just adjusting STEPS to larger numbers like 400000?
```
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml SOLVER.IMS_PER_BATCH 1 MODEL.WEIGHTS ./output_new/model_final.pth
```
The method above, which I tried, doesn't give good performance. Can I try QAT from scratch (without loading a trained model)?
```
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml SOLVER.IMS_PER_BATCH 1
```
I don't remember the exact number of additional steps, but yes, fine-tune the float weights and you'll get the results. You should check the AP metrics rather than qualitative results on images.
And second, for me only fine-tuning worked; QAT from scratch didn't give any good results.
I trained ./configs/s0_bv2_bifpn_f64_s3x.yaml and got a model_final.pth with good performance. But when I load that final model and run --qat on it, the performance becomes bad.
```
python train_net.py --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml SOLVER.IMS_PER_BATCH 2
python train_net.py --qat --config-file ./configs/s0_bv2_bifpn_f64_s3x_ft.yaml SOLVER.IMS_PER_BATCH 1 MODEL.WEIGHTS ./output_s3x_1/model_final.pth
python apply_net.py show ./configs/s0_bv2_bifpn_f64_s3x_ft.yaml ./output_s3x_qat/model_final.pth ./people.jpg dp_contour --output ./saved_1.jpg
```
I ran the steps above one by one. Is that right?
@favorxin
What you did seems legit. Could you please share the quantitative results of the evaluation step, with APs, after the QAT? Let's check whether the numbers there match what I saw during training.
And, if possible, show the APs table for the non-quantized model.
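For reference, the evaluation runs could look something like this (a sketch: --eval-only is detectron2's standard flag, --quant-eval is this repo's flag used later in this thread, and the paths are examples):
```
# Float model evaluation
python train_net.py --eval-only --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml \
    MODEL.WEIGHTS ./output_s3x_1/model_final.pth
# Quantized model evaluation
python train_net.py --quant-eval --config-file ./configs/s0_bv2_bifpn_f64_s3x_ft.yaml \
    MODEL.WEIGHTS ./output_s3x_qat/model_final.pth
```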
Here is the non-quantized model's final eval performance.
The second pic shows the quantitative results.
When I use --quant-eval, it fails. I ran the following command:
```
python train_net.py --quant-eval --config-file ./configs/s0_bv2_bifpn_f64.yaml MODEL.WEIGHTS ./output_f64_qat/model_final.pth SOLVER.IMS_PER_BATCH 1
```
I suspect my PyTorch install is broken? But I can train the model with s0_bv2_bifpn_f64_s3x.yaml.
Could I send my final trained and quantized model to you at the following email: emilzq@bk.ru?
Regarding this, from my old experiments I found the following results:
Non-quantized inference on CPU
[03/27 13:57:02 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
| AP | AP50 | AP75 | APs | APm | APl |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 56.826 | 84.683 | 59.557 | 25.569 | 53.329 | 73.181 |
[03/27 14:03:50 densepose.evaluator]: Evaluation results for densepose:
| AP | AP50 | AP75 | APm | APl |
|:------:|:------:|:------:|:------:|:------:|
| 51.076 | 86.149 | 53.406 | 44.867 | 52.244 |
Total inference time: 0:29:03.904127 (1.160282 s / img per device, on 1 devices)
Total inference pure compute time: 0:17:38 (0.703991 s / img per device, on 1 devices)
and
Quantized model on CPU
[03/27 13:18:01 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
| AP | AP50 | AP75 | APs | APm | APl |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 55.287 | 83.118 | 57.912 | 23.806 | 52.083 | 71.358 |
[03/27 13:24:43 densepose.evaluator]: Evaluation results for densepose:
| AP | AP50 | AP75 | APm | APl |
|:------:|:------:|:------:|:------:|:------:|
| 47.026 | 82.772 | 48.332 | 43.046 | 47.944 |
Total inference time: 0:20:28.000668 (0.817033 s / img per device, on 1 devices)
Total inference pure compute time: 0:09:24 (0.375824 s / img per device, on 1 devices)
It seems like your model training hasn't converged. And it's no surprise that a quantized model with a DensePose AP of 0.020 shows such bad results.
Regarding this, I think the PyTorch version should be the same as the one we used during the research. Every new version of PyTorch (1.5, 1.6, 1.7) has introduced breaking changes; maybe that's the case for you too. I never saw this RuntimeError.
To sum up, please try to fit your non-quantized model more.
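On the version point, a quick way to check and pin the environment (a sketch; the exact versions used in the research are not stated in this thread, so the pins below are placeholders to fill in from the repo's README/requirements):
```
# Print the currently installed versions
python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"
# Pin to the versions the README specifies (placeholders, not real version numbers)
pip install torch==<version> torchvision==<version>
```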
Thank you very much! I will fine-tune the model to reach the performance you showed.
Hello! I want to train the non-quantized model, but I am confused about the result. I saw that the paper trains for 130K iterations with batch size 16 on 8 GPUs. Can I set MAX_ITER to 1200K with batch size 2 to train the final model? Here is my result.
Hello,
I think so, yes, you could. But you'll need to adjust the learning rate (decrease it by a factor of 8, or 4 in your case) and the learning-rate decay schedule (check that one yourself, I don't know it offhand). And you may also need more than 1200K iterations to converge.
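Concretely, the usual linear-scaling heuristic would look like this (a sketch with illustrative numbers, following detectron2's general rule of thumb rather than a schedule verified for this repo): going from batch 16 to batch 2 shrinks the batch by 8x, so divide the learning rate by 8 and multiply MAX_ITER and STEPS by 8.
```
# Hypothetical reference schedule: IMS_PER_BATCH=16, MAX_ITER=130000, STEPS=(100000, 120000)
# Batch-2 equivalent: LR / 8 (0.0025 assumes a 0.02 reference LR, matching the
# earlier commands in this thread), iterations and milestones x 8
python train_net.py --config-file ./configs/s0_bv2_bifpn_f64.yaml \
    SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 \
    SOLVER.MAX_ITER 1040000 SOLVER.STEPS "(800000, 960000)"
```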
Hello! I am trying to reach your excellent performance. I cannot reach it when I use 4 Tesla T4 GPUs with the following command:
```
python train_net.py --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml --num-gpus 4 SOLVER.IMS_PER_BATCH 4
```
The config file is as follows:
```yaml
_BASE_: "s0_bv2_bifpn.yaml"
MODEL:
  FPN:
    OUT_CHANNELS: 64
  ROI_BOX_HEAD:
    CONV_DIM: 64
  ROI_SHARED_BLOCK:
    ASPP_DIM: 64
    CONV_HEAD_DIM: 64
SOLVER:
  MAX_ITER: 390000
  STEPS: (330000, 370000)
TEST:
  EVAL_PERIOD: 390000
```
Do you have any advice on getting the training result? Here are my results (training and val screenshots).
Hello @favorxin,
I don't see any particular problem. Why did you choose SOLVER.IMS_PER_BATCH 4? Does a batch of 8 not fit in memory?
@rakhimovv please have a look
@favorxin the main problem I see is the reduced total batch size (SOLVER.IMS_PER_BATCH 4). The smaller the batch, the worse the performance, due to the presence of BatchNorm layers. So I guess it would be impossible to reproduce exactly the same metrics in this scenario. I suggest checking the official detectron2 repo for advice on how to train two-stage models when you have fewer GPUs or less memory.
I guess the answer would be something like playing with the learning rate and the number of iterations, or trying to add PreciseBN. But I can't say for sure.
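If you want to try PreciseBN, detectron2 exposes it through standard config keys (a sketch; whether it helps in this setup is an assumption, and the NUM_ITER value is illustrative):
```
python train_net.py --config-file ./configs/s0_bv2_bifpn_f64_s3x.yaml --num-gpus 4 \
    SOLVER.IMS_PER_BATCH 16 TEST.PRECISE_BN.ENABLED True TEST.PRECISE_BN.NUM_ITER 200
```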
Thank you for replying! I used 4 Tesla T4 GPUs to train with SOLVER.IMS_PER_BATCH set to 16, but I still cannot get better performance. With batch size 16 I finally got 14.7 AP, which is too low. I saw the total loss stuck around 1.98, not decreasing in the final iterations. Would you help me check the training error? If you need the trained model or the training log, I can send them to you.
Hello @favorxin,
I really appreciate your willingness to reproduce our results; this is something we are mutually interested in.
As I see it, there are two options to resolve this issue: 1) continue investing effort on your end, or 2) redo the experiment on our end. For now, neither @rakhimovv nor I can redo the experiment to provide more precise instructions for you, due to our huge backlog.
Please note that we do have trained weights that perform like the best s0_bv2_bifpn_f64_s3x model presented in our paper, so your effort is not meaningless, but we are unable to share them.
Sorry for the late reply~ Thank you for your advice! I am trying to make DensePose faster; I want to get 30 FPS on a 3080 Ti or 3090, so I am very interested in reproducing your meaningful work. I can imagine that you have put a lot into the excellent performance. I don't need the best performance, just a snapshot of the intermediate results. I will continue trying to reproduce the meaningful results!
@favorxin please try the new version (new branch), so we can close this issue.