yoshitomo-matsubara / torchdistill

A coding-free framework built on PyTorch for reproducible deep learning studies. 🏆25 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc. are implemented so far. 🎁 Trained models, training logs and configurations are available for ensuring the reproducibility and benchmark.
https://yoshitomo-matsubara.net/torchdistill/
MIT License
1.39k stars 131 forks

How to run my own dataset using the object detection example? #123

Closed Coldfire93 closed 2 years ago

Coldfire93 commented 3 years ago

Hi, I want to run experiments on another dataset such as the VOC dataset. What should I do before executing the examples/object_detection.py script? I converted the VOC annotations to COCO format and modified the yaml configuration file (figure 1), but got the error shown in figure 2. Could you please tell me the reason? Thank you!

figure 1: figure1_conf

figure 2: figure2_error_msg

yoshitomo-matsubara commented 3 years ago

Hi @Coldfire93 , If your format is compatible with that of COCO dataset used in torchvision, I think you can use my example code as is.

Can you confirm that your converted VOC files work with their reference code? The KeyError: '2' in [self.imgs[id] for id in ids] looks like your dataset instance is missing the image file paths.
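A quick way to look for this kind of mismatch before training is to check the converted annotation file directly. The following is a minimal sketch, assuming the file follows the usual COCO layout with `images` and `annotations` lists; the helper name `check_coco_annotations` and the example path are hypothetical. The quoted key in KeyError: '2' might indicate an id stored as a string rather than an integer, so the sketch flags that as well.

```python
import json


def check_coco_annotations(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    image_ids = set()
    for img in coco.get('images', []):
        img_id = img.get('id')
        # COCO image ids are expected to be integers, not strings like '2'
        if not isinstance(img_id, int):
            problems.append(f'image id {img_id!r} is not an int')
        if not img.get('file_name'):
            problems.append(f'image {img_id!r} has no file_name')
        image_ids.add(img_id)

    for ann in coco.get('annotations', []):
        # Every annotation must point at an image that actually exists
        if ann.get('image_id') not in image_ids:
            problems.append(
                f"annotation {ann.get('id')!r} refers to unknown image_id {ann.get('image_id')!r}"
            )
    return problems


# Usage with a hypothetical file path:
# with open('annotations/voc_train_coco.json') as fp:
#     for problem in check_coco_annotations(json.load(fp)):
#         print(problem)
```

An empty list means the ids are at least consistent with each other; it does not guarantee that the image files exist on disk.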

Also another question; how does voc_collate_fn look? I think it's not from my repo. Since the error occurs in DataLoader, it might be caused by the collate function as well.
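For comparison, a collate function for detection usually only regroups the samples instead of stacking them, because images in a batch can have different sizes and each target holds a different number of boxes. A minimal sketch of that idea follows; the name `detection_collate_fn` and the toy batch are made up for illustration, and strings stand in for image tensors.

```python
def detection_collate_fn(batch):
    """Turn a list of (image, target) pairs into (images, targets) tuples."""
    # zip(*batch) transposes the list of pairs without stacking anything
    return tuple(zip(*batch))


# Toy batch: two samples with a different number of boxes each
batch = [
    ('image_a', {'boxes': [[0, 0, 10, 10]], 'labels': [1]}),
    ('image_b', {'boxes': [], 'labels': []}),
]
images, targets = detection_collate_fn(batch)
print(images)
print(targets[1])
```

With no custom collate function, a default collation step would typically try to stack the images into one tensor, which may fail when the images are not resized to a common shape.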

Coldfire93 commented 3 years ago

Hi @yoshitomo-matsubara , Thanks for your answer. I have confirmed that my converted VOC files work fine with the script you mentioned [https://github.com/pytorch/vision/tree/master/references/detection].

  • And I changed the "voc_collate_fn" to "coco_collate_fn".
  • But I still got an error (figure 3). Could you please tell me the reason?
  • I have another question. How can I get your pretrained student model on coco dataset? Because I want to load it as pretrained model to train my own dataset. Thank you!

figure 3: figure3_error_msg

It seems that the problem occurs because I didn't resize the images. I will try to use the "transforms_params" defined in the yaml file to do the resizing. Another question: can I set "collate_fn" in the yaml file to None instead of "coco_collate_fn"? Thank you~

yoshitomo-matsubara commented 3 years ago

Hi @Coldfire93 ,

Thank you for confirming that! I identified and fixed a bug introduced in a recent update. PR #126 should resolve it.

Since the bug is in the package, I'll soon release a new version of torchdistill so that you can update the package.

Coldfire93 commented 3 years ago

OK. Thanks~

yoshitomo-matsubara commented 3 years ago

Hi @Coldfire93 , I just released the new version, torchdistill==0.2.3. Update your local package and let me know if it resolves the issue.

Coldfire93 commented 3 years ago

Hi @yoshitomo-matsubara , I updated and got the error below:

figure4_error_targets

But I have checked the contents of the "targets" parameter:

figure5_targets_contents

yoshitomo-matsubara commented 3 years ago

Could you put your 1) yaml config and 2) executed command in text instead of screenshot? I didn't get such errors when using my config files and example code in this repo

How can I get your pretrained student model on coco dataset? Because I want to load it as pretrained model to train my own dataset. Thank you!

I forgot to answer this; Do you want to use weights of my customized student model? or weights of original Faster R-CNN?

Coldfire93 commented 3 years ago

OK. 1) My yaml config is attached: frcnn_resnet50_voc.txt Note: I renamed the .yaml file to .txt since .yaml files cannot be uploaded.

2)The executed command: python examples/object_detection.py --config configs/frcnn_resnet50_voc.yaml -student_only

And, I want to use weights of your customized student model. Thank you~

yoshitomo-matsubara commented 3 years ago

The command looks fine; I'd suggest you add --log <some file path> to keep your training log e.g., python examples/object_detection.py --config configs/frcnn_resnet50_voc.yaml --log log/frcnn_resnet50_voc.txt -student_only

Your yaml config file still contains teacher model in training loop and attempts to use head network distillation, which requires teacher model. Check if the following config works for you. This one doesn't include teacher model, but trains student model by minimizing the original loss function used in torchvision. new_frcnn_resnet50_voc.txt (Note that you'll probably want to tune hyperparameters in the yaml file once you confirm it works)

To use weights of my customized student model, download checkpoints available here and specify the file path in ckpt entry of your yaml file

e.g.,

models:
  model:
    ... (skipped)
    ckpt: 'ckpt_file_path.pt'

Coldfire93 commented 3 years ago

Hi, @yoshitomo-matsubara, Thanks for your answer. But I got the same error when running my own dataset (using the config you modified), although it works fine when running the coco dataset. (Error occurs before upgrading torchdistill to 0.2.3)

The log file and error msg are below: frcnn_resnet50_voc.txt

error msg: figure6_error

Coldfire93 commented 3 years ago

Hi, @yoshitomo-matsubara , Is teacher/coco2017-fasterrcnn_resnet50_fpn.pt the weights of the original Faster R-CNN? Where can I download it? Thank you~

yoshitomo-matsubara commented 3 years ago

Hi @Coldfire93 ,

Thanks for your answer. But I got the same error when running my own dataset ( use the config you modified) .

I found forward_proc should be forward_proc: 'forward_batch_target' for end-to-end training in case of object detection, here is the fixed one. new_frcnn_resnet50_voc.txt.
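To illustrate why the choice of forward_proc matters here, below is a rough sketch under assumptions: `ToyDetector` is a made-up stand-in for a detection model that needs targets to compute its losses in training mode, and the two functions are illustrative guesses at what a batch-only and a batch-plus-target forward procedure do, not code copied from the package.

```python
class ToyDetector:
    """Hypothetical stand-in for a detection model that needs targets while training."""

    def __init__(self):
        self.training = True

    def __call__(self, images, targets=None):
        if self.training and targets is None:
            raise ValueError('In training mode, targets should be passed')
        if self.training:
            # Training mode returns a dict of losses computed from the targets
            return {'loss_classifier': 0.0, 'loss_box_reg': 0.0}
        # Evaluation mode returns one prediction dict per image
        return [{'boxes': [], 'labels': [], 'scores': []} for _ in images]


def forward_batch_only(model, sample_batch, targets=None):
    # Passes only the inputs; enough for models that output logits or features
    return model(sample_batch)


def forward_batch_target(model, sample_batch, targets):
    # Passes inputs and targets; needed when the model computes its own losses
    return model(sample_batch, targets)


model = ToyDetector()
print(forward_batch_target(model, ['image'], [{'boxes': [], 'labels': []}]))
```

In this toy setup, calling `forward_batch_only` on a model in training mode raises an error about missing targets, which is the kind of failure a mismatched forward_proc could produce.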

Is teacher/coco2017-fasterrcnn_resnet50_fpn.pt the weights of the original Faster R-CNN? Where I can download it? Thank you~

The teacher model weights are from torchvision, and when the ckpt file for the teacher (in yaml) does not exist, it downloads and uses the pretrained weights in torchvision as long as you leave pretrained: True for the teacher arguments (the list of args is completely dependent on its original interface). In our studies, I used torchvision's pretrained models as teachers.
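The fallback described above can be sketched roughly as follows. This is only an illustration of the decision order, assuming a checkpoint path and a pretrained flag as inputs; the helper `choose_weight_source` is hypothetical and is not the package's actual loading code.

```python
import os


def choose_weight_source(ckpt_file_path, pretrained):
    """Decide where the teacher weights would come from (illustrative only)."""
    if ckpt_file_path and os.path.isfile(ckpt_file_path):
        # A local checkpoint exists, so it takes priority
        return 'checkpoint'
    if pretrained:
        # No checkpoint on disk: fall back to the library's pretrained weights
        return 'pretrained'
    # Neither is available: the model keeps its random initialization
    return 'random'


print(choose_weight_source('./resource/ckpt/does_not_exist.pt', pretrained=True))
```

So a "ckpt file is not found" message in the log is not necessarily a problem for the teacher, provided the pretrained flag is left enabled.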

I should have asked you this; What training method would you like to try with torchdistill? If it's end-to-end training without teacher (like torchvision's reference code), the above config should be ok. If not and you want to use something else like (generalized) head network distillation, you need teacher configs like the original official and sample configs.

P.S., It would be appreciated and more useful if you could copy and paste the error log as txt (e.g., Ctrl + Shift + C on terminal) instead of screenshot so that other users can catch this issue when searching

Coldfire93 commented 3 years ago

Hi @yoshitomo-matsubara ,

I found forward_proc should be forward_proc: 'forward_batch_target' for end-to-end training in case of object detection.

It solves the issue and starts training. It looks good. Thank you very much. I will learn more about how to config.

The teacher model weights are from torchvision, and when the ckpt file for the teacher (in yaml) does not exist, it downloads and uses the pretrained weights in torchvision as long as you leave pretrained: True for the teacher arguments (the list of args is completely dependent on its original interface). In our studies, I used torchvision's pretrained models as teachers.

I did the experiment, but it seems that the torchvision pretrained model was not loaded. The loss is very large shown below:

(torchdistill) songhongguang@elcnlhdc-41-239:~/lwh/torchdistill$ python examples/object_detection.py --config configs/official/coco2017/yoshitomo-matsubara/rrpr2020/ghnd-custom_fasterrcnn_resnet50_fpn_from_fasterrcnn_resnet50_fpn.yaml --log logs/ghnd-custom_fasterrcnn_resnet50_fpn_from_fasterrcnn_resnet50_fpn.log
2021/06/30 22:43:39 INFO torchdistill.common.main_util Not using distributed mode
2021/06/30 22:43:39 INFO main Namespace(adjust_lr=False, config='configs/official/coco2017/yoshitomo-matsubara/rrpr2020/ghnd-custom_fasterrcnn_resnet50_fpn_from_fasterrcnn_resnet50_fpn.yaml', device='cuda', dist_url='env://', iou_types=None, log='logs/ghnd-custom_fasterrcnn_resnet50_fpn_from_fasterrcnn_resnet50_fpn.log', seed=None, start_epoch=0, student_only=False, test_only=False, world_size=1)
loading annotations into memory...
Done (t=20.95s)
creating index...
index created!
loading annotations into memory...
Done (t=3.69s)
creating index...
index created!
2021/06/30 22:44:08 INFO torchdistill.common.main_util ckpt file is not found at ./resource/ckpt/coco2017/teacher/coco2017-fasterrcnn_resnet50_fpn.pt
2021/06/30 22:44:13 INFO torchdistill.common.main_util Loading model parameters
2021/06/30 22:44:14 INFO main Start training
2021/06/30 22:44:14 INFO torchdistill.datasets.sampler Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
2021/06/30 22:44:14 INFO torchdistill.datasets.sampler Count of instances per bin: [ 104 982 24236 2332 8225 74466 5763 1158]
2021/06/30 22:44:14 INFO torchdistill.models.util [R-CNN model]
2021/06/30 22:44:14 INFO torchdistill.models.util Redesigning the R-CNN model with ['backbone.body']
2021/06/30 22:44:14 INFO torchdistill.models.util [teacher model]
2021/06/30 22:44:14 INFO torchdistill.models.util Using the HeadRCNN teacher model
2021/06/30 22:44:14 INFO torchdistill.models.util [R-CNN model]
2021/06/30 22:44:14 INFO torchdistill.models.util Redesigning the R-CNN model with ['backbone.body']
2021/06/30 22:44:14 INFO torchdistill.models.util [student model]
2021/06/30 22:44:14 INFO torchdistill.models.util Using the HeadRCNN student model
2021/06/30 22:44:14 INFO torchdistill.models.util Frozen module(s): {'seq.backbone.body.layer4', 'seq.backbone.body.layer3', 'seq.backbone.body.layer2'}
2021/06/30 22:44:14 INFO torchdistill.core.distillation Loss = 1.0 MSELoss() + 1.0 MSELoss() + 1.0 MSELoss() + 1.0 MSELoss()
2021/06/30 22:44:14 INFO torchdistill.core.distillation Freezing the whole teacher model
2021/06/30 22:44:14 INFO torchdistill.common.main_util Loading optimizer parameters
2021/06/30 22:44:14 INFO torchdistill.common.main_util Loading scheduler parameters
2021/06/30 22:44:21 INFO torchdistill.misc.log Epoch: [0] [ 0/29316] eta: 2 days, 5:11:02 lr: 0.0001 img/s: 1.053708416124615 loss: 1119365.1250 (1119365.1250) time: 6.5310 data: 2.7348 max mem: 9330
2021/06/30 22:57:03 INFO torchdistill.misc.log Epoch: [0] [ 1000/29316] eta: 6:02:25 lr: 0.0001 img/s: 6.1385905169748485 loss: 927617.8750 (962067.6578) time: 0.7018 data: 0.0133 max mem: 9334
2021/06/30 23:08:33 INFO torchdistill.misc.log Epoch: [0] [ 2000/29316] eta: 5:32:01 lr: 0.0001 img/s: 5.9592064060377705 loss: 919090.8750 (964320.9032) time: 0.6940 data: 0.0123 max mem: 9334
2021/06/30 23:20:04 INFO torchdistill.misc.log Epoch: [0] [ 3000/29316] eta: 5:14:16 lr: 0.0001 img/s: 6.350791444465642 loss: 986742.8125 (966603.5369) time: 0.6556 data: 0.0125 max mem: 9334

I should have asked you this; What training method would you like to try with torchdistill? If it's end-to-end training without teacher (like torchvision's reference code), the above config should be ok. If not and you want to use something else like (generalized) head network distillation, you need teacher configs like the original official and sample configs.

I want to use torchdistill to do knowledge distillation. My method contains three steps: 1) train the teacher model on the voc dataset (as you said, using torchvision's pretrained model as teacher); 2) train the student model on the voc dataset (using your custom model (trained on coco) as my pretrained model); 3) train the student model with the ghnd method (using the step 2 model as the pretrained model).

I want to compare the performance of the two models trained by step 2 and step 3. The performance of the model trained by step 3 is supposed to be better than that of step 2.

I wonder if the above steps are reasonable.

Thanks for your patience.

P.S., It would be appreciated and more useful if you could copy and paste the error log as txt (e.g., Ctrl + Shift + C on terminal) instead of screenshot so that other users can catch this issue when searching

OK. I understand. 😊

yoshitomo-matsubara commented 3 years ago

Hi @Coldfire93 ,

It solves the issue and starts training. It looks good. Thank you very much. I will learn more about how to config.

Great to hear that :)

I did the experiment, but it seems that the torchvision pretrained model was not loaded. The loss is very large shown below:

I believe you are using the pretrained teacher model, and the loss values you showed above are not that large for GHND (generalized head network distillation), since the loss is the sum of squared errors as shown in Fig. 1 and Eq. (2) of the paper. Note that the above log says Loss = 1.0 * MSELoss() + 1.0 * MSELoss() + 1.0 * MSELoss() + 1.0 * MSELoss(), but these are MSELoss modules in torch whose reduction is sum, so they work as sum of squared error losses. The log file associated with the yaml file is also available in the folder, and you can refer to the numbers at each epoch (though you cannot expect them to match the loss values in your training log).
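To see why a sum reduction yields such large numbers, here is a small pure-Python sketch comparing the two reductions on a toy example. The vector length and values are made up for illustration; real feature maps in a detection backbone contain far more elements per batch.

```python
def squared_errors(student, teacher):
    """Element-wise squared differences between two equally sized vectors."""
    return [(s - t) ** 2 for s, t in zip(student, teacher)]


def mse_mean(student, teacher):
    # Averaged over the elements, as with a mean reduction
    errs = squared_errors(student, teacher)
    return sum(errs) / len(errs)


def mse_sum(student, teacher):
    # Summed over the elements, as with a sum reduction
    return sum(squared_errors(student, teacher))


# Toy flattened feature map: 100,000 elements, each off by 0.5
num_elements = 100_000
teacher = [1.0] * num_elements
student = [0.5] * num_elements

print(mse_mean(student, teacher))  # 0.25
print(mse_sum(student, teacher))   # 25000.0
```

The per-element error is identical in both cases; the summed value simply scales with the number of elements, and summing it again over several matched layers makes it larger still.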

I want to compare the performance of the two model trained by step 2 and step 3. The performance of the model trained by step 3 is supposed to be better than that of step 2. I wonder if the above steps are reasonable.

The step 3 looks built on step 2 like pretrained on coco -> end-to-end training on voc (step 2) -> GHND on voc (step 3). If you have some hypothesis that the three steps significantly improve performance over simple end-to-end training, it may be worth trying.

If not and you simply want to see end-to-end training vs. GHND, I'd suggest the following three separate experiments:

  1. train teacher model in end-to-end manner (like torchvision's reference code) on voc dataset with/without initializing the model by torchvision's pretrained weights
  2. train my student model in end-to-end manner (like torchvision's reference code) on voc dataset with/without initializing the model by torchvision's pretrained weights
  3. train my student model by GHND on voc dataset with/without initializing the model by my published trained weights, using 1. as teacher model

so that you can compare the performance of step 2 with that of step 3. Note that the student model at the 3rd experiment is partially initialized with the teacher model obtained through the 1st experiment, not with the student model through the 2nd experiment.

To leverage GHND, you should initialize weights of layers in student at step 2 by those in teacher model fine-tuned to VOC (step 1), as HND and GHND reuse the pretrained teacher model's tail portion for that of the student model (i.e., the first k layers in student are trained by HND or GHND, and all the remaining layers are fixed and identical to those in teacher in terms of architecture and learned params).
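The partial initialization described above can be sketched with plain dictionaries standing in for model state dicts. This is a rough illustration under assumptions: the helper `init_shared_layers`, the parameter names, and the values are all hypothetical, and a real setup would operate on tensors rather than lists.

```python
def init_shared_layers(student_state, teacher_state, shared_prefixes):
    """Copy teacher parameters into the student for keys under the shared prefixes."""
    new_state = dict(student_state)
    copied = []
    for key, value in teacher_state.items():
        # Only layers that exist in both models and belong to the shared tail are copied
        if key in new_state and key.startswith(tuple(shared_prefixes)):
            new_state[key] = value
            copied.append(key)
    return new_state, sorted(copied)


# Hypothetical parameter names: layer1 is redesigned in the student, layer2+ are shared
teacher_state = {'layer1.weight': [1.0], 'layer2.weight': [2.0], 'layer3.weight': [3.0]}
student_state = {'layer1.bottleneck.weight': [0.1], 'layer2.weight': [0.0], 'layer3.weight': [0.0]}

student_state, copied = init_shared_layers(student_state, teacher_state, ['layer2', 'layer3'])
print(copied)
print(student_state)
```

After this step only the student's redesigned early layers would remain to be trained, while the copied layers stay frozen and identical to the teacher's.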

Coldfire93 commented 3 years ago

Hi @yoshitomo-matsubara ,

  1. train teacher model in end-to-end manner (like torchvision's reference code) on voc dataset with/without initializing the model by torchvision's pretrained weights
  2. train my student model in end-to-end manner (like torchvision's reference code) on voc dataset with/without initializing the model by torchvision's pretrained weights
  3. train my student model by GHND on voc dataset with/without initializing the model by my published trained weights, using 1. as teacher model

I'm confused about step 2. I thought the student network is designed by you( You modified the structure of the backbone). And there is no corresponding pretrained model in torchvision.

Maybe I should learn more about GHND? I'd like your advice. Thank you~

yoshitomo-matsubara commented 3 years ago

@Coldfire93 Yes, I designed the student model and there is no pretrained model in torchvision for step 2. I meant with/without initializing the model by my pretrained weights.

Coldfire93 commented 3 years ago

Hi, @yoshitomo-matsubara

The sizes of your teacher model and student model are almost the same (about 160M). I wonder why?

It's expected that the student model is smaller than the teacher model.

yoshitomo-matsubara commented 3 years ago

Hi @Coldfire93 ,

The size of your teacher model and student model is almost the same(about 160M) . I wonder why? It's expected that the student model is smaller than the teacher model.

The student models in the example are from our ICPR paper (preprint ver.). The teacher models are pretrained Faster, Mask, and Keypoint R-CNNs in torchvision, and their student models are based on the teacher models but modified to introduce bottlenecks for split computing i.e., the first layers until bottleneck called head model will be executed on mobile device and its output (compressed information called bottleneck) will be transferred to edge server to complete the inference by the rest of the model (called tail model).

While the overall student model size is almost the same as teacher model, the student model w/ bottleneck can achieve shorter end-to-end latency by splitting the inference for resource-constrained edge computing systems. Read the above paper for more details.
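As a back-of-the-envelope illustration of that latency argument, the sketch below compares split inference with sending the whole input to the server. All function names and numbers are made-up assumptions chosen only to show the arithmetic; they are not measurements from the paper.

```python
def split_latency(head_sec, bottleneck_bytes, bandwidth_bps, tail_sec):
    """End-to-end latency when the head runs on device and the tail on the server."""
    transfer_sec = bottleneck_bytes * 8 / bandwidth_bps
    return head_sec + transfer_sec + tail_sec


def offload_latency(input_bytes, bandwidth_bps, server_sec):
    """End-to-end latency when the raw input is sent and the server runs the full model."""
    return input_bytes * 8 / bandwidth_bps + server_sec


# Illustrative numbers only: a 1 Mbps link, a 200 KB image, a 20 KB bottleneck
bandwidth_bps = 1_000_000
print(split_latency(head_sec=0.05, bottleneck_bytes=20_000,
                    bandwidth_bps=bandwidth_bps, tail_sec=0.03))
print(offload_latency(input_bytes=200_000, bandwidth_bps=bandwidth_bps, server_sec=0.04))
```

Under these assumed numbers the transfer time dominates, so shrinking what is sent over the link can matter more than shrinking the total parameter count.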

Could you please tell me the teacher's information? (mAP, #Epochs, Training time)

As described in the torchdistill paper, I did all the experiments for reproducing experimental results reported in prior studies. The Table 6 shows results originally reported in the above ICPR paper reproduced by torchdistill. Thus, the teacher models are also pretrained Faster and Mask R-CNN models in torchvision. The mAP of the teacher models are also shown in Table 3 in the above ICPR paper, and other information (# epochs and training time) can be found in torchvision's example code and blog post.

Coldfire93 commented 3 years ago

Hi @yoshitomo-matsubara , Thank you for your reply. I understand.

Actually, I want to get a smaller student model by doing knowledge distillation. Obviously, the ghnd method cannot do that.

But the training time is shortened by using the ghnd method (60 hours vs. 24 hours). That's good.

Thank you again.

yoshitomo-matsubara commented 3 years ago

For object detection, applying knowledge distillation in an end-to-end manner is pretty difficult, as I answered at https://github.com/yoshitomo-matsubara/torchdistill/issues/117

FYI, torchvision recently introduced SSD object detection models. If you find a pair of module paths in student and teacher models that match the output shapes to compute a loss value, you can do a kind of such knowledge distillation (student model much smaller than teacher model) for object detection by defining so in a yaml file
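One way to look for such pairs is to record the output shape of each candidate module in both models and then match them up. The sketch below assumes those shapes have already been collected into dictionaries; the helper `find_shape_matched_pairs`, the module paths, and the shapes are all hypothetical.

```python
def find_shape_matched_pairs(student_shapes, teacher_shapes):
    """List (student_path, teacher_path) pairs whose output shapes are identical."""
    pairs = []
    for s_path, s_shape in student_shapes.items():
        for t_path, t_shape in teacher_shapes.items():
            if tuple(s_shape) == tuple(t_shape):
                pairs.append((s_path, t_path))
    return pairs


# Hypothetical module paths mapped to (channels, height, width) of their outputs
student_shapes = {'backbone.block2': (256, 38, 38), 'backbone.block3': (128, 19, 19)}
teacher_shapes = {'backbone.layer2': (512, 38, 38), 'backbone.layer3': (256, 38, 38)}

print(find_shape_matched_pairs(student_shapes, teacher_shapes))
```

Each matched pair is a candidate for a feature-level loss between student and teacher; pairs with mismatched shapes would need an extra adaptation layer before a loss could be computed.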

Coldfire93 commented 3 years ago

Hi @yoshitomo-matsubara , I'm trying to understand what you said above. I read the paper and learned about the GHND method. This method is very valuable. I will continue to follow your work. I will read your code to learn more.

Thank you again.

yoshitomo-matsubara commented 3 years ago

Hi @Coldfire93 , My pleasure. Feel free to ask me if you have any question.

yoshitomo-matsubara commented 2 years ago

@Coldfire93 Closing this issue as I haven't seen any follow-up for a while. Open a new Discussion (not Issue) if you still have questions