zihangJiang / TokenLabeling

Pytorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"
Apache License 2.0
426 stars, 36 forks

How do I generate a dense label map on my own dataset? #7

Closed wxshan closed 3 years ago

wxshan commented 3 years ago

How do I generate a dense label map on my own dataset?

zihangJiang commented 3 years ago

Hi. Assuming you have a pre-trained classifier such as EfficientNet-B6 for your dataset, you can simply remove the final global average pooling layer to generate the dense label map. You may have to adjust the shape of the output feature map before you feed it to the classification head.
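For readers who want to see the idea concretely, here is a minimal PyTorch sketch. It is not the code used to produce the released label maps. `TinyClassifier` and `dense_score_map` are hypothetical names, and the toy network only stands in for a real annotator such as EfficientNet-B6. The sketch assumes the classifier is a convolutional backbone followed by global average pooling and a single linear head, in which case the head can be applied at every spatial location instead of to the pooled vector.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a pre-trained CNN classifier."""

    def __init__(self, num_classes: int = 10, channels: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, stride=4, padding=1),
            nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        feat = self.features(x)               # (B, C, H, W)
        pooled = self.pool(feat).flatten(1)   # (B, C)
        return self.head(pooled)              # (B, num_classes)


@torch.no_grad()
def dense_score_map(model, images):
    """Skip global pooling and apply the linear head at each location."""
    feat = model.features(images)             # (B, C, H, W)
    feat = feat.permute(0, 2, 3, 1)           # (B, H, W, C): channels last
    scores = model.head(feat)                 # (B, H, W, num_classes)
    return scores.permute(0, 3, 1, 2)         # (B, num_classes, H, W)


model = TinyClassifier().eval()
images = torch.randn(2, 3, 64, 64)
score_map = dense_score_map(model, images)    # one score vector per location
logits = model(images)                        # ordinary image-level logits
```

Because the head is linear, averaging the dense scores over all locations reproduces the image-level logits, which is a quick sanity check when adapting this to a real model.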

wxshan commented 3 years ago

If I save the feature map of each picture as a .pt file, how can I later make the labels of a cropped sub-image correspond to the labels of the whole image?

zihangJiang commented 3 years ago

This is automatically done during training. https://github.com/zihangJiang/TokenLabeling/blob/aa438eff9b9fc2daa8c8b4cc6bfaa6e3721f995e/tlt/data/mixup.py#L27-L33

The label map will be cropped (according to the random crop box) and resized to the target shape.
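As a rough illustration of that crop-and-resize step, here is a simplified sketch. It is not the code at the link above: it uses plain slicing plus `torch.nn.functional.interpolate`. The function name `crop_and_resize_label_map` and the normalized `(x0, y0, x1, y1)` box convention are assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def crop_and_resize_label_map(label_map, crop_box, out_size):
    """Crop a dense label map with a normalized box, then resize it.

    label_map: (num_classes, H, W) scores for the full image.
    crop_box:  (x0, y0, x1, y1) in [0, 1], the random crop expressed as
               fractions of image width/height (assumed convention).
    out_size:  (h, w) of the token grid the model expects.
    """
    _, H, W = label_map.shape
    x0, y0, x1, y1 = crop_box
    # Convert the box to label-map indices, keeping at least one cell.
    left = min(int(x0 * W), W - 1)
    top = min(int(y0 * H), H - 1)
    right = max(left + 1, min(W, round(x1 * W)))
    bottom = max(top + 1, min(H, round(y1 * H)))
    cropped = label_map[:, top:bottom, left:right]
    # interpolate expects a batch dimension: (1, num_classes, h', w').
    resized = F.interpolate(
        cropped.unsqueeze(0), size=out_size, mode="bilinear", align_corners=False
    )
    return resized.squeeze(0)


label_map = torch.rand(1000, 18, 18)          # stored map for one image
token_labels = crop_and_resize_label_map(
    label_map, crop_box=(0.25, 0.1, 0.9, 0.8), out_size=(14, 14)
)
```

The same box that crops the image is applied to the stored map, so each output cell describes the image region its token covers. Any other geometric augmentation, such as a horizontal flip, would need to be mirrored on the label map in the same way.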

wxshan commented 3 years ago

Thank you


zihangJiang commented 3 years ago

You can reopen this issue if you have any further questions.

theneotopia commented 2 years ago

> Hi, Assume you've got a pre-trained classifier like EfficientNet-B6 for your dataset, you can simply remove the final global average pooling layer (and may have to adjust the shape of the output feature map before you feed it to the classification head) to generate the dense label map.

Hi Zihang @zihangJiang, thanks for your excellent work. I'd like to ask how the score map is generated. First we take a pretrained machine annotator such as NFNet-F6 and, as you say, remove the final global average pooling layer, which gives us the feature map of an input image. But how do we transform that feature map into the dense score map, say of shape 1000*patch_num for ImageNet? Sorry for my unfamiliarity with NFNet-F6: does it generate the patch labels automatically, or is some other technique needed? Thanks for your reply.