hustvl / WeakTr

WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation
MIT License
122 stars, 2 forks

Failure to achieve the given mIoU result in the first stage #21

Closed 1rua11 closed 11 months ago

1rua11 commented 12 months ago

Thank you very much for your work! I have a few questions that I would like your help with. In the CAM generation step, why does the first command use "train_id.txt" instead of "train_aug_id.txt"?

python main.py --model deit_small_WeakTr_patch16_224 \
  --data-path data \
  --data-set VOC12 \
  --img-ms-list voc12/train_id.txt \
  --cam-npy-dir WeakTr_results/WeakTr/attn-patchrefine-npy \
  --output_dir WeakTr_results/WeakTr \
  --reduction 8 \
  --pool-type max \
  --lr 6e-4 \
  --weight-decay 0.03 \

Also, was the 69.4% VOC result you report obtained with "deit_small_WeakTr_patch16_224" or with "deit_small_WeakTr_AAF_RandWeight_patch16_224"? I followed the instructions you provided with "deit_small_WeakTr_patch16_224", and my VOC result for the first step (End-to-End CAM Generation) is 67.7%, not 69.4%.
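For readers comparing the 67.7% and 69.4% figures, here is a minimal sketch of how mIoU is conventionally computed from a confusion matrix. It is not the evaluation code from this repository. The function names, the flat label lists, and skipping classes that never occur are assumptions made for illustration; the only dataset-specific detail is the usual PASCAL VOC convention of ignoring pixels labelled 255.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground-truth label equals ignore_index are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU = TP / (TP + FP + FN) over classes that occur at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp    # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                                # assumption: skip classes never seen
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with two classes and one ignored pixel.
    gt = [0, 0, 1, 1, 255]
    pred = [0, 1, 1, 1, 0]
    print(mean_iou(confusion_matrix(gt, pred, num_classes=2)))
```

Because the score is an average over classes, a gap of a point or two can come from a handful of classes, so comparing per-class IoU between two runs is usually more informative than comparing the mean alone.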

Unrealluver commented 12 months ago

Thanks for your interest in our work! But I am sorry that I could not fully understand your questions. Could you please list your questions one by one? Or directly describe them in Chinese?

1rua11 commented 12 months ago

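Dear authors,

Sure, and sorry that I did not express myself clearly. Let me restate my questions.

(1) For the first training step, End-to-End CAM Generation, the reported VOC result is an mIoU of 69.4%. I followed the provided code and instructions but could not reproduce it, and I do not know where the problem lies. I wondered whether I am using the wrong model: I used deit_small_WeakTr_patch16_224 and got 67.7%.

(2) Also in the CAM generation step of the first training stage, VOC training uses train_id.txt, which contains 1,464 images. Why not use the 10,582 images? The exact command is as follows:

Best regards!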


Unrealluver commented 12 months ago

Thank you for reaching out and trying our methods!

For your first question, the performance discrepancy might be attributed to variations in hardware specifications. We conducted our training on an Nvidia A400 (16G) GPU. Should your GPU be of a different model or specification, there could be variances in performance outcomes. To mitigate this, you have a couple of options:

Regarding your second question, we only use train_id.txt to evaluate the model's performance during the training process, and the training data is train_aug_id.txt.
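To check the two split files locally, something like the sketch below may help. It counts the IDs in each file and lists any evaluation IDs missing from the training list. The helper names are hypothetical, and the location voc12/train_aug_id.txt is an assumption (only voc12/train_id.txt appears in the command above); the expected sizes of 1,464 and 10,582 images are the ones quoted earlier in this thread.

```python
from pathlib import Path
from typing import Dict, List, Sequence


def load_ids(path) -> List[str]:
    """Read one image ID per line, skipping blank lines."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def compare_splits(eval_ids: Sequence[str], train_ids: Sequence[str]) -> Dict[str, object]:
    """Summarise how the evaluation split relates to the training split."""
    eval_set, train_set = set(eval_ids), set(train_ids)
    return {
        "eval_count": len(eval_set),
        "train_count": len(train_set),
        # IDs used for evaluation that are absent from the training list
        "eval_only": sorted(eval_set - train_set),
    }


if __name__ == "__main__":
    eval_path = Path("voc12/train_id.txt")
    train_path = Path("voc12/train_aug_id.txt")  # assumed location, adjust as needed
    if eval_path.exists() and train_path.exists():
        print(compare_splits(load_ids(eval_path), load_ids(train_path)))
    else:
        print("split files not found; adjust the paths above")
```

If the counts come out as 1,464 and 10,582, the files match the split sizes mentioned in this thread.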

If you have any other questions, feel free to communicate with us.


Unrealluver commented 11 months ago

I will close this issue. If there are more questions, you are welcome to raise issues :)

zbb1111 commented 1 week ago

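Thank you very much for your excellent work. I have a few questions and hope to get your reply. Thank you!

1. For training, your reply says the training data is train_aug_id, but the command here uses train_id. Should it be changed to train_aug_id?
2. [image: 7b819143caade73ba63e03af8aef6ac] Generate CAM: does this step produce the coarse CAM or the fine CAM?
3. [image: cce4d266ecff84b9c7e3cbd405fdf69] Does this step correspond to the 66.2% result?
4. [image: 678d203d790d92262988a6d2dd4d81a] CRF post-processing: what does this step mean? Is it the step that yields the 76.5% mask result?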