Image patches - Githubissues

huawei-noah / Efficient-AI-Backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

4.07k stars 708 forks

Closed onlinehuazai closed 2 months ago

onlinehuazai commented 2 years ago

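Does ViG require a fixed image size, with every image resized to the same dimensions? Or can each image have a different size and a different number of patches, so that the number of nodes also differs when the graph is built?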

iamhankai commented 2 years ago

Different sizes are fine; that is how it works in object detection.

onlinehuazai commented 2 years ago

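> Different sizes are fine; that is how it works in object detection.

So each image is split into a different number of nodes? Are there any references on this?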

iamhankai commented 2 years ago

onlinehuazai commented 2 years ago

#133 (comment)

Can training be done without a fixed image size, so that the number of nodes per image is not fixed?

iamhankai commented 2 years ago

Within a batch, all images must have the same size, which you get by resizing or padding them. Different batches can have different sizes.
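The per-batch rule can be sketched as follows. This is a minimal, framework-free sketch and not the repository's code: `batch_target_size`, `pad_image` and `num_nodes` are hypothetical helpers, and the stride of 16 pixels per node is an assumption, chosen so that a 224x224 input gives 14x14 nodes.

```python
def batch_target_size(sizes, stride=16):
    """Smallest (H, W) that fits every image in the batch and is a multiple of `stride`.

    `sizes` is a list of (height, width) pairs. `stride` (pixels per node side)
    is an assumed value, not taken from the repository.
    """
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    round_up = lambda x: (x + stride - 1) // stride * stride
    return round_up(max_h), round_up(max_w)


def pad_image(image, target_h, target_w, fill=0.0):
    """Pad a 2-D image (list of rows) on the bottom and right with `fill`."""
    h, w = len(image), len(image[0])
    rows = [list(row) + [fill] * (target_w - w) for row in image]
    rows += [[fill] * target_w for _ in range(target_h - h)]
    return rows


def num_nodes(h, w, stride=16):
    """Number of graph nodes for an (h, w) image, one node per stride x stride patch."""
    return (h // stride) * (w // stride)


# Two batches with different image sizes: each batch gets its own common size,
# so the node count is fixed inside a batch but varies between batches.
batch_a = [(200, 180), (224, 150)]
batch_b = [(320, 320), (300, 310)]
size_a = batch_target_size(batch_a)
size_b = batch_target_size(batch_b)
nodes_a = num_nodes(*size_a)
nodes_b = num_nodes(*size_b)
```

Padding keeps the original pixels untouched but adds nodes that carry no image content; resizing avoids that at the cost of changing the aspect ratio. Which one is preferable depends on the task.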

0xf21 commented 2 years ago

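Hi, I have three questions:

1. Have you tried ViT's way of splitting the image into patches? If so, how do the results compare?
2. In the paper, Figure 4 (Visualization of the constructed graph structure), panel (b), shows 14x14 nodes for both the 1st and the 12th block. But according to the ViG-Ti architecture, the 1st and 12th blocks should have 56x56 and 7x7 nodes, which does not match 14x14. How should I read this?
3. How is Figure 3 in the paper (Feature diversity of nodes as layer changes) computed? Does it need a particular dataset?

I would appreciate an answer, thanks!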

iamhankai commented 2 years ago

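There is a small misunderstanding here. ViG splits the image directly into 14x14 nodes and has no pyramid structure, the same as ViT. Pyramid ViG is the pyramid variant, where the number of nodes drops from 56x56 down to 7x7.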
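The difference in node counts can be checked with a little arithmetic. The sketch below assumes a 224x224 input, a single downsampling stride of 16 for the isotropic model, and stage strides of 4, 8, 16 and 32 for the pyramid model. Those strides are assumptions chosen to reproduce the 14x14, 56x56 and 7x7 figures mentioned above; `grid_nodes` is a hypothetical helper.

```python
def grid_nodes(image_size, stride):
    """Return (nodes per side, total nodes) for a square image and a given stride."""
    side = image_size // stride
    return side, side * side


IMAGE_SIZE = 224  # assumed input resolution

# Isotropic ViG: one grid, same node count in every block.
isotropic = grid_nodes(IMAGE_SIZE, 16)

# Pyramid ViG: the grid shrinks stage by stage (assumed strides).
pyramid = [grid_nodes(IMAGE_SIZE, s) for s in (4, 8, 16, 32)]

for side, total in [isotropic] + pyramid:
    print(f"{side}x{side} grid, {total} nodes")
```

Under these assumptions, a figure that shows 14x14 nodes in both an early and a late block is consistent with the isotropic model, not the pyramid one.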

Feature diversity is computed with the method from the paper "Attention is not all you need: Pure attention loses rank doubly exponentially with depth", using the ImageNet dataset.
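As a rough illustration of that kind of measure: the cited paper looks at the residual left after removing from the node feature matrix X its best approximation with identical rows, X - 1xᵀ. The sketch below is a simplified stand-in, not the exact computation behind Figure 3. It assumes the Frobenius norm, for which the best shared row x is simply the mean of the node features; the paper's own norm may differ. `feature_diversity` is a hypothetical helper.

```python
from math import sqrt


def feature_diversity(features):
    """Frobenius norm of X - 1 x^T, where x is the mean node feature.

    `features` is a list of N node vectors, each a list of D floats.
    Assumption: Frobenius norm, so the minimizing x is the per-dimension mean.
    A value of 0 means all nodes have identical features.
    """
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    residual_sq = sum(
        (row[j] - mean[j]) ** 2 for row in features for j in range(d)
    )
    return sqrt(residual_sq)


collapsed = feature_diversity([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])  # identical nodes
spread = feature_diversity([[0.0, 0.0], [2.0, 0.0]])                 # distinct nodes
```

To trace a curve over depth, one would apply such a measure to the node features taken after each block and average over a set of images.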

zxy1728 commented 3 months ago

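> #133 (comment)
>
> Can training be done without a fixed image size, so that the number of nodes per image is not fixed?

Did you ever get an answer to this? Is it possible for each image to have a different number of nodes and a different size?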