xcyao00 / FOD

PyTorch implementation of the ICCV 2023 paper: Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection

MIT License

37 stars 6 forks

A question about the model overview #8

Open wangxin-fighting opened 1 month ago

wangxin-fighting commented 1 month ago

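Hello, I have two questions. 1: In the model overview in Figure 2 of the paper, what is the difference between X and X_k-1? 2: How are the different Transformer layers connected? For example, your code seems to contain only an encoder and no decoder. What is the output of the first Transformer (encoder) layer, and how is it passed to the second layer as input?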

xcyao00 commented 1 month ago

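X is the input feature fed into the whole Transformer reconstruction network. X_k-1 distinguishes the features at each layer inside the Transformer: it is the input to the k-th layer, so X_0 is simply X.
The Transformer layers are stacked directly on top of one another; in the code, this is a stack of EncoderLayer modules, where the output of each layer is the input to the next. A Transformer encoder and decoder are actually quite similar. In language models the decoder additionally has cross-attention, while vision models such as ViT usually have only an encoder and no decoder. Our model is many EncoderLayer modules composed sequentially. Since this Transformer is used to reconstruct the input features, calling it a decoder might be more appropriate.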
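For readers following along, the stacking described above can be sketched roughly as follows. This is not the repository's code: it substitutes PyTorch's built-in `torch.nn.TransformerEncoderLayer` for the repo's EncoderLayer, and the class name `StackedEncoder`, the feature dimension, the head count, and the number of layers are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class StackedEncoder(nn.Module):
    """Hypothetical stack of encoder layers; not the FOD implementation."""

    def __init__(self, dim=64, heads=4, num_layers=3):
        super().__init__()
        # Stand-in for the repo's EncoderLayer: PyTorch's built-in encoder layer.
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
            )
            for _ in range(num_layers)
        ])

    def forward(self, x):
        feats = [x]  # X_0 is the input X itself
        for layer in self.layers:
            # X_k = layer_k(X_{k-1}): each layer consumes the previous output
            feats.append(layer(feats[-1]))
        return feats[-1], feats


model = StackedEncoder().eval()
x = torch.randn(2, 196, 64)  # (batch, tokens, dim); sizes are arbitrary
with torch.no_grad():
    recon, feats = model(x)
```

Here `feats[k]` plays the role of X_k, so `feats[0]` is X and `recon` is the output of the last layer, which has the same shape as the input and can be compared against it as a reconstruction.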

wangxin-fighting commented 1 month ago

Thank you for the prompt reply. I wish you all the best in your work and studies.