rosinality / ocr-pytorch

Object-Contextual Representations for Semantic Segmentation in PyTorch
MIT License

Pipeline #3

Open ZshahRA opened 4 years ago

ZshahRA commented 4 years ago

Hello everyone,

Could you please explain the following section of the code in relation to the pipeline? My question is: why did you use conv2d followed by conv1d? What are the benefits, and why did you choose this design?

    def conv2d(in_channel, out_channel, kernel_size):
        layers = [
            nn.Conv2d(
                in_channel, out_channel, kernel_size, padding=kernel_size // 2, bias=False
            ),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(),
        ]

        return nn.Sequential(*layers)

    def conv1d(in_channel, out_channel):
        layers = [
            nn.Conv1d(in_channel, out_channel, 1, bias=False),
            nn.BatchNorm1d(out_channel),
            nn.ReLU(),
        ]

        return nn.Sequential(*layers)

    class OCR(nn.Module):
        def __init__(self, n_class, backbone, feat_channels=[768, 1024]):
            super().__init__()

I didn't see in the model how you refer to ResNet-101 or HRNet.

            self.backbone = backbone

What do feat_channels, ch16, and ch32 mean?

            ch16, ch32 = feat_channels

What does this mean?

            self.L = nn.Conv2d(ch16, n_class, 1)
            self.X = conv2d(ch32, 512, 3)

I found in the article that phi and psi are transformation functions used in self-attention. Could you please explain the benefit of using these functions?

            self.phi = conv1d(512, 256)
            self.psi = conv1d(512, 256)
            self.delta = conv1d(512, 256)
            self.rho = conv1d(256, 512)
            self.g = conv2d(512 + 512, 512, 1)

            self.out = nn.Conv2d(512, n_class, 1)

            self.criterion = nn.CrossEntropyLoss(ignore_index=0)

        def forward(self, input, target=None):
            input_size = input.shape[2:]
            stg16, stg32 = self.backbone(input)[-2:]

            X = self.X(stg32)

Thanks in advance

rosinality commented 4 years ago

> I didn't see in the model how you refer to ResNet-101 or HRNet.
>
>     self.backbone = backbone

I made it so that you can pass a specific backbone into the model. See: https://github.com/rosinality/ocr-pytorch/blob/master/train.py#L179

> What do feat_channels, ch16, and ch32 mean?
>
>     ch16, ch32 = feat_channels

It specifies the number of channels of the stride 16 and stride 32 feature maps.
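
To make the relationship between the backbone output and `feat_channels` concrete, here is a minimal sketch. `ToyBackbone` is a hypothetical stand-in, not the backbone used in this repository; the only assumption carried over from the quoted code is that the backbone returns a list of feature maps whose last two entries are the stride 16 and stride 32 stages.

```python
import torch
from torch import nn


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for a real backbone; returns one feature map per stage."""

    def __init__(self, channels=(64, 128, 768, 1024)):
        super().__init__()
        strides = (4, 2, 2, 2)  # cumulative strides: 4, 8, 16, 32
        in_ch = 3
        stages = []
        for out_ch, stride in zip(channels, strides):
            stages.append(nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1))
            in_ch = out_ch
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats


backbone = ToyBackbone()
image = torch.randn(1, 3, 128, 128)

# Same indexing as in the quoted forward(): keep the last two stages.
stg16, stg32 = backbone(image)[-2:]

# feat_channels must match the channel counts of these two maps.
feat_channels = [stg16.shape[1], stg32.shape[1]]
print(feat_channels, tuple(stg16.shape[2:]), tuple(stg32.shape[2:]))
```

With a different backbone, the two numbers passed as `feat_channels` would be read off in the same way from the shapes of its last two outputs.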

> What does this mean?
>
>     self.L = nn.Conv2d(ch16, n_class, 1)
>     self.X = conv2d(ch32, 512, 3)

X corresponds to the pixel representations, and L corresponds to the soft object regions. You can find this in version 1 of the paper.

phi and psi compute the attention logits, much like the query and key projections commonly used in self-attention. One difference is that an activation is applied to them. Maybe the authors found that adding more nonlinearities works better here.
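
The roles of X, L, phi, and psi can be sketched as follows. This is an illustrative reading of the object-contextual attention described in the paper, not a copy of this repository's `forward`. The layer names mirror the quoted code; the tensor sizes and the assumption that X and L share the same spatial resolution are choices made for this example.

```python
import torch
from torch import nn


def conv1d(in_channel, out_channel):
    # 1x1 convolution + BatchNorm + ReLU over flattened (B, C, N) features
    return nn.Sequential(
        nn.Conv1d(in_channel, out_channel, 1, bias=False),
        nn.BatchNorm1d(out_channel),
        nn.ReLU(),
    )


batch, channel, n_class, height, width = 2, 512, 5, 8, 8
phi, psi = conv1d(channel, 256), conv1d(channel, 256)
delta, rho = conv1d(channel, 256), conv1d(256, channel)

X = torch.randn(batch, channel, height, width)  # pixel representations
L = torch.randn(batch, n_class, height, width)  # soft object region logits

X_flat = X.flatten(2)                      # (B, C, N) with N = H * W
M = torch.softmax(L.flatten(2), dim=-1)    # (B, K, N), each region sums to 1 over pixels
f_k = X_flat @ M.transpose(1, 2)           # (B, C, K) object region representations

query = phi(X_flat).transpose(1, 2)        # (B, N, 256)
key = psi(f_k)                             # (B, 256, K)
attn = torch.softmax(query @ key, dim=-1)  # (B, N, K) pixel-to-region weights

value = delta(f_k).transpose(1, 2)         # (B, K, 256)
context = rho((attn @ value).transpose(1, 2))  # (B, C, N)
X_obj = context.reshape(batch, channel, height, width)
print(X_obj.shape)
```

Each pixel attends over the K object regions rather than over all N pixels, so the attention matrix has shape N x K instead of N x N.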

ZshahRA commented 4 years ago

Hello Rosinality,

Thank you for your prompt response. Could you please answer these two questions?

1) Why did you use conv2d followed by conv1d? What are the benefits of each, and why did you use both?

2) I also have a forward-looking question. If I want to change the OCR model and include attention in it, is that possible? Also, what modifications would you suggest when implementing OCR?

Thank you

rosinality commented 4 years ago

  1. I used Conv1d because the feature maps are flattened to compute self-attention. Actually, you can implement it using only Conv2d.
  2. Do you want to use OCR attention in your models? One core idea of OCR is to create soft attention region maps (one per semantic class) and use them for attention. I think you need to check whether that is appropriate for your tasks and models.
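
The first point can be checked with a small sketch: a `Conv1d` with kernel size 1 applied to flattened features computes the same thing as a 1x1 `Conv2d` applied to the unflattened map, once the weights are shared. The tensor sizes below are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

conv_1d = nn.Conv1d(512, 256, 1, bias=False)
conv_2d = nn.Conv2d(512, 256, 1, bias=False)

# Share weights: (256, 512, 1) -> (256, 512, 1, 1)
with torch.no_grad():
    conv_2d.weight.copy_(conv_1d.weight.unsqueeze(-1))

x = torch.randn(2, 512, 8, 8)

out_1d = conv_1d(x.flatten(2)).reshape(2, 256, 8, 8)  # flatten H, W, then restore
out_2d = conv_2d(x)                                   # operate on the 2D map directly

print(torch.allclose(out_1d, out_2d, atol=1e-4))
```

The `Conv1d` form is convenient when the tensors are already flattened to (B, C, N) for the matrix products in the attention step.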

ZshahRA commented 4 years ago

In my case, the input is a grayscale image showing the temperature values of an object, and I have to use semantic segmentation to delineate defective and non-defective areas in the image. Could you please suggest any ideas you have? Thank you once again for your assistance.

rosinality commented 4 years ago

You can try treating defective and non-defective as two classes and use OCR on them. I don't know whether OCR is well suited to your task, but it may be worth trying if you have some baseline segmentation models.
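
A minimal sketch of the two-class setup follows. The label encoding (0 = non-defective, 1 = defective) and the tensor sizes are assumptions for illustration. One thing to watch for: the code quoted at the top of this thread builds its loss with `ignore_index=0`, and PyTorch's `CrossEntropyLoss` skips every pixel whose target equals `ignore_index`. If label 0 is a real class in your data, that setting would exclude it from training.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# A single-channel thermal image can be repeated to 3 channels for an RGB backbone.
gray = torch.rand(2, 1, 4, 4)
rgb_like = gray.expand(-1, 3, -1, -1)

logits = torch.randn(2, 2, 4, 4)         # (B, n_class=2, H, W)
target = torch.randint(0, 2, (2, 4, 4))  # assumed encoding: 0 = non-defective, 1 = defective
target[0, 0, 0], target[0, 0, 1] = 0, 1  # make sure both classes are present

loss_all = nn.CrossEntropyLoss()(logits, target)
loss_ignore0 = nn.CrossEntropyLoss(ignore_index=0)(logits, target)

# With ignore_index=0, only pixels labelled 1 contribute to the loss.
per_pixel = F.cross_entropy(logits, target, reduction="none")
defective_only = per_pixel[target == 1].mean()
print(loss_all.item(), loss_ignore0.item(), defective_only.item())
```

If you want both classes to be trained, one option is to leave `ignore_index` at its default, or to reserve a separate label value for pixels that should be ignored.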

ZshahRA commented 4 years ago

Yeah, sure. Thanks, I will try. Also, based on your broad knowledge of several models, what other models would you suggest trying? Or are there any modifications to the OCR model that could possibly help? Thanks for your time.

rosinality commented 4 years ago

I think UNet, FPN, and DeepLab v3+ would be simple but powerful baselines. If I used OCR, I would try more powerful backbones or decoder-like approaches (for example, concatenating stride 4 features) such as in DeepLab v3+.
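
The decoder-like idea of concatenating stride 4 features can be sketched as below. `Stride4Decoder` is a hypothetical module written for this example, loosely following the DeepLab v3+ decoder pattern (reduce the low-level features with a 1x1 convolution, upsample the high-level features, concatenate, then refine). The channel sizes are assumptions, and this is not code from this repository.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Stride4Decoder(nn.Module):
    """Hypothetical head that fuses upsampled high-level features with stride 4 features."""

    def __init__(self, low_channels, high_channels, n_class, reduced=48):
        super().__init__()
        # Shrink the low-level features so they do not dominate the fused tensor.
        self.reduce = nn.Sequential(
            nn.Conv2d(low_channels, reduced, 1, bias=False),
            nn.BatchNorm2d(reduced),
            nn.ReLU(),
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(high_channels + reduced, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(),
        )
        self.out = nn.Conv2d(256, n_class, 1)

    def forward(self, low, high):
        # Bring the high-level features up to the stride 4 resolution.
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear", align_corners=False)
        x = torch.cat([self.reduce(low), high], 1)
        return self.out(self.fuse(x))


decoder = Stride4Decoder(low_channels=256, high_channels=512, n_class=2)
low = torch.randn(1, 256, 32, 32)  # stride 4 features for a 128x128 input
high = torch.randn(1, 512, 8, 8)   # e.g. the 512-channel OCR output at stride 16
logits = decoder(low, high)
print(logits.shape)
```

The output is at stride 4, so it still needs to be upsampled to the input size before computing the loss.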

ZshahRA commented 4 years ago

Hello Rosinality,

In Table 4 of the article, you give a comparison with other methods. Question 1) Did you actually implement all those methods yourselves, or did you just compare with them?

I am interested in CC-Attention, Self-Attention and Double Attention.

Question 2) If you implemented these three methods yourselves, could you please share your code?

Thank you

Kind regards,
Rahmat

rosinality commented 4 years ago

Sorry, I'm not the author of the paper.

ZshahRA commented 4 years ago

Thanks


ZshahRA commented 4 years ago

Thanks Rosinality

ZshahRA commented 4 years ago

Hi Rosinality, could you please share your email?

ZshahRA commented 4 years ago

Hi Rosinality,

Could you please explain how to apply global weighted average pooling in PyTorch? For your reference, the article that discusses it is cited below. Thank you.

@inproceedings{kolesnikov2016seed,
  title={Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation},
  author={Kolesnikov, Alexander and Lampert, Christoph H.},
  booktitle={European Conference on Computer Vision ({ECCV})},
  year={2016},
  organization={Springer}
}

rosinality commented 4 years ago

    pooled = (weight * input).sum([2, 3], keepdim=True)
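
The one-liner above can be expanded into a self-contained sketch. The first function normalizes the weights so that the result is a true weighted average; the normalization and the `eps` term are additions for this example. The second function is one reading of the rank-based pooling in the cited paper, where the j-th largest score in each channel is weighted by `decay ** j`; the function names and the `decay` default are assumptions, not values taken from the paper or this repository.

```python
import torch


def global_weighted_average_pool(input, weight, eps=1e-6):
    """Weighted average over H and W; `weight` must broadcast against `input`."""
    weight = weight / (weight.sum([2, 3], keepdim=True) + eps)
    return (weight * input).sum([2, 3], keepdim=True)


def global_weighted_rank_pool(input, decay=0.9):
    """Rank-based variant: sort each channel's scores in descending order
    and weight the j-th largest score by decay ** j."""
    scores, _ = input.flatten(2).sort(dim=2, descending=True)  # (B, C, N)
    ranks = torch.arange(scores.shape[2], dtype=input.dtype, device=input.device)
    weight = decay ** ranks                                    # (N,)
    return (scores * weight).sum(2) / weight.sum()             # (B, C)


torch.manual_seed(0)
features = torch.randn(2, 3, 4, 4)

uniform = torch.ones(2, 1, 4, 4)  # uniform weights reduce to plain average pooling
pooled = global_weighted_average_pool(features, uniform)
ranked = global_weighted_rank_pool(features, decay=0.9)
print(pooled.shape, ranked.shape)
```

With `decay=1.0` the rank-based version reduces to global average pooling, and as `decay` approaches 0 it approaches global max pooling.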