isl-org / lang-seg

Language-Driven Semantic Segmentation
MIT License
706 stars 87 forks

question about the image encoder #20

Closed: chufengt closed this issue 2 years ago

chufengt commented 2 years ago

Hi, thanks for open-sourcing the code.

I have a quick question:

What's the reason for choosing DPT as the image encoder?

What should I note if I want to use other encoders (e.g., HR-Net)?

Boyiliee commented 2 years ago

Hi, @chufengt,

When we designed LSeg, DPT was one of the state-of-the-art models for semantic segmentation, so we picked it for our architecture.

Yes, you can use other encoders, but you may need to adjust the hyperparameters to suit them.
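
To illustrate what swapping the encoder involves, here is a minimal sketch (not the repository's code). It assumes the setup described in the LSeg paper, where per-pixel image embeddings are compared with text embeddings of the label names by inner product. Under that assumption, a replacement encoder mainly needs to output a dense feature map whose channel dimension matches the text embedding. The names `segment` and `toy_encoder` are hypothetical, the random projection only stands in for a real backbone such as DPT or HRNet, and the cosine normalization is an assumed choice.

```python
import numpy as np


def segment(pixel_emb, text_emb):
    """Assign each pixel the label whose text embedding it matches best.

    pixel_emb: (H, W, C) per-pixel embeddings from any image encoder.
    text_emb:  (N, C) embeddings of the N label names.
    """
    if pixel_emb.shape[-1] != text_emb.shape[-1]:
        raise ValueError("image and text embeddings must share dimension C")
    # Normalize both sides so the dot product is a cosine similarity
    # (an assumption of this sketch).
    pixel_emb = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = pixel_emb @ text_emb.T   # (H, W, N) similarity per pixel and label
    return logits.argmax(axis=-1)     # (H, W) label indices


def toy_encoder(image, out_dim=4):
    """Hypothetical stand-in for a real backbone: a fixed random 1x1 projection."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((image.shape[-1], out_dim))
    return image @ proj               # (H, W, out_dim)


# Toy inputs: an 8x8 RGB "image" and 5 label embeddings of dimension 4.
image = np.random.default_rng(1).random((8, 8, 3))
text_emb = np.random.default_rng(2).standard_normal((5, 4))
labels = segment(toy_encoder(image), text_emb)
```

Any encoder that returns an `(H, W, C)` map with the right `C` can be dropped in for `toy_encoder`. Learning rate, crop size, and similar settings would still need to be tuned for the new backbone.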

Hope this helps!

ngfuong commented 2 years ago

Hi, I have a follow-up question about DPT as the image encoder.

According to the Training Details section in your paper, did you use DPT as both the encoder and decoder? Where in the code can I find your implementation of DPT?