mit-han-lab / efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Apache License 2.0

[Semantic segmentation output size problem] #111

Open CaptainEven opened 4 months ago

CaptainEven commented 4 months ago

Hi, thanks for the excellent work! I found that the semantic segmentation output of the b0 "cityscape" model is only 1/8 of the input image size. May I ask where the problem is?
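For context, here is a minimal sketch of how a 1/8-resolution output is commonly brought back to the input size: bilinearly resize the logits, then take the per-pixel argmax. The shapes, the 19-class count, and the `upsample_logits` helper are assumptions for illustration only; they are not taken from the EfficientViT code, and the random tensor merely stands in for the model output.

```python
import torch
import torch.nn.functional as F


def upsample_logits(logits, size):
    """Bilinearly resize (N, C, h, w) logits to the spatial size (H, W)."""
    return F.interpolate(logits, size=size, mode="bilinear", align_corners=False)


# Hypothetical shapes: a 256x512 input and a 19-class head at stride 8.
image = torch.zeros(1, 3, 256, 512)
logits = torch.randn(1, 19, 256 // 8, 512 // 8)  # stand-in for the model output

# Resize the logits to the input resolution, then pick the best class per pixel.
full_res = upsample_logits(logits, image.shape[-2:])
pred = full_res.argmax(dim=1)  # (N, H, W) tensor of class indices
```

Resizing the logits before the argmax, rather than resizing the label map afterwards, tends to give smoother class boundaries.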

CaptainEven commented 4 months ago

I also found that the 'segout' head is not followed by a Sigmoid layer, and in my experiments the training does not converge...
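One thing that may be worth checking, assuming the head is trained with a multi-class cross-entropy loss (an assumption, not something confirmed in this thread): PyTorch's `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, so in that setup a missing Sigmoid would be expected. The sketch below uses made-up shapes and random data to show the equivalence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical head output: raw logits for 19 classes on an 8x16 grid.
logits = torch.randn(2, 19, 8, 16)
target = torch.randint(0, 19, (2, 8, 16))  # per-pixel class indices

# CrossEntropyLoss consumes raw logits directly ...
loss_ce = nn.CrossEntropyLoss()(logits, target)

# ... which matches an explicit log-softmax followed by NLL loss.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
```

If the two losses agree, adding a Sigmoid before a cross-entropy loss would squash the logits into (0, 1) and could itself slow down or stall training.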