mit-han-lab / efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Apache License 2.0

Zero Output with L2 Model On Demo Image #60

FabianSchuetze closed 4 months ago

FabianSchuetze commented 4 months ago

Thanks for the wonderful repo - it's a pleasure to work with it and to read the paper.

I noticed that I get zero output on the demo image (cat on table) with the l2 model. See the Google Colab gist here (it needs to be run with an active GPU):

https://colab.research.google.com/drive/1-5VgyRFjLZTt41hldWml1HbSAG_YhxQv?usp=sharing

Is this the intended output, and can I change some hyperparameters to get predictions?

han-cai commented 4 months ago

The output is saved to disk: https://github.com/mit-han-lab/efficientvit/blob/a852f059dbd0adcb549b41159bec639a069cb90f/demo_sam_model.py#L200

FabianSchuetze commented 4 months ago

Thanks for your reply.

I think the problem is that no masks are generated. I revised the gist here. Consider the following snippet:

import numpy as np
from PIL import Image

from efficientvit.models.efficientvit.sam import EfficientViTSamAutomaticMaskGenerator
from efficientvit.sam_model_zoo import create_sam_model

model = 'l2'
weight_url = './l2.pt'
image_path = './assets/fig/cat.jpg'

# build the model and the automatic mask generator with its default hyperparameters
efficientvit_sam = create_sam_model(model, True, weight_url).cuda().eval()
efficientvit_mask_generator = EfficientViTSamAutomaticMaskGenerator(efficientvit_sam)

# load image
raw_image = np.array(Image.open(image_path).convert("RGB"))
H, W, _ = raw_image.shape
print(f"Image Size: W={W}, H={H}")

# generate masks; this assert fails with the l2 model
masks = efficientvit_mask_generator.generate(raw_image)
assert len(masks) > 0, "must have at least one output"

The assert at the end of the snippet fails because the length of masks is zero. If I replace the l2 model with the xl1 model, ten masks are generated.

han-cai commented 4 months ago

This is because the hyperparameters of EfficientViTSamAutomaticMaskGenerator are not set correctly: https://github.com/mit-han-lab/efficientvit/blob/70aec14f73ade0c0541ae6780648fa89125c12fd/demo_sam_model.py#L135-L137

I have changed the default values to address the issue. You may need to tune these hyperparameters for different models to achieve the best visual results for the automatic mask generation mode.
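To illustrate why threshold hyperparameters can produce an empty result, here is a minimal toy sketch in plain Python. It is not the library's code. It assumes that the generator filters candidate masks by a predicted-IoU score and a stability score, as the segment_anything SamAutomaticMaskGenerator does with its pred_iou_thresh and stability_score_thresh arguments. The Candidate class, the filter_candidates helper, and all score and threshold values below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical mask proposal carrying the two scores used for filtering."""
    pred_iou: float         # the model's own estimate of mask quality
    stability_score: float  # how stable the mask is under cutoff changes


def filter_candidates(candidates, pred_iou_thresh, stability_score_thresh):
    """Keep only candidates that pass both thresholds."""
    return [
        c for c in candidates
        if c.pred_iou > pred_iou_thresh
        and c.stability_score >= stability_score_thresh
    ]


# Made-up scores; real values depend on the model and the image.
candidates = [
    Candidate(pred_iou=0.86, stability_score=0.93),
    Candidate(pred_iou=0.82, stability_score=0.90),
    Candidate(pred_iou=0.70, stability_score=0.88),
]

# Strict thresholds reject every candidate, so no masks are returned.
strict = filter_candidates(candidates, pred_iou_thresh=0.88, stability_score_thresh=0.95)

# Relaxed thresholds let the two best candidates through.
relaxed = filter_candidates(candidates, pred_iou_thresh=0.80, stability_score_thresh=0.85)

print(len(strict), len(relaxed))
```

If one model's scores are systematically lower than another's, the same thresholds can yield ten masks for one model and none for the other, which is why the values may need tuning per model.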

FabianSchuetze commented 4 months ago

Thanks, sounds good.