Closed: FabianSchuetze closed this issue 4 months ago
The output is saved to disk: https://github.com/mit-han-lab/efficientvit/blob/a852f059dbd0adcb549b41159bec639a069cb90f/demo_sam_model.py#L200
Thanks for your reply.
I think the problem is that no masks are generated at all. I revised the gist here. Consider the following snippet:
import time

import numpy as np
from PIL import Image

from efficientvit.apps.utils import parse_unknown_args
from efficientvit.models.efficientvit.sam import EfficientViTSamAutomaticMaskGenerator, EfficientViTSamPredictor
from efficientvit.models.utils import build_kwargs_from_config
from efficientvit.sam_model_zoo import create_sam_model

model = 'l2'
weight_url = './l2.pt'
image_path = './assets/fig/cat.jpg'

# build the model and the automatic mask generator (default hyperparameters)
efficientvit_sam = create_sam_model(model, True, weight_url).cuda().eval()
efficientvit_sam_predictor = EfficientViTSamPredictor(efficientvit_sam)
efficientvit_mask_generator = EfficientViTSamAutomaticMaskGenerator(efficientvit_sam)

# load image
raw_image = np.array(Image.open(image_path).convert("RGB"))
H, W, _ = raw_image.shape
print(f"Image Size: W={W}, H={H}")
tmp_file = f".tmp_{time.time()}.png"

# generate masks for the whole image; this is the step that returns nothing
masks = efficientvit_mask_generator.generate(raw_image)
assert len(masks) > 0, "must have at least one output"
The assert at the end of the snippet fails because masks is empty
(its length is zero). If I replace the l2 model with the xl1
model, ten masks are generated.
This is because the hyperparameters of EfficientViTSamAutomaticMaskGenerator
are not set correctly:
https://github.com/mit-han-lab/efficientvit/blob/70aec14f73ade0c0541ae6780648fa89125c12fd/demo_sam_model.py#L135-L137
I have changed the default values to address the issue. You may need to tune these hyperparameters for different models to achieve the best visual results for the automatic mask generation mode.
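To illustrate why strict hyperparameters can leave no masks at all, here is a minimal sketch of the score filtering step. It assumes the generator behaves like the upstream Segment Anything automatic mask generator, which discards candidates whose predicted IoU or stability score falls below pred_iou_thresh and stability_score_thresh (0.88 and 0.95 are the upstream defaults). The function filter_candidates and all score values are hypothetical and only serve the illustration; they are not taken from the l2 model.

```python
import numpy as np


def filter_candidates(pred_iou, stability, pred_iou_thresh, stability_score_thresh):
    """Return the indices of candidate masks that pass both score thresholds."""
    pred_iou = np.asarray(pred_iou, dtype=float)
    stability = np.asarray(stability, dtype=float)
    # A candidate survives only if it passes both checks.
    keep = (pred_iou > pred_iou_thresh) & (stability >= stability_score_thresh)
    return np.flatnonzero(keep)


# Hypothetical scores for five candidate masks (made up for illustration).
pred_iou = [0.70, 0.82, 0.86, 0.75, 0.80]
stability = [0.90, 0.93, 0.94, 0.88, 0.96]

# Strict thresholds reject every candidate, so the result is empty.
strict = filter_candidates(pred_iou, stability, 0.88, 0.95)
# Relaxed thresholds let some candidates through.
relaxed = filter_candidates(pred_iou, stability, 0.80, 0.85)

print(len(strict), len(relaxed))
```

If a model tends to produce lower confidence scores than another, the same thresholds can keep several masks for one model and none for the other, which would be consistent with the l2 versus xl1 behaviour reported above.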
Thanks, sounds good.
Thanks for the wonderful repo - it's a pleasure to work with it and to read the paper.
I noticed that I get no output on the demo image (cat on a table) with the l2 model. See the Google Colab gist here (it needs to be run with an active GPU):
https://colab.research.google.com/drive/1-5VgyRFjLZTt41hldWml1HbSAG_YhxQv?usp=sharing
Is this the intended output, and can I change some hyperparameters to get predictions?
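One way to explore the hyperparameter question is to count the generated masks over a small grid of settings. The sketch below is a hypothetical helper, not part of efficientvit: sweep_mask_counts takes any factory that accepts keyword arguments and returns an object with a generate(image) method. FakeGenerator is a stand-in so the example runs without a GPU, and its scores are invented. The keyword names pred_iou_thresh and stability_score_thresh are assumed to match those of the Segment Anything automatic mask generator.

```python
from itertools import product


def sweep_mask_counts(make_generator, image, grid):
    """Count generated masks for every combination of keyword values in grid."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        kwargs = dict(zip(names, values))
        masks = make_generator(**kwargs).generate(image)
        results.append((kwargs, len(masks)))
    return results


class FakeGenerator:
    """Stand-in generator with invented scores; not part of efficientvit."""

    def __init__(self, pred_iou_thresh=0.88, stability_score_thresh=0.95):
        self.pred_iou_thresh = pred_iou_thresh
        self.stability_score_thresh = stability_score_thresh

    def generate(self, image):
        # (predicted IoU, stability score) for three made-up candidates.
        scores = [(0.82, 0.93), (0.86, 0.94), (0.90, 0.90)]
        return [
            {"predicted_iou": iou, "stability_score": stab}
            for iou, stab in scores
            if iou > self.pred_iou_thresh and stab >= self.stability_score_thresh
        ]


grid = {"pred_iou_thresh": [0.80, 0.88], "stability_score_thresh": [0.85, 0.95]}
results = sweep_mask_counts(FakeGenerator, image=None, grid=grid)
for kwargs, count in results:
    print(kwargs, count)
```

With the real model, the factory could be something like lambda **kw: EfficientViTSamAutomaticMaskGenerator(efficientvit_sam, **kw), with raw_image passed as the image, assuming the constructor accepts these keyword arguments.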