opengeos / segment-geospatial

A Python package for segmenting geospatial data with the Segment Anything Model (SAM)
https://samgeo.gishub.org
MIT License

Poor performance with non-"tree" text prompts #306

Open dmraji opened 2 months ago

dmraji commented 2 months ago

Environment Information

Description

With text-prompt segmentation, performance seems to be poor for many prompts (except "tree", which is used in the demo). See images.

[Images: prompt_car, prompt_house, prompt_road]

What I Did

I have tried a wide range of box and text threshold values for these prompts, and the segmentation results are generally unsatisfactory regardless. The box drawing does tend to improve, however, which is confusing.

brendancol commented 2 months ago

@dmraji I think "generally unsatisfactory" is fair, because you are searching for a combination of data quality and hyperparameters that works, and most combinations will be bad... but the hope is that you find one that works.

Out of all your threshold value trials, what was the best combination, and do you have a reproduction code snippet to help the conversation progress? If you drop the thresholds too much, you will need to compensate with other heuristics to filter out false positives, etc.

dmraji commented 2 months ago

> Out of all your threshold value trials, what was the best combination, and do you have a reproduction code snippet to help the conversation progress? If you drop the thresholds too much, you will need to compensate with other heuristics to filter out false positives, etc.

@brendancol Here are the values I have found to be somewhat reasonable for bounding box generation at least.

## tree
text_prompt = 'tree'
box_thresh_value = 0.24
text_thresh_value = 0.24

## house
text_prompt = 'house'
box_thresh_value = 0.16
text_thresh_value = 0.72

## road
text_prompt = 'road'
box_thresh_value = 0.10
text_thresh_value = 0.10

## car
text_prompt = 'car'
box_thresh_value = 0.08
text_thresh_value = 0.24

I notice that in every other case (besides "tree"), the box threshold has to be lowered quite significantly to generate a reasonable number of bounding boxes. However, there always seems to be one box that encompasses the entire image, even at the highest thresholds at which any boxes are generated. I think this box is what is degrading the segmentation results for the other prompts; no change to the text threshold value seems to alter the results while the entire-image bounding box is present.
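One heuristic of the kind mentioned earlier in the thread would be to discard any box that covers most of the image before the boxes are used for segmentation. The following is only a minimal sketch under stated assumptions: it assumes the boxes are available as pixel `(x_min, y_min, x_max, y_max)` tuples, which the thread does not confirm, and `drop_oversized_boxes` and `max_area_fraction` are hypothetical names, not part of the package.

```python
def drop_oversized_boxes(boxes, image_width, image_height, max_area_fraction=0.8):
    """Keep only boxes whose area is at most max_area_fraction of the image area.

    Assumption: each box is a pixel-coordinate (x_min, y_min, x_max, y_max) tuple.
    """
    image_area = float(image_width * image_height)
    kept = []
    for x_min, y_min, x_max, y_max in boxes:
        # Clamp negative extents to zero so malformed boxes get zero area.
        box_area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
        if box_area / image_area <= max_area_fraction:
            kept.append((x_min, y_min, x_max, y_max))
    return kept


# Illustrative boxes only: the first one spans the whole 1000 x 1000 image.
boxes = [(0, 0, 1000, 1000), (100, 120, 180, 200), (400, 410, 460, 470)]
filtered = drop_oversized_boxes(boxes, image_width=1000, image_height=1000)
print(filtered)
```

The 0.8 cutoff is arbitrary; a scene where one object legitimately fills most of the frame would need a different value or a different heuristic.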

SFrav commented 2 weeks ago

How did you get on with this?

Could you loop through the combinations on an area where you know the object count and then return the combinations that return the closest count? I'm curious to find out how well parameters transfer to other landscapes.
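A loop like that could be sketched roughly as follows. This is an illustration, not the package's API: `rank_threshold_pairs` is a hypothetical helper, and `count_objects` stands for whatever function you write to run the text-prompt prediction with a given box/text threshold pair and return the number of detected objects. The counts in the lookup table are made up so that the example runs on its own.

```python
from itertools import product


def rank_threshold_pairs(count_objects, known_count, box_thresholds, text_thresholds):
    """Try every (box, text) threshold pair and rank them by count error.

    count_objects(box_threshold, text_threshold) must return an integer count.
    Returns a list of (abs_error, box_threshold, text_threshold, predicted_count),
    sorted so that the pair closest to known_count comes first.
    """
    results = []
    for box_t, text_t in product(box_thresholds, text_thresholds):
        predicted = count_objects(box_t, text_t)
        results.append((abs(predicted - known_count), box_t, text_t, predicted))
    results.sort(key=lambda row: row[0])
    return results


# Made-up counts standing in for real predictions on a reference area.
fake_counts = {(0.1, 0.1): 30, (0.1, 0.3): 12, (0.3, 0.1): 9, (0.3, 0.3): 2}

ranked = rank_threshold_pairs(
    lambda box_t, text_t: fake_counts[(box_t, text_t)],
    known_count=10,
    box_thresholds=[0.1, 0.3],
    text_thresholds=[0.1, 0.3],
)
best_error, best_box, best_text, best_count = ranked[0]
print(best_box, best_text, best_count)
```

Matching the count alone does not guarantee the detections are in the right places, so the top few pairs would still be worth checking visually before trying them on another landscape.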