Closed · eafpres closed this issue 9 months ago
Hi @eafpres, thank you for choosing our work. Glad to be of help!
First of all, I tried it myself with the provided image; here are the results:

Threshold | 0.5 (default) | 0.01 | 0.99
---|---|---|---
Saliency map only | (image) | (image) | (image)
These are different from your results. Could you share your script? I used the command-line tool rather than a Python script. It seems you used our Python API, so having your script would help us find the problem.
Moreover, we also recommend trying base-nightly mode, which can be enabled with the `--mode base-nightly` argument.
Feel free to ask more questions. Thanks.
> These are different from your results.
The original image is a 3000 x 4000 pixel .jpg. I wonder if the large image size has some relation to how the threshold works? In further testing I can see an impact if I set the threshold to 1e-4 or 1e-3 in these cases.
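To reason about why only extreme values seem to matter, here is a minimal sketch of how thresholding a saliency map typically works: pixels whose predicted saliency exceeds the threshold are treated as foreground. This is an illustration with made-up values, not the library's actual implementation.

```python
import numpy as np

# Toy 2x2 "saliency map" with confidences in [0, 1] (hypothetical values).
saliency = np.array([[0.95, 0.40],
                     [0.02, 0.60]])

def binarize(saliency_map, threshold):
    """Keep pixels whose predicted saliency exceeds the threshold."""
    return (saliency_map > threshold).astype(np.uint8)

print(binarize(saliency, 0.5))   # keeps only the confident pixels (0.95, 0.60)
print(binarize(saliency, 0.01))  # keeps every pixel in this toy example
print(binarize(saliency, 0.99))  # keeps nothing in this toy example
```

If the network's predictions are saturated near 0 and 1 on a given image, only very extreme thresholds will visibly change the mask, which matches the behavior described above.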
> It seems you used our Python API, so it would be helpful for us to find the problem if you provide your script.
```python
from PIL import Image, ImageOps
from transparent_background import Remover

remover = Remover(mode='base-nightly', device='cuda:0')
img = Image.open(data_dir + '/' + file).convert('RGB')
img = ImageOps.exif_transpose(img)  # rotate per the EXIF orientation tag
img_no_bg = remover.process(img, threshold=prediction_threshold)
```
Note that in the code, ImageOps.exif_transpose() just rotates the image according to the EXIF metadata in the .jpg file.
> Moreover, we also recommend trying base-nightly mode, which can be enabled with the `--mode base-nightly` argument.
Yes, I have been using that from the start, as I realize this is under active development!
I have confirmed that reducing the image size does affect the appropriate values for the threshold parameter. After reducing to 400 x 400, a value of 0.5 falls more in the middle of the effective range.
That is true. We trained our model with a fixed size of 1024 x 1024, so it produces more stable results at that size, especially when the image is difficult for saliency mask generation, like your example. Salient object detection datasets are mostly center-biased: objects usually sit near the center of the image and are rarely occluded or truncated by the image frame. Outside those conditions, the network struggles to find the proper salient region. Moreover, if the given image is much larger or smaller than usual, it struggles even more.
Long story short, I think in your case you might need to resize the input image to a fixed size, like the 400 x 400 you mentioned.
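As a sketch of that workaround, you could downscale large inputs before calling remover.process(). The helper name and the 1024 target are my own choices (1024 is an assumption based on the training size mentioned above; 400 would work the same way):

```python
from PIL import Image

def fit_within(img, max_side=1024):
    """Downscale img so its longer side is at most max_side,
    preserving aspect ratio; smaller images pass through unchanged."""
    w, h = img.size
    scale = max_side / max(w, h)
    if scale >= 1.0:
        return img
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)

# e.g. a 3000 x 4000 portrait image becomes 768 x 1024
small = fit_within(Image.new('RGB', (3000, 4000)))
print(small.size)
```

You would then pass the resized image to remover.process() as before; if you need the cutout at full resolution, you can upsample the resulting alpha mask back to the original size afterward.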
Also, if you do not need a high-quality result in terms of accurate prediction around the edges of the object, you might want to use the mode='fast' option, which automatically resizes the input image to 384 x 384 and is trained at that same size (384 x 384). It also consumes less GPU memory and computational cost.
Feel free to ask more questions. Thanks.
I've found some interesting cases that seem to largely defeat the algorithm. Would you be interested in receiving those images for development?
I am interested in seeing those images, but I have already graduated, so I no longer have access to a GPU machine to train on my own. Thank you for the offer, by the way.
Here are some samples. I'd be grateful for any thoughts you have. In these cases, adjusting the threshold makes a change, but not enough to get a good result.
System:
- WSL 2 on Windows 11, running Ubuntu 20.04
- Python 3.9
- RTX 3080 Ti, 12 GB VRAM
- 32 GB RAM
Excellent package. This is the best OSS background removal I have found, thanks.
I have tested using thresholds 0.01 and 0.99 to see if I can optimize for my use case. Two pairs of images as examples: left is the original, right is after remover.process(). A) Threshold 0.01
B) Threshold 0.99
Here we see that the model has trouble with light pavement backgrounds, and varying the threshold over its full range does not meaningfully improve that performance.
Is there anything else I could try with the code as is, or should I consider some form of fine-tuning? Do you support that?