hkchengrex / XMem

[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
https://hkchengrex.com/XMem/
MIT License

Make a custom dataset #115

Closed: ZisongXu closed this issue 1 year ago

ZisongXu commented 1 year ago

Hi,

Thanks for the really nice project. I am following the inference instructions and running eval.py. When I look at the output folder, none of the segmentation masks look correct:

[Image: Screenshot from 2023-08-14 11-43-24]

All the images look black at first glance, but on closer inspection the target object in each image is a very dark grey and the background is pure black.

My input images are: [Image: Screenshot from 2023-08-14 11-47-55] [Image: Screenshot from 2023-08-14 11-48-16]

So the result is not entirely wrong: the object is segmented, but it is drawn in near-black. What I would like is for the target object in each image to be white and the background black.

I tried the answer under https://github.com/hkchengrex/XMem/issues/41, but it produced a new error.

Do you have any ideas about that?

Thanks a lot!!!

ZisongXu commented 1 year ago

Sorry, I have now changed the image format in the "JPEGImages" folder to ".jpg" and re-run the code, but the problem remains the same.

hkchengrex commented 1 year ago

If you load the (input and output) masks with PIL, what does np.unique(np.array(mask)) give you?
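A minimal sketch of that check, assuming NumPy and Pillow are installed. The helper name `mask_labels` is only for illustration and is not part of XMem:

```python
import sys

import numpy as np
from PIL import Image


def mask_labels(mask):
    """Return the sorted unique pixel values stored in a PIL mask image."""
    return np.unique(np.array(mask))


if __name__ == "__main__":
    # Pass one or more mask paths on the command line (input and output masks).
    for path in sys.argv[1:]:
        with Image.open(path) as mask:
            # The mode matters: "P" yields palette indices, "L" grayscale values.
            print(path, mask.mode, mask_labels(mask))
```

For a single-object mask you would expect to see only two values: 0 for the background and one other value for the object.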

ZisongXu commented 1 year ago

Sorry, I am new to this, so I may have some questions. When you said "np.unique(np.array(mask))", do you mean the code in the file "XMem/inference/data/mask_mapper.py"?

If so, I added some print statements there:

def convert_mask(self, mask, exhaustive=False):
    # mask is in index representation, H*W numpy array
    labels = np.unique(mask).astype(np.uint8)
    print("==========================================")
    print("labels:")
    print(labels)
    print("==========================================")
    labels = labels[labels!=0].tolist()

the output is:

==========================================
labels:
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]
==========================================

hkchengrex commented 1 year ago

This means your input masks are not binary: they contain 20 different non-zero values instead of a single foreground label. The easiest fix is to pre-process the input masks to make them binary. See also: https://github.com/hkchengrex/XMem/blob/main/docs/PALETTE.md
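One possible way to do that pre-processing is sketched below, assuming NumPy and Pillow. The names `binarize_mask` and `binarize_folder`, the `threshold` parameter, and the choice of a two-colour palette (index 0 black, index 1 white) are my own assumptions for illustration and are not taken from the XMem code base:

```python
from pathlib import Path

import numpy as np
from PIL import Image


def binarize_mask(mask, threshold=0):
    """Map pixels above `threshold` to label 1 and everything else to 0."""
    if mask.mode not in ("L", "P"):
        mask = mask.convert("RGB")
    arr = np.array(mask)
    if arr.ndim == 3:
        # Colour mask: a pixel is foreground if any channel exceeds the threshold.
        arr = arr.max(axis=-1)
    binary = (arr > threshold).astype(np.uint8)
    out = Image.fromarray(binary)
    # Attach a palette so index 0 displays as black and index 1 as white.
    out.putpalette([0, 0, 0, 255, 255, 255] + [0] * (768 - 6))
    return out


def binarize_folder(src_dir, dst_dir):
    """Binarize every PNG in `src_dir` and save it under the same name in `dst_dir`."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.png")):
        with Image.open(path) as mask:
            binarize_mask(mask).save(dst / path.name)
```

With `threshold=0` every non-zero pixel becomes foreground. If the extra values come from soft or anti-aliased edges, a higher threshold may give a cleaner outline. That is a judgment call for your data.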

ZisongXu commented 1 year ago

Thanks a lot!!! I will do it now. I will give you feedback if it produces the right results, thank you!

ZisongXu commented 1 year ago

Thanks a lot!!! IT WORKS!!!