hkchengrex / XMem

[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
https://hkchengrex.com/XMem/
MIT License

Blank segmentation masks when running inference on live video feed #45

Closed aalessi3 closed 1 year ago

aalessi3 commented 1 year ago

I am trying to run the model on a series of live video frames. I provide the segmentation mask for the first frame; my camera is in a fixed orientation, so the first video frame and the provided mask align. I am doing this by modifying the eval.py script and feeding my video frames and mask in via cv2 instead of the dataloader. With this setup I receive only blank segmentation masks as output, similar to https://github.com/hkchengrex/XMem/issues/41#issue-1421286267. I have also tried feeding in a pre-saved video sequence and mask by simply reading the images with cv2, and again I get only blank segmentation masks. I suspect I am overlooking something the dataloader does. Any insight into the errors in my current approach, or a possible direction, would be greatly appreciated!
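One thing worth checking when bypassing the dataloader: the evaluation pipeline normalizes frames (RGB order, scaled to [0, 1], ImageNet mean/std) before they reach the model, while cv2 hands back raw BGR uint8 arrays. A minimal sketch of that preprocessing, assuming the standard torchvision-style ImageNet normalization (the exact pipeline XMem's dataloader applies should be confirmed against the repo):

```python
import numpy as np

# Standard ImageNet statistics. That XMem's dataloader uses exactly
# this normalization is an assumption for illustration.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_bgr_frame(frame_bgr: np.ndarray) -> np.ndarray:
    """Turn an OpenCV BGR uint8 frame (H, W, 3) into a normalized
    float32 CHW array, mimicking a typical evaluation dataloader."""
    rgb = frame_bgr[..., ::-1].astype(np.float32) / 255.0  # BGR -> RGB, scale to [0, 1]
    normalized = (rgb - IMAGENET_MEAN) / IMAGENET_STD      # per-channel normalization
    return normalized.transpose(2, 0, 1)                   # HWC -> CHW
```

Feeding unnormalized BGR frames where the network expects normalized RGB tensors is a common way to get degenerate (all-background) predictions without any error being raised.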

hkchengrex commented 1 year ago

Do the interactive GUI/our pre-defined datasets work? The linked issue points to an installation problem and not an implementation issue.

aalessi3 commented 1 year ago

Yes, the GUI and the predefined datasets work. I have also been able to run on custom datasets with no issues. My problem is only similar to the linked issue in that my outputs look the same: the first mask I provide is saved and present, but all subsequent masks are blank.
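Another frequent culprit with this exact symptom (first mask fine, rest blank) is the mask format. DAVIS-style annotations are palettized PNGs whose pixel values are object IDs (0 = background, 1 = first object, ...); cv2.imread returns the rendered BGR colors instead, which the model cannot interpret as labels. A hedged sketch of reading the IDs directly with PIL (`load_mask` is a hypothetical helper, not part of the repo):

```python
import numpy as np
from PIL import Image

def load_mask(path: str) -> np.ndarray:
    """Read a palettized (mode 'P') PNG annotation as integer object IDs.

    Hypothetical helper: for palette PNGs, np.array(Image.open(...))
    yields the palette indices, i.e. the object IDs themselves, whereas
    cv2.imread would yield the rendered BGR colors.
    """
    return np.array(Image.open(path), dtype=np.uint8)
```

If the mask array you pass in contains color values like 255 instead of small label IDs, the model effectively sees no valid object and propagates nothing.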

hkchengrex commented 1 year ago

You might want to debug the code and print the intermediate outputs to find the line where they first become abnormal (compared with a run on the predefined datasets).

bowen-upenn commented 1 year ago

https://github.com/hkchengrex/XMem/issues/41#issuecomment-1676420226