hkchengrex / XMem

[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
https://hkchengrex.com/XMem/
MIT License

Mask of the first frame without gui #132

Closed mhmd-mst closed 11 months ago

mhmd-mst commented 11 months ago

Hello, thanks for your contribution. My question is about providing a mask without the GUI when the video has multiple camera angles. Consider a clip from a show where the scene changes and different characters appear (person is the class of interest, and several people may appear at the same time, all of whom I need to track simultaneously). How can I provide the first-frame mask for each scene and for each camera angle change?

hkchengrex commented 11 months ago

eval.py supports multiple input masks containing different objects (the YouTubeVOS evaluation does this, for reference). If the different camera angles contain substantially different content, you might be better off starting a new evaluation session whenever the scene changes (i.e., treating each scene as a separate video). Otherwise, memory from a previous scene might creep into the next one.
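To illustrate the "treat them as separate videos" suggestion, here is a minimal sketch that splits one extracted frame sequence into per-scene folders. It is not part of XMem: the helper names (`split_into_scenes`, `write_scene_folders`), the `scene_starts` input (frame indices where a new scene begins, found by hand or with a cut detector of your choice), the `.jpg` extension, and the `JPEGImages/<scene>/` layout are all assumptions made for illustration, so check them against the dataset layout your evaluation setup actually expects.

```python
from pathlib import Path
import shutil


def split_into_scenes(frame_names, scene_starts):
    """Group an ordered list of frame names into one list per scene.

    scene_starts: indices of the first frame of each scene (hypothetical
    input; index 0 is always treated as a scene start).
    """
    n = len(frame_names)
    # Keep only valid, unique cut positions and always start at frame 0.
    starts = sorted({0} | {s for s in scene_starts if 0 < s < n})
    bounds = starts + [n]
    return [frame_names[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def write_scene_folders(src_dir, dst_root, scene_starts, prefix="scene"):
    """Copy frames from src_dir into dst_root/JPEGImages/<prefix>_NNN/.

    The folder layout is an assumption, not a confirmed XMem requirement.
    """
    src_dir, dst_root = Path(src_dir), Path(dst_root)
    frames = sorted(p.name for p in src_dir.glob("*.jpg"))
    scenes = split_into_scenes(frames, scene_starts)
    for i, names in enumerate(scenes):
        out = dst_root / "JPEGImages" / f"{prefix}_{i:03d}"
        out.mkdir(parents=True, exist_ok=True)
        for name in names:
            shutil.copy2(src_dir / name, out / name)
    return scenes


if __name__ == "__main__":
    demo = [f"{i:05d}.jpg" for i in range(6)]
    for idx, scene in enumerate(split_into_scenes(demo, [2, 5])):
        print(idx, scene[0], "...", scene[-1])
```

Each resulting scene folder would then need its own first-frame mask containing every person to track in that scene, and each scene would be evaluated as an independent video so that no memory carries over across cuts.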

As a side note, our new project Cutie might work better.