First, thank you for your valuable research.
I would like to ask your opinion on how to approach training on a dataset like the DAVIS dataset (https://davischallenge.org/).
The problems I expect are as follows.
DAVIS contains many categories.
The DAVIS dataset covers a wide variety of subjects, including dancers, boxers, elephants, cars, and horses, with roughly 80 to 90 categories and about one video per category. The algorithm seems to be strongest on a single category, so is there a good way to handle this many categories? Or should we simply collect more data, merge similar categories as much as possible, and train on a single category?
The backgrounds are highly dynamic.
Because of the nature of the dataset, the backgrounds are cluttered enough that it is difficult to separate object from background (from a learning perspective). To address this, would it be reasonable to take only the videos that come with segmentation masks (annotations/labels), mask out the background, and build a new dataset that contains only the motion of the object?
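To make the masking idea concrete, here is a minimal sketch of the preprocessing step I have in mind, assuming the DAVIS frames and per-pixel annotation masks are already loaded as NumPy arrays; `mask_out_background` is a hypothetical helper name, not part of the author's code:

```python
import numpy as np

def mask_out_background(frame: np.ndarray, mask: np.ndarray, fill: int = 0) -> np.ndarray:
    """Keep only foreground pixels of an H x W x 3 frame.

    frame: H x W x 3 image array.
    mask:  H x W array, nonzero where the annotated object is.
    fill:  value used for background pixels (0 = black).
    """
    fg = mask.astype(bool)
    out = np.full_like(frame, fill)   # start from a uniform background
    out[fg] = frame[fg]               # copy through the object pixels only
    return out

# Tiny synthetic example: a 2x2 "frame" whose top-left pixel is foreground.
frame = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
mask = np.array([[1, 0], [0, 0]], dtype=np.uint8)
masked = mask_out_background(frame, mask)
```

Applying this per frame (over the annotated subset of DAVIS) would yield a dataset where only the object's motion remains; whether that helps training is exactly what I am asking about.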
I really like the author's paper and code, and I would like to use them in my project. Thank you.