Our released dataset contains 1203 categories, but we train the model only on the first 1196. A suitable approach is to filter out the extra categories (those beyond the first 1196) when constructing the dataset mapper.
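A minimal sketch of that filtering step follows. It assumes the dataset records use the common `annotations` / `category_id` layout and that category ids are contiguous and 0-based, so the training categories are ids 0 to 1195. If your ids are 1-based or non-contiguous, adjust the range check. `filter_extra_categories` and `NUM_TRAIN_CATEGORIES` are hypothetical names for illustration, not part of the released code.

```python
from typing import Any, Dict, List

# Assumption: category ids are contiguous and 0-based, so the first 1196
# training categories are ids 0..1195 and any id >= 1196 is an extra one.
NUM_TRAIN_CATEGORIES = 1196


def filter_extra_categories(
    dataset_dict: Dict[str, Any],
    num_train_categories: int = NUM_TRAIN_CATEGORIES,
) -> Dict[str, Any]:
    """Return a copy of one dataset record without out-of-range annotations.

    Hypothetical helper, meant to run at the start of a custom mapper,
    before annotations are turned into training targets.
    """
    kept: List[Dict[str, Any]] = [
        ann
        for ann in dataset_dict.get("annotations", [])
        if 0 <= ann["category_id"] < num_train_categories
    ]
    filtered = dict(dataset_dict)  # shallow copy so the input record is unchanged
    filtered["annotations"] = kept
    return filtered


if __name__ == "__main__":
    record = {
        "file_name": "example.jpg",
        "annotations": [{"category_id": 5}, {"category_id": 1199}],
    }
    print(filter_extra_categories(record)["annotations"])
```

Filtering inside the mapper, rather than editing the annotation files, keeps the released 1203-category data intact while the model only ever sees the 1196 training categories.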
We will consider providing an image/video inference demo.
'sem_seg_head.predictor.class_embed.zs_weight' to the model due to incompatible shapes: (512, 1204) in checkpoint but (512, 1197)
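One plausible reading of these shapes, offered as an assumption and not confirmed by the code here, is that `zs_weight` holds one 512-dimensional embedding column per category plus one extra column (for example, background). Under that reading, 1204 = 1203 + 1 and 1197 = 1196 + 1: the checkpoint was built for all 1203 released categories, while the model is configured for the first 1196. The hypothetical helpers below only spell out that arithmetic.

```python
from typing import Tuple

# Assumption: zs_weight has shape (embed_dim, num_categories + 1), where the
# "+ 1" is one extra column beyond the per-category embeddings.
EMBED_DIM = 512


def expected_zs_weight_shape(num_categories: int, embed_dim: int = EMBED_DIM) -> Tuple[int, int]:
    """Shape zs_weight would have for a given number of categories."""
    return (embed_dim, num_categories + 1)


def categories_from_shape(shape: Tuple[int, int]) -> int:
    """Invert expected_zs_weight_shape: recover the category count."""
    return shape[1] - 1


if __name__ == "__main__":
    checkpoint_shape = (512, 1204)
    model_shape = (512, 1197)
    print(categories_from_shape(checkpoint_shape))  # 1203
    print(categories_from_shape(model_shape))  # 1196
```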