model mismatch and inference demo

haochenheheda / LVVIS

Large-Vocabulary Video Instance Segmentation dataset

GNU General Public License v3.0


Closed by fujianhai 1 year ago

fujianhai commented 1 year ago

'sem_seg_head.predictor.class_embed.zs_weight' to the model due to incompatible shapes: (512, 1204) in the checkpoint but (512, 1197)

cilinyan commented 1 year ago

Our released dataset contains 1203 categories, but we use only the first 1196 categories to train the model, which is why the two shapes differ by seven columns. A suitable approach is to filter out the extra seven categories when constructing the dataset mapper.
We will consider providing an image/video inference demo.
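The category filtering suggested above could be sketched roughly as follows. This is not the repository's mapper code. It assumes each video record is a dict whose `annotations` entry is a list of per-frame annotation lists, and that `category_id` has already been remapped to contiguous 0-based indices, so the first 1196 categories are ids 0 to 1195. The names `NUM_TRAIN_CATEGORIES`, `filter_annotations`, and `filter_dataset_dict` are hypothetical.

```python
# Assumption: ids are contiguous and 0-based, so "the first 1196
# categories" corresponds to category_id in the range [0, 1196).
NUM_TRAIN_CATEGORIES = 1196


def filter_annotations(annotations, num_categories=NUM_TRAIN_CATEGORIES):
    """Return only the annotations whose category id is below num_categories."""
    return [a for a in annotations if 0 <= a["category_id"] < num_categories]


def filter_dataset_dict(dataset_dict, num_categories=NUM_TRAIN_CATEGORIES):
    """Return a shallow copy of a video record with extra categories removed.

    Assumed (hypothetical) layout: dataset_dict["annotations"] is a list
    with one list of annotation dicts per frame.
    """
    filtered = dict(dataset_dict)  # leave the caller's record untouched
    filtered["annotations"] = [
        filter_annotations(frame_annos, num_categories)
        for frame_annos in dataset_dict.get("annotations", [])
    ]
    return filtered


if __name__ == "__main__":
    record = {
        "file_names": ["frame_0.jpg", "frame_1.jpg"],
        "annotations": [
            [{"category_id": 0}, {"category_id": 1196}],
            [{"category_id": 1195}, {"category_id": 1202}],
        ],
    }
    print(filter_dataset_dict(record)["annotations"])
```

A function like `filter_dataset_dict` would be called on each record inside the mapper, before the annotations are converted to instances. If the annotation files use a different id scheme (for example 1-based ids), the range check would need to be adjusted accordingly.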