haochenheheda / LVVIS

Large-Vocabulary Video Instance Segmentation dataset
GNU General Public License v3.0

model mismatch and inference demo #8

Closed fujianhai closed 1 year ago

fujianhai commented 1 year ago
  1. Model parameter mismatch when using the ov2seg ResNet-50 model: https://drive.google.com/file/d/1YqL0PDmEhLayqaxTab9ag_ZX26ZS-zB4/view?usp=drive_link

'sem_seg_head.predictor.class_embed.zs_weight' to the model due to incompatible shapes: (512, 1204) in checkpoint but (512, 1197)

  2. Can you provide an image/video inference demo?

cilinyan commented 1 year ago
  1. Our released dataset contains 1203 categories, but we only use the first 1196 categories to train the model. A suitable approach is to filter out the extra seven categories (1203 - 1196) during the mapper construction process.
  2. We will consider providing an image/video inference demo.
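
The filtering suggested in point 1 could look roughly like the sketch below. It is not code from the LVVIS repository. It assumes COCO-style records with an `annotations` list and a 1-based, contiguous `category_id`, and it reads "the first 1196 categories" as ids 1 to 1196. The names `filter_annotations` and `filter_dataset_dicts` are hypothetical. Both shapes in the report are one larger than the category counts mentioned (1204 = 1203 + 1, 1197 = 1196 + 1), so restricting the data to 1196 categories should correspond to the `(512, 1197)` side of the mismatch.

```python
from typing import Any, Dict, List

# From the reply above: only the first 1196 of the 1203 released categories are used.
NUM_TRAIN_CATEGORIES = 1196


def filter_annotations(
    annotations: List[Dict[str, Any]],
    num_categories: int = NUM_TRAIN_CATEGORIES,
) -> List[Dict[str, Any]]:
    """Keep annotations whose category_id lies in 1..num_categories.

    Assumption: category ids are 1-based and contiguous.
    """
    return [a for a in annotations if 1 <= a["category_id"] <= num_categories]


def filter_dataset_dicts(
    dataset_dicts: List[Dict[str, Any]],
    num_categories: int = NUM_TRAIN_CATEGORIES,
) -> List[Dict[str, Any]]:
    """Return copies of the records with out-of-range annotations dropped.

    Assumption: each record stores its annotations under the "annotations" key.
    """
    filtered = []
    for record in dataset_dicts:
        record = dict(record)  # shallow copy so the input list is not modified
        record["annotations"] = filter_annotations(
            record.get("annotations", []), num_categories
        )
        filtered.append(record)
    return filtered
```

A helper like this would be called on the loaded records before they reach the mapper, or inside the mapper itself, depending on how the training pipeline is set up.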