hkchengrex / Cutie

[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
https://hkchengrex.com/Cutie/
MIT License

How to train Cutie without an object transformer? #73

Closed HoyLouis closed 3 months ago

HoyLouis commented 4 months ago

According to your paper, L = 0 is equivalent to not having an object transformer, and that model achieves 65.2 J&F. I could not get training to run successfully after changing cfg.object_transformer.num_blocks to 0. How can I get a model without an object transformer?

Thanks for your reply!

hkchengrex commented 4 months ago

I could not get training to run successfully after changing cfg.object_transformer.num_blocks to 0

What exactly do you mean? If there are run-time errors, you would have to comment out a few lines of code related to the object transformer. We did not specifically write code to skip those lines when the number of blocks is zero. It should be fairly easy.
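For reference, a minimal sketch of the kind of guard one could add (the class and argument names here are illustrative assumptions, not identifiers from the Cutie codebase): keep the transformer module optional and pass the pixel-level readout through unchanged when there are zero blocks. Depending on how the losses are wired, the query aux loss may also need to be disabled, since it supervises the object queries.

```python
import torch
import torch.nn as nn

class ObjectReadout(nn.Module):
    """Hypothetical wrapper; names do not necessarily match Cutie's actual code."""

    def __init__(self, transformer: nn.Module, num_blocks: int):
        super().__init__()
        self.num_blocks = num_blocks
        # only keep the transformer module if it will actually be used
        self.transformer = transformer if num_blocks > 0 else None

    def forward(self, pixel_readout: torch.Tensor) -> torch.Tensor:
        if self.num_blocks == 0:
            # skip the object transformer entirely and return the
            # pixel-level readout unchanged
            return pixel_readout
        return self.transformer(pixel_readout)

# usage sketch: with num_blocks=0 the input tensor passes through untouched
readout = torch.randn(2, 256, 30, 54)  # (B, C, H/16, W/16), shapes illustrative
module = ObjectReadout(nn.Identity(), num_blocks=0)
out = module(readout)
```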

HoyLouis commented 4 months ago

Thanks for your reply!

With the Cutie model, I achieved good results on my task, better than with any other VOS model I tried. Thanks for your contribution to the community. Because of the good results on my datasets, I tried to deploy the model on my platform, but due to hardware limitations the object transformer cannot run efficiently. So I want a model without an object transformer.

I ran the small model on my own data, only changing cfg.object_transformer.num_blocks to 0. The model config is as follows:

pixel_mean: [0.485, 0.456, 0.406]
pixel_std: [0.229, 0.224, 0.225]

pixel_dim: 256
key_dim: 64
value_dim: 256
sensory_dim: 256
embed_dim: 256

pixel_encoder:
  type: resnet18
  ms_dims: [256, 128, 64]

mask_encoder:
  type: resnet18
  final_dim: 256

pixel_pe_scale: 32
pixel_pe_temperature: 128

object_transformer:
  embed_dim: ${model.embed_dim}
  ff_dim: 2048
  num_heads: 8
  num_blocks: 0
  num_queries: 16
  read_from_pixel:
    input_norm: False
    input_add_pe: False
    add_pe_to_qkv: [True, True, False]
  read_from_past:
    add_pe_to_qkv: [True, True, False]
  read_from_memory:
    add_pe_to_qkv: [True, True, False]
  read_from_query:
    add_pe_to_qkv: [True, True, False]
  output_norm: False
  query_self_attention:
    add_pe_to_qkv: [True, True, False]
  pixel_self_attention:
    add_pe_to_qkv: [True, True, False]

object_summarizer:
  embed_dim: ${model.object_transformer.embed_dim}
  num_summaries: ${model.object_transformer.num_queries}
  add_pe: True

aux_loss:
  sensory:
    enabled: True
    weight: 0.01
  query:
    enabled: True
    weight: 0.01

mask_decoder:
  up_dims: [256, 128, 128]
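The ${...} entries are OmegaConf interpolations. As a quick sanity check (the file path below is an assumption for illustration, not necessarily where this config lives in the repo), the config can be loaded and resolved to confirm that num_blocks is really 0 and that the interpolations pick up embed_dim = 256:

```python
from omegaconf import OmegaConf

# Hypothetical path; adjust to wherever the model config actually lives.
model_cfg = OmegaConf.load("cutie/config/model/small.yaml")

# Nest under "model" so interpolations like ${model.embed_dim} can resolve.
cfg = OmegaConf.create({"model": model_cfg})
OmegaConf.resolve(cfg)

print(cfg.model.object_transformer.num_blocks)  # 0
print(cfg.model.object_summarizer.embed_dim)    # 256, via ${model.embed_dim}
```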

During training, I received the following run-time error:

[screenshot of the run-time error traceback]

If I want a model without an object transformer, what else should I do?

Thanks for your patient guidance!

hkchengrex commented 4 months ago

I pushed e869fab. It should let you do this now.

hkchengrex commented 3 months ago

Please feel free to re-open if it still does not work.