jiaosiyu1999 / MAFT


When will the code be released? #1

Open gasharper opened 11 months ago

gasharper commented 11 months ago

Thanks for your impressive work! I am wondering when the code and pre-trained model will be released. Thank you!

jiaosiyu1999 commented 11 months ago

Thank you for your interest. The official code will be released within a month. Here is an unrefined version of the code for your reference: code, model

Train

  1. Step 1: train an existing "frozen CLIP" network, e.g., FreeSeg:

    python train_net.py --config-file configs/coco-stuff-164k-156/mask2former_zss.yaml --num-gpus 4

  2. Step 2: fine-tune the CLIP image encoder with MAFT:

    python train_net.py --config-file configs/coco-stuff-164k-156/mask2former_ft.yaml --num-gpus 4

Test Only

    python train_net.py --config-file configs/coco-stuff-164k-156/eval.yaml --num-gpus 8 --eval-only MODEL.WEIGHTS path/to/your/weights

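
For convenience, the three commands in this comment could be chained in one script. This is only a rough sketch, not part of the repository: the function names `run` and `maft_pipeline` and the variables `DRY_RUN`, `TRAIN_GPUS`, `EVAL_GPUS`, and `WEIGHTS` are hypothetical names introduced here, and `path/to/your/weights` is the placeholder from the comment. How the step 1 checkpoint is handed to step 2 is not stated in this thread, so the sketch leaves that to the config files. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: chains the two training steps and the evaluation from this thread.
# All variable and function names below are hypothetical, not part of the MAFT repo.
set -euo pipefail

CONFIG_DIR="configs/coco-stuff-164k-156"
TRAIN_GPUS="${TRAIN_GPUS:-4}"              # GPU count used for both training steps
EVAL_GPUS="${EVAL_GPUS:-8}"                # GPU count used for evaluation
WEIGHTS="${WEIGHTS:-path/to/your/weights}" # placeholder; replace with a real checkpoint
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

maft_pipeline() {
  # Step 1: train the "frozen CLIP" network (e.g., FreeSeg).
  run python train_net.py --config-file "$CONFIG_DIR/mask2former_zss.yaml" --num-gpus "$TRAIN_GPUS"
  # Step 2: fine-tune the CLIP image encoder with MAFT.
  run python train_net.py --config-file "$CONFIG_DIR/mask2former_ft.yaml" --num-gpus "$TRAIN_GPUS"
  # Test only: evaluate the given weights.
  run python train_net.py --config-file "$CONFIG_DIR/eval.yaml" --num-gpus "$EVAL_GPUS" --eval-only MODEL.WEIGHTS "$WEIGHTS"
}

maft_pipeline
```

Running the script with `DRY_RUN=0` from the repository root would execute the commands instead of printing them; because of `set -e`, a failing step stops the script before the next one starts.
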
GoldfishFive commented 10 months ago

I wonder where the IP-CLIP Encoder is. There is only one myclip_model in ft.py, and its structure is "ViT-B/16", like normal CLIP. Thank you!

jiaosiyu1999 commented 10 months ago

A new version of the code has been released. The implementation of IPCLIP is here.

gasharper commented 10 months ago

Thank you, this is great work!

gasharper commented 10 months ago

@jiaosiyu1999 Hi, it seems that the code is based on detectron2 and a third-party operator (MSDeformAttnFunction), which makes it hard for beginners to deploy. Would it be possible to provide a single-image inference demo that depends only on a few necessary libraries (such as PyTorch and OpenCV)? That would greatly help us grasp the main ideas of the whole work. Thanks in advance for your reply.