sushizixin / CLIP4IDC

CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)
MIT License

How can we use CLIP4IDC on custom data #4

Open ooza opened 1 year ago

ooza commented 1 year ago

Hello, thanks for this amazing work. We are trying to run your project on custom short videos (or sets of image pairs) in our lab, but we can't figure out from the documentation how to do this (i.e., how to get a caption of the difference between two images). Any help would be appreciated.

sushizixin commented 1 year ago

Hi, thanks for your attention. We focus on the single difference between two images. For processing video frames, we would recommend referring to video-based methods, e.g. CLIP4Clip, SwinBERT and Vid2Seq.

ooza commented 1 year ago

Hi, many thanks for your reply! OK, could you tell me the command to use to test CLIP4IDC on two custom images?

sushizixin commented 1 year ago

Hi, please refer to this script; --init_model is the path where the pre-trained CLEVR model is saved. For testing on the Spot dataset, change --datatype to spot and set --init_model to the path where the pre-trained SPOT model is saved.
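
For reference, a minimal sketch of such an invocation might look like the following; the script name, paths, and any flags other than --datatype and --init_model are placeholders rather than the repository's exact interface:

```bash
# Hypothetical evaluation run with the pre-trained CLEVR checkpoint.
# Only --datatype and --init_model come from the comment above; the script
# name and all paths are placeholders.
python eval_caption.py \
  --do_eval \
  --datatype clevr \
  --init_model ckpts/clevr/pytorch_model.bin \
  --output_dir output/clevr_eval

# Same idea for the SPOT checkpoint: switch the dataset type and checkpoint path.
python eval_caption.py \
  --do_eval \
  --datatype spot \
  --init_model ckpts/spot/pytorch_model.bin \
  --output_dir output/spot_eval
```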

lzyuan168 commented 1 year ago

Hi @sushizixin, thanks for the great work. May I know how to test the model on images other than CLEVR and SPOT?

sushizixin commented 1 year ago

Hi, we also conducted experiments on the Image Editing Request dataset, but we used another code base that has not been integrated into this repository. To test the model on other images, you need to process the data according to your requirements and train the model in two steps, following the training procedure for CLEVR or SPOT.
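
As a rough illustration of that two-step recipe on custom data (an adaptation/pretraining step followed by caption fine-tuning), the commands might look like the sketch below; all script names, paths, and flags other than --datatype and --init_model are hypothetical placeholders, not the repository's actual interface:

```bash
# Step 1 (hypothetical): adapt the CLIP encoders on your own "before"/"after"
# image pairs, mirroring the pretraining stage used for CLEVR/SPOT.
python train_retrieval.py \
  --datatype custom \
  --data_path data/custom_pairs \
  --output_dir ckpts/custom_adapt

# Step 2 (hypothetical): fine-tune the captioning model, initialised from step 1.
python train_caption.py \
  --datatype custom \
  --data_path data/custom_pairs \
  --init_model ckpts/custom_adapt/pytorch_model.bin \
  --output_dir ckpts/custom_caption
```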

PPthe2nd commented 1 year ago

Hi, thank you for the interesting paper! I think it'd be really awesome if you would release the IER pre-trained model and possibly(?) a simple script to test it with custom images. As far as I can see, there's no pre-trained, off-the-shelf IDC model available out there, and there would be so many potential applications and users (including myself)!