mlfoundations / open_clip

An open source implementation of CLIP.

Questions about using CoCa to generate captions #797

Open ykj467422034 opened 6 months ago

ykj467422034 commented 6 months ago

I'm fine-tuning OpenCLIP on my own CSV dataset. I save the resulting checkpoint file and then use the official code to generate captions. However, the generated captions keep repeating across images. Can anyone help me solve this problem?

Fine-tuning:

```bash
python -m training.main \
    --dataset-type "csv" \
    --train-data "my-csv/coca_train.csv" \
    --warmup 1000 \
    --batch-size 32 \
    --lr 1e-5 \
    --wd 0.1 \
    --epochs 1 \
    --workers 3 \
    --model "coca_ViT-L-14" \
    --report-to "wandb" \
    --coca-contrastive-loss-weight 0 \
    --coca-caption-loss-weight 1 \
    --log-every-n-steps 100
```

Test:

```python
import open_clip
import torch
from PIL import Image

model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="logs/check_point.pth",
)

im = Image.open("cat.jpg").convert("RGB")
im = transform(im).unsqueeze(0)

with torch.no_grad(), torch.cuda.amp.autocast():
    generated = model.generate(im)

print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))
```

Result (see attached screenshot): as you can see, the captions generated for different pictures are the same.
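For reference, this is roughly how I caption the test images one by one (a sketch of my evaluation loop; the `test_images/` folder name here is just a placeholder for my own data):

```python
import glob

import open_clip
import torch
from PIL import Image

model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="logs/check_point.pth",
)
model.eval()

# Placeholder folder; each test image is captioned independently.
for path in sorted(glob.glob("test_images/*.jpg")):
    im = transform(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad(), torch.cuda.amp.autocast():
        generated = model.generate(im)
    caption = open_clip.decode(generated[0])
    caption = caption.split("<end_of_text>")[0].replace("<start_of_text>", "")
    print(path, "->", caption)
```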

ykj467422034 commented 6 months ago

@gpucce @Thomas2419 @rwightman @gabrielilharco

gpucce commented 6 months ago

Hi @ykj467422034, can you share a snippet of the code you are actually using? From what I can see, the one you shared is exactly the one in the readme, and I think it should only generate a single caption.

ykj467422034 commented 6 months ago

> Hi @ykj467422034, can you share a snippet of the code you are actually using? From what I can see, the one you shared is exactly the one in the readme, and I think it should only generate a single caption.

This is what I actually use; I want to generate a caption for each image, but the problem is that the captions are repeated.

gpucce commented 6 months ago

> > Hi @ykj467422034, can you share a snippet of the code you are actually using? From what I can see, the one you shared is exactly the one in the readme, and I think it should only generate a single caption.
>
> This is what I actually use; I want to generate a caption for each image, but the problem is that the captions are repeated.

So it generates the captions you are showing for the "cat.jpg" file?

ykj467422034 commented 6 months ago

> > > Hi @ykj467422034, can you share a snippet of the code you are actually using? From what I can see, the one you shared is exactly the one in the readme, and I think it should only generate a single caption.
> >
> > This is what I actually use; I want to generate a caption for each image, but the problem is that the captions are repeated.
>
> So it generates the captions you are showing for the "cat.jpg" file?

No, I understand what you mean. (See the attached screenshot.) There are 100 images, and I generate captions for them picture by picture.

gpucce commented 6 months ago

@ykj467422034 sorry, I didn't see your reply. So does it repeat the same caption for different images, or is it generating several captions for one image?

Also, did you try to generate a caption for a random tensor?
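Something like this would do as a quick sanity check (a rough sketch; it reloads your fine-tuned checkpoint and assumes the usual 224×224 input resolution for ViT-L/14):

```python
import open_clip
import torch

# Rough sketch: caption a random "image" to see whether the fine-tuned model
# collapses to the same caption regardless of input.
model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="logs/check_point.pth",  # the fine-tuned checkpoint from above
)
model.eval()

# 224x224 is assumed to match the ViT-L/14 preprocessing resolution.
fake_im = torch.randn(1, 3, 224, 224)

with torch.no_grad(), torch.cuda.amp.autocast():
    generated = model.generate(fake_im)

print(open_clip.decode(generated[0]))
```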

ykj467422034 commented 6 months ago

> @ykj467422034 sorry, I didn't see your reply. So does it repeat the same caption for different images, or is it generating several captions for one image?
>
> Also, did you try to generate a caption for a random tensor?

The former: it repeats the same caption across different images.

gpucce commented 6 months ago

Mmmh, not sure. I asked about the random tensor to see whether the model generates the same caption in that case too; if so, maybe the fine-tuning didn't go well. Do you get similar behaviour with the pretrained model?
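For example, something along these lines (a rough sketch; it assumes the `mscoco_finetuned_laion2b_s13b_b90k` tag for `coca_ViT-L-14` is available in your installed version, which `open_clip.list_pretrained()` will confirm):

```python
import open_clip
import torch
from PIL import Image

# Rough sketch: caption the same image with the off-the-shelf CoCa weights
# instead of the fine-tuned checkpoint, to see whether the repetition is
# introduced by fine-tuning.
model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="mscoco_finetuned_laion2b_s13b_b90k",
)
model.eval()

im = transform(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad(), torch.cuda.amp.autocast():
    generated = model.generate(im)

print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))
```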

ykj467422034 commented 6 months ago

> Mmmh, not sure. I asked about the random tensor to see whether the model generates the same caption in that case too; if so, maybe the fine-tuning didn't go well. Do you get similar behaviour with the pretrained model?

I haven't yet; I can try that later. Thank you very much!

Thomas2419 commented 6 months ago

Hello @ykj467422034, I haven't checked whether the most recent update has fixed this issue, so this suggestion might not work, and in fact it might break things, so consider this my warning to you. Assuming it hasn't been fixed, I will refer you to issue #751. The problem there was that after CoCa fine-tuning, all of the model's predictions were repetitions of the same word.

For example, in that issue it was "turnpike turnpike turnpike turnpike parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway parkway".

The solution that worked for me, as I described in issue #751, was to git pull the open_clip repository, edit my local `open_clip/src/open_clip/coca_model.py` exactly as specified line by line in pull request #710 by gpucce, and then run `pip install -e .` in the repository's main directory to install it after the edits. This completely fixed my problem and made training work as desired for me.

ykj467422034 commented 6 months ago

> Hello @ykj467422034, I haven't checked whether the most recent update has fixed this issue, so this suggestion might not work, and in fact it might break things, so consider this my warning to you. Assuming it hasn't been fixed, I will refer you to issue #751. The problem there was that after CoCa fine-tuning, all of the model's predictions were repetitions of the same word.
>
> For example, in that issue it was "turnpike turnpike turnpike parkway".
>
> The solution that worked for me, as I described in issue #751, was to git pull the open_clip repository, edit my local `open_clip/src/open_clip/coca_model.py` exactly as specified line by line in pull request #710 by gpucce, and then run `pip install -e .` in the repository's main directory to install it after the edits. This completely fixed my problem and made training work as desired for me.

I edited it as you described, but the repeated captions still exist.

Thomas2419 commented 6 months ago

Are you using the newest branch? I was not, so perhaps that is impacting the edit's success.

ykj467422034 commented 6 months ago

> Are you using the newest branch? I was not, so perhaps that is impacting the edit's success.

Do you mean the open_clip repository or the modified src files?

Thomas2419 commented 6 months ago

Apologies for my lack of clarity, I mean the open_clip repository. I was using the most up-to-date version at the time, but it looks like multiple new commits have been made since then. I am currently unable to access my setup to check which commit I am using, though.
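If it helps, something like this should show which copy of open_clip is actually being imported and its installed version (a rough sketch; it assumes the package was installed either from PyPI as `open_clip_torch` or locally with `pip install -e .`):

```python
import importlib.metadata

import open_clip

# Show which checkout or site-packages copy of open_clip is being imported.
print("module path:", open_clip.__file__)

# Show the installed package version (the distribution is named open_clip_torch).
print("installed version:", importlib.metadata.version("open_clip_torch"))
```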

ykj467422034 commented 6 months ago

> Apologies for my lack of clarity, I mean the open_clip repository. I was using the most up-to-date version at the time, but it looks like multiple new commits have been made since then. I am currently unable to access my setup to check which commit I am using, though.

Fine. Maybe I can try the latest version once more. Thanks

gpucce commented 6 months ago

@ykj467422034 I think that with those changes you would still need to rerun the fine-tuning

ykj467422034 commented 6 months ago

> @ykj467422034 I think that with those changes you would still need to rerun the fine-tuning

Sure, I will. Thank you!