Closed: victorup closed this issue 1 year ago.
Can you first try the image shared in the README and see whether [unused0] still appears? Meanwhile, can you also share the image you are testing? I can test it on my side to investigate further.
The image in the README is captioned very well. Sorry, I can't share my images. However, I checked my test set further and found that the "[ unused0 ]" tokens usually appear when the image contains an unusual object. So I think the token may represent an unknown word. I will clean my data for better results. Thank you!
I've got the same issue, and I can share that image along with its caption.
The output is: [unused0] sticking his tongue out
Thanks for reporting this. We will investigate this issue further. I tried the base version, which gives reasonable results.
In CC12M, person names are replaced with <PERSON>, and here [unused0] is used in place of that special token. CC12M is used in pretraining the LARGE model but not the BASE model, which is why only the LARGE model produces it. [unused0] can therefore be re-interpreted as "a person".
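Until a retrained model is available, that re-interpretation can be applied as a simple post-processing step on the generated text. The sketch below is a hypothetical helper (`reinterpret_unused0` is not part of the GIT code base); it assumes the placeholder shows up either as `[unused0]` or in the space-padded form `[ unused0 ]` reported in this thread, and that "a person" is an acceptable substitute.

```python
import re

# Hypothetical post-processing helper, not part of GIT itself.
# Matches "[unused0]" as well as the space-padded "[ unused0 ]" form
# that appears in some decoded captions.
_UNUSED0 = re.compile(r"\[\s*unused0\s*\]")


def reinterpret_unused0(caption: str, replacement: str = "a person") -> str:
    """Replace every [unused0] placeholder with a readable phrase."""
    text = _UNUSED0.sub(replacement, caption)
    # Collapse any runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


print(reinterpret_unused0("[unused0] sticking his tongue out"))
# → a person sticking his tongue out
```

Substituting a fixed phrase is a blunt fix: it rewrites the surface string only, so some captions may still read awkwardly. Retraining on cleaned data addresses the cause rather than the symptom.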
Original CC12M data examples:
['The source of Anime quotes & Manga quotes : Photo <PERSON>, Manga Quotes, '
'Art Images, Fan Art, Thoughts, Think, Anime, Crying, Random\n',
'<PERSON> with Bindi, <PERSON> and <PERSON> before he left the zoo, and lost '
"contact with his late son's family. Photo: Getty Images\n",
'The wedding of <PERSON> and Ashleigh McDonald Photography 11\n',
'An artist rendering shows Supreme Court Justices from left, <PERSON>, '
'<PERSON>, <PERSON>, <PERSON>, Chief Justice <PERSON>, <PERSON>, <PERSON>, '
'<PERSON>, and <PERSON> inside Supreme Court in W\n',
"'(Day 10) <PERSON>'s team clinches the BAT Grad Academy's 'Best Place to "
"Work For' award at the business simulation!'' Do you aspire to work in top "
'global company? Look no further. BAT is well known for being one of the '
"world's best companies to work at, certified as a Top Employer around the "
'world. BAT Malaysia has also won several HR excellence awards, leading in '
'several categories including Employee Engagement and Best Companies to Work '
'for in Asia. This is testament to the initiatives and efforts invested into '
"their people agenda.'\n"]
See the CC12M paper.
Thank you for the answer. The results were the same for me: the base model had no issues, but the large model gave me these results.
Is there any way to solve it?
One way is to retrain the large model without such special tokens. I will try this.
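A minimal sketch of that kind of data cleaning, assuming the captions are plain strings that use the literal `<PERSON>` placeholder shown in the CC12M examples above. The helper name `clean_captions` and the option to rewrite rather than drop a caption are illustrative assumptions, not the procedure the maintainers actually used.

```python
from typing import Iterable, List, Optional

# Placeholder that CC12M uses in place of person names.
PERSON_TAG = "<PERSON>"


def clean_captions(captions: Iterable[str], replacement: Optional[str] = None) -> List[str]:
    """Drop captions containing <PERSON>, or rewrite them if a replacement is given.

    Hypothetical helper for illustration only.
    """
    cleaned = []
    for caption in captions:
        if PERSON_TAG not in caption:
            cleaned.append(caption)
        elif replacement is not None:
            cleaned.append(caption.replace(PERSON_TAG, replacement))
        # Otherwise the caption is dropped from the training set.
    return cleaned


sample = [
    "The wedding of <PERSON> and Ashleigh McDonald Photography 11",  # from the CC12M examples
    "a dog running on a beach",  # hypothetical clean caption
]
print(clean_captions(sample))              # → ['a dog running on a beach']
print(clean_captions(sample, "a person"))  # rewrites the first caption instead
```

Dropping the captions removes the placeholder from the training targets entirely, while rewriting keeps the images at the cost of generic phrasing.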
I have noticed another interesting thing in the generation results: the model gives the same output, "digital art selected for the #", for all of the following images. All of these images were generated with Stable Diffusion.
@prashantkandel12 @DDuan-zw I removed the offending captions from the CC12M dataset and retrained the large model. Please check the details here.
For the image of Einstein, the GIT_LARGE_R_COCO model predicts: "a black and white photo of a man sticking his tongue out."
@prashantkandel12 I tried the GIT_LARGE_COCO model; its output is below. I assume you were using the pretrained model. For demo purposes, we recommend the fine-tuned models, because the pretraining dataset is quite noisy.
3d rendering of a woman wearing a virtual reality headset.
Hi,
When I use GIT_LARGE_COCO to generate captions, many "[ unused0 ]" tokens appear in the results.
For example:
So what is "[ unused0 ]"? Does it mean an unknown word? Why does the model generate so many "[ unused0 ]" tokens?
How can I avoid this?
Thanks!