urlhearts closed this 1 month ago
Can you provide more context, e.g. the command line you are using, the result you get, and the image you tested? My guess is that you were using the pretrained model. The pretrained model was trained on quite noisy datasets, so the captions produced by that pretrained checkpoint will also be quite noisy.
Some results are "digital art selected for the #". Why?