Hello author, I used the image captioning model you provided. The accuracy is good, but inference is slow. I noticed that the transformer-based nlpconnect/vit-gpt2-image-captioning model runs very fast. Could you tell me which of the two models is more accurate? No offense intended, thanks.