HarukiNishimura-TRI opened 7 months ago
Hi Haruki,
Thanks for your interest. Regarding your question: for OWL-ViT, we skipped these sentences with a try-except block during evaluation. The other methods we evaluated do not have this constraint on input length, so no such processing is required for them. I think simply truncating the input to 16 tokens might be a better solution, and we will give it a try. If you have further questions, please feel free to email me.
Best regards, Chi
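For readers reproducing this, the skip-on-error approach described above can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: `toy_detector` is a hypothetical stub that fails the way OWL-ViT does on over-long text, and it counts whitespace words plus two special tokens as a stand-in for the real tokenizer.

```python
from typing import Callable, Dict, List, Tuple

MAX_POSITIONS = 16  # max_position_embeddings of the released OWL-ViT text encoders


def toy_detector(text: str) -> str:
    """Hypothetical detector stub that raises like OWL-ViT on over-long inputs.

    Token count is approximated as whitespace words + BOS/EOS (an assumption;
    the real CLIP BPE tokenizer usually produces more tokens than words).
    """
    if len(text.split()) + 2 > MAX_POSITIONS:
        raise RuntimeError("sequence longer than max_position_embeddings")
    return f"boxes for: {text}"


def evaluate_with_skip(
    descriptions: List[str], detect: Callable[[str], str]
) -> Tuple[Dict[str, str], List[str]]:
    """Run detection per description, skipping any that raise RuntimeError."""
    results: Dict[str, str] = {}
    skipped: List[str] = []
    for desc in descriptions:
        try:
            results[desc] = detect(desc)
        except RuntimeError:
            # The description gets no prediction and is left out of evaluation.
            skipped.append(desc)
    return results, skipped
```

The side effect worth noting is that skipped descriptions simply disappear from the OWL-ViT evaluation, so its scores are computed on a slightly smaller set of descriptions than the other methods.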
Hi Chi,
Thank you for the clarification. So you also omitted those sentences in the inter-scenario case?
Regards, Haruki
@HarukiNishimura-TRI Yes, I believe so for OWL-ViT. For OWL-ViT inference, I think it would be better to truncate the descriptions to 16 tokens and run inference on the truncated versions.
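A note on the truncation: as I understand it, the 16-position limit counts tokens, including the start and end special tokens, not characters. With a Hugging Face tokenizer, the usual way is `tokenizer(text, truncation=True, max_length=16)`, which should reserve room for the special tokens. The pure-Python sketch below shows the same idea on raw token ids; `truncate_ids` and its parameters are hypothetical names for illustration.

```python
from typing import List

MAX_POSITIONS = 16  # max_position_embeddings of the released OWL-ViT text encoders


def truncate_ids(
    content_ids: List[int], bos_id: int, eos_id: int, max_len: int = MAX_POSITIONS
) -> List[int]:
    """Wrap content token ids in BOS/EOS, truncating content so the total fits max_len."""
    budget = max_len - 2  # two positions are reserved for BOS and EOS
    return [bos_id] + content_ids[:budget] + [eos_id]
```

Keeping the end token matters: as far as I can tell, OWL-ViT's text encoder (like CLIP's) pools the text embedding at the end-token position, so naively slicing the full id sequence to its first 16 entries would drop that token and change the pooled query embedding.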
Dear authors,
Thank you for your work and the release of the d-cube dataset.
I was trying to run a pre-trained OWL-ViT model (e.g. "google/owlvit-base-patch32") on the dataset, and found the following sentences to yield a RuntimeError.
A typical error message is shown at the bottom. It seems that the pre-trained model uses `max_position_embeddings = 16` in `OwlViTTextConfig`, which is not long enough to accept the descriptions above as inputs. All the models available on Hugging Face seem to use `max_position_embeddings = 16`. Did you encounter the same issue when running your experiments for the paper? If so, how did you handle it in the evaluation process? Thanks in advance.
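To list the offending descriptions up front instead of hitting the RuntimeError mid-run, one could pre-check token counts against the limit. This is a sketch under assumptions: `find_too_long` and `toy_count` are hypothetical helpers, and `toy_count` (whitespace words plus BOS/EOS) is only a stand-in. With a real Hugging Face tokenizer, the count would be `len(tokenizer(text)["input_ids"])`, which includes the special tokens by default.

```python
from typing import Callable, List

MAX_POSITIONS = 16  # max_position_embeddings in OwlViTTextConfig for the released checkpoints


def find_too_long(
    descriptions: List[str],
    count_tokens: Callable[[str], int],
    max_len: int = MAX_POSITIONS,
) -> List[str]:
    """Return the descriptions whose token count (including BOS/EOS) exceeds max_len."""
    return [d for d in descriptions if count_tokens(d) > max_len]


def toy_count(text: str) -> int:
    """Hypothetical stand-in tokenizer: whitespace words + 2 special tokens.

    The real CLIP BPE tokenizer often splits words into several tokens,
    so it will usually report more tokens than this.
    """
    return len(text.split()) + 2
```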