gapjialin opened this issue 7 months ago.
It sounds like you might need to resize your images -- take a look at "Increasing the input image resolution" improvement point in the LLaVA NeXT blog post: https://llava-vl.github.io/blog/2024-01-30-llava-next/
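The post describes raising the input resolution by choosing among a small set of supported aspect ratios. To show roughly how much a 1920x1080 frame is still downscaled under such a scheme, here is a plain-Python sketch of the resolution-selection idea: keep the candidate canvas that preserves the most original pixels after an aspect-preserving downscale, and break ties by least padding. The candidate list and the function name are assumptions for illustration; substitute the image_grid_pinpoints value from your own model config.

```python
from typing import List, Tuple

# Assumed candidate canvases (width, height) for a 336px vision encoder;
# replace with the image_grid_pinpoints from your own model config.
GRID_PINPOINTS: List[Tuple[int, int]] = [
    (336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008),
]


def select_best_resolution(
    original_size: Tuple[int, int], candidates: List[Tuple[int, int]]
) -> Tuple[int, int]:
    """Pick the (width, height) candidate that keeps the most original pixels
    after an aspect-preserving downscale, breaking ties by least padding."""
    orig_w, orig_h = original_size
    best = candidates[0]
    best_effective, best_wasted = -1, float("inf")
    for cand_w, cand_h in candidates:
        # Largest uniform scale that fits the image inside this canvas.
        scale = min(cand_w / orig_w, cand_h / orig_h)
        down_w, down_h = int(orig_w * scale), int(orig_h * scale)
        # Pixels of real image content that survive (capped at the original area).
        effective = min(down_w * down_h, orig_w * orig_h)
        # Canvas area that ends up as padding.
        wasted = cand_w * cand_h - effective
        if effective > best_effective or (
            effective == best_effective and wasted < best_wasted
        ):
            best, best_effective, best_wasted = (cand_w, cand_h), effective, wasted
    return best


if __name__ == "__main__":
    w, h = select_best_resolution((1920, 1080), GRID_PINPOINTS)
    scale = min(w / 1920, h / 1080)
    print(f"canvas {w}x{h}, content {int(1920 * scale)}x{int(1080 * scale)}")
```

For a 1920x1080 frame this picks the 672x672 canvas, with the image content itself scaled to roughly 672x378 before padding, so under these assumed candidates fine spatial detail is still reduced by close to 3x per side.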
Thank you! I noticed that the fine-tuning dataset describes the coordinates of objects in the image. Do these coordinates refer to the image before or after it is downscaled to the lower resolution? I would really like to know.
I don't think it makes a difference, because you are using coordinates in the 0~1 range. Just make sure you are using image_radio = spatial unpad.
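Whether the 0-1 range makes the before/after question moot depends on the preprocessing. The sketch below (hypothetical helper names) shows that normalized coordinates are unchanged by a plain resize but shift if the image is first padded to a square. The centered-padding scheme is an assumption about the preprocessing, and the pixel box is back-computed from the example box in the issue, so check both against your own pipeline.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def normalize(box_px: Box, width: int, height: int) -> Box:
    """Divide pixel coordinates by the image size to get values in 0-1."""
    x1, y1, x2, y2 = box_px
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def resize_box(box_px: Box, src: Tuple[int, int], dst: Tuple[int, int]) -> Box:
    """Map a pixel box through a plain resize from src=(w, h) to dst=(w, h)."""
    sx, sy = dst[0] / src[0], dst[1] / src[1]
    x1, y1, x2, y2 = box_px
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def pad_to_square_box(box_px: Box, width: int, height: int) -> Tuple[Box, int]:
    """Map a pixel box onto a square canvas of side max(width, height) with
    the image pasted centered (assumed padding scheme)."""
    side = max(width, height)
    off_x, off_y = (side - width) // 2, (side - height) // 2
    x1, y1, x2, y2 = box_px
    return (x1 + off_x, y1 + off_y, x2 + off_x, y2 + off_y), side


if __name__ == "__main__":
    box = (614.4, 702.0, 652.8, 810.0)  # hypothetical pixel box in a 1920x1080 frame
    print("original:", normalize(box, 1920, 1080))
    print("resized: ", normalize(resize_box(box, (1920, 1080), (672, 378)), 672, 378))
    padded, side = pad_to_square_box(box, 1920, 1080)
    print("padded:  ", normalize(padded, side, side))
```

With these numbers the plain resize keeps the box at about [0.32, 0.65, 0.34, 0.75], while centered padding moves the y values to about 0.58 and 0.64. Labels normalized to the unpadded frame would therefore point at the wrong rows if the model actually sees a padded square.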
I'd like to ask what image size or resolution you ended up using, and whether it worked.
Describe the issue
Hello, I have built my own object-detection dataset of 80K samples, used to locate objects in images. The images are 1920x1080. After fine-tuning on this data, the model still has difficulty localizing objects in the image. What could be causing this? I am fine-tuning with LoRA on 8x A40 GPUs, and all other training parameters are left unchanged. Here is an example from my dataset:
human: Verify if there is a presence of people in the image.
gpt: There are 1 people.
human: Pinpoint and describe the exact spots where each person can be found in this picture.
gpt: person 1's bounding box coordinate of the region is [0.32, 0.65, 0.34, 0.75].
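The box in this example looks like [x1, y1, x2, y2] divided by the image width and height and rounded to two decimals; that layout is an assumption. The Python sketch below (all helper names hypothetical) produces answer strings in that format from pixel boxes and reports how many pixels one step of the last printed digit covers on a 1920x1080 image.

```python
from typing import List, Tuple

PixelBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def to_norm_str(box: PixelBox, width: int, height: int, ndigits: int = 2) -> str:
    """Normalize a pixel box to the 0-1 range and format it like
    '[0.32, 0.65, 0.34, 0.75]' with `ndigits` decimal places."""
    x1, y1, x2, y2 = box
    vals = (x1 / width, y1 / height, x2 / width, y2 / height)
    return "[" + ", ".join(f"{v:.{ndigits}f}" for v in vals) + "]"


def quantization_step_px(width: int, height: int, ndigits: int = 2) -> Tuple[float, float]:
    """Pixel size of one step in the last printed digit, per axis."""
    step = 10 ** -ndigits
    return (step * width, step * height)


def answer_for(boxes: List[PixelBox], width: int, height: int, ndigits: int = 2) -> str:
    """Hypothetical helper producing answer text in the style of the example."""
    parts = [
        f"person {i}'s bounding box coordinate of the region is "
        f"{to_norm_str(b, width, height, ndigits)}."
        for i, b in enumerate(boxes, start=1)
    ]
    return " ".join(parts)


if __name__ == "__main__":
    box = (614.4, 702.0, 652.8, 810.0)  # hypothetical pixel box
    print(answer_for([box], 1920, 1080))
    print(quantization_step_px(1920, 1080))
```

On a 1920x1080 image one step of the second decimal is about 19 px horizontally and 11 px vertically, while the example box is only 0.02 (about 38 px) wide, so rounding alone can shift each edge by up to roughly 10 px. Regenerating labels with ndigits=3 is one cheap way to check whether this precision matters for small objects.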