Open Carol-lyh opened 1 year ago
Sorry for the confusion. For some reason, when we developed the model, we saved all files in OCR-VQA as `.jpg`, including some of the files that you may have downloaded as `.gif`. For now, you may create a new folder and save all files as `.gif`, and one user found that changing the extension directly also works: https://github.com/haotian-liu/LLaVA/issues/593#issuecomment-1766215738
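The "change the extension directly" workaround mentioned above could be sketched as follows. This is only a minimal sketch, not the maintainers' script: the function name `copy_with_jpg_extension` and both directory arguments are hypothetical, and the target extension `.jpg` is chosen because the traceback in this issue looks for a `.jpg` path. It copies the bytes unchanged into a new folder and only changes the file name, so it relies on the image loader detecting the format from the file content rather than from the extension.

```python
import shutil
from pathlib import Path


def copy_with_jpg_extension(src_dir, dst_dir):
    """Copy every file in src_dir to dst_dir, naming each copy <stem>.jpg.

    The file bytes are copied unchanged; only the name changes.
    Returns the number of copied files whose extension was changed.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    renamed = 0
    # Handle real .jpg files first so they win if two files share a stem.
    ordered = sorted(src.iterdir(), key=lambda p: (p.suffix.lower() != ".jpg", p.name))
    for path in ordered:
        if not path.is_file():
            continue
        target = dst / (path.stem + ".jpg")
        if target.exists():
            continue  # never overwrite a file that is already in dst
        shutil.copy2(path, target)
        if path.suffix.lower() != ".jpg":
            renamed += 1
    return renamed
```

Copying into a new folder, rather than renaming in place, keeps the original download intact in case the workaround does not help.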
Thank you for replying! Should I change the extension in the JSON, or directly change the image files' extension from '.gif' to '.jpg'? Also, another question: about 170 of the images in the Visual Genome dataset I downloaded, specifically in VG_100K_2, are empty, so I can't open them. I don't know where the problem lies.
How exactly did you download the dataset? And did you try several times? I once hit a download error, but it was just a temporary HTTP connection issue. Training giant models always forces us to download an enormous number of files... Maybe you should also try downloading manually via the browser to see where the problem lies.
Yes, I have the same question. `VG_100K_2` contains 170 empty images. I have searched for several of these in `llava_v1_5_mix665k.json` and they are not in it, so it should not cause problems.
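Rather than spot-checking a few files, the same check can be run over all of them. The sketch below makes several assumptions that are not stated in this thread: that "empty" means a zero-byte file, that the annotation file is a JSON list of records with an optional `image` key holding a path relative to the data root, and that the folder sits at `vg/VG_100K_2` under that root. The function name `referenced_empty_images` is hypothetical.

```python
import json
from pathlib import Path


def referenced_empty_images(annotation_path, image_root, subdir="vg/VG_100K_2"):
    """Return the zero-byte files under image_root/subdir that the annotations use.

    Paths are returned relative to image_root, with forward slashes.
    """
    with open(annotation_path, encoding="utf-8") as f:
        records = json.load(f)
    # Records without an "image" key (text-only samples) are ignored.
    referenced = {r["image"] for r in records if "image" in r}

    root = Path(image_root)
    hits = []
    for path in sorted((root / subdir).iterdir()):
        if path.is_file() and path.stat().st_size == 0:
            rel = path.relative_to(root).as_posix()
            if rel in referenced:
                hits.append(rel)
    return hits
```

An empty result would support the observation above that the empty images are not used by the training mix.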
Describe the issue
I downloaded the datasets for fine-tuning and ran the corresponding finetune.sh to reproduce the results. Fine-tuning starts without problems, but during training it raises:
0%| | 22/5198 [02:59<10:21:05, 7.20s/it]
{'loss': 1.1543, 'learning_rate': 2.8205128205128207e-06, 'epoch': 0.0}
0%| | 22/5198 [02:59<10:21:05, 7.20s/it]
Traceback (most recent call last):
blabla ...
FileNotFoundError: [Errno 2] No such file or directory: '/xxx/llava_finetune_data/ocr_vqa/images/782118577.jpg'
But the download actually completed without errors, and the total number of images matches the expected count, which is 207572 for the ocr_vqa dataset, so I don't know where the error comes from.
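A matching file count does not guarantee matching file names, so one way to narrow this down is to check every image path in the annotation file against the disk before training. The sketch below is a hypothetical helper, not part of the repository: it assumes the annotation file is a JSON list of records with an optional `image` key holding a path relative to the data root, and the name `find_missing_images` is made up for illustration. For each missing path it also lists files in the same folder that share the stem but have a different extension.

```python
import json
from pathlib import Path


def find_missing_images(annotation_path, image_root):
    """Map each missing image path to same-stem files found in its folder.

    Keys are the relative paths from the annotations that do not exist on
    disk; values are file names with the same stem and another extension.
    """
    with open(annotation_path, encoding="utf-8") as f:
        records = json.load(f)

    root = Path(image_root)
    missing = {}
    for record in records:
        rel = record.get("image")  # text-only records have no "image" key
        if rel is None or rel in missing:
            continue
        path = root / rel
        if path.exists():
            continue
        if path.parent.is_dir():
            candidates = sorted(p.name for p in path.parent.glob(path.stem + ".*"))
        else:
            candidates = []
        missing[rel] = candidates
    return missing
```

If a missing `.jpg` entry comes back with a same-stem `.gif` candidate, the problem is the extension mismatch rather than an incomplete download.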