[Usage] No such file or directory when finetuning

Carol-lyh commented 1 year ago

Describe the issue

I downloaded the datasets needed for finetuning and ran the corresponding finetune.sh to reproduce the results. Training starts fine, but a few steps in it raises:

0%| | 22/5198 [02:59<10:21:05, 7.20s/it]

{'loss': 1.1543, 'learning_rate': 2.8205128205128207e-06, 'epoch': 0.0}

0%| | 22/5198 [02:59<10:21:05, 7.20s/it]

Traceback (most recent call last):

blabla ...

FileNotFoundError: [Errno 2] No such file or directory: '/xxx/llava_finetune_data/ocr_vqa/images/782118577.jpg'

But the download actually completed, and the total number of images matches the expected count (207572 for the ocr_vqa dataset), so I don't know where the error comes from.
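A matching file count does not guarantee that every path the training JSON references exists on disk. A minimal sketch for listing referenced-but-missing files, assuming the annotation file is a JSON list whose entries carry an optional `image` key holding a path relative to the data root (the function name and the paths in the comments are hypothetical):

```python
import json
import os


def find_missing_images(records, image_root):
    """Return the relative image paths in `records` that do not exist under `image_root`."""
    missing = []
    for record in records:
        # Assumption: text-only samples simply have no "image" key.
        rel_path = record.get("image")
        if rel_path and not os.path.isfile(os.path.join(image_root, rel_path)):
            missing.append(rel_path)
    return missing


# Example usage (hypothetical paths, adjust to your own layout):
#   with open("/path/to/annotations.json") as f:
#       records = json.load(f)
#   for path in find_missing_images(records, "/path/to/llava_finetune_data"):
#       print(path)
```

Grouping the output by file extension should quickly show whether the missing files share a pattern, such as all being expected as `.jpg`.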

haotian-liu commented 1 year ago

Sorry for the confusion. For some reason, when we developed the model we saved all files in OCR-VQA as `.jpg`, including some files that you may have downloaded as `.gif`.

For now, you may create a new folder and save all files as `.jpg`; one user found that changing the extension directly also works: https://github.com/haotian-liu/LLaVA/issues/593#issuecomment-1766215738
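The "new folder" workaround can be sketched as follows. This is only an illustration, not the maintainers' script: it copies every file into a fresh directory and gives `.gif` files a `.jpg` name without re-encoding them. It relies on the image loader detecting the format from the file contents rather than the extension, as Pillow's `Image.open` does; that the training code loads images this way is an assumption. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_gif_as_jpg(src_dir, dst_dir):
    """Copy files from src_dir to dst_dir, renaming *.gif to *.jpg.

    Only the extension changes; the image bytes are not re-encoded.
    Returns the number of .gif files that were copied under a .jpg name.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    renamed = 0
    for path in src.iterdir():
        if not path.is_file():
            continue
        is_gif = path.suffix.lower() == ".gif"
        target = dst / ((path.stem + ".jpg") if is_gif else path.name)
        if target.exists():
            continue  # never overwrite a file that is already in place
        shutil.copy2(path, target)
        if is_gif:
            renamed += 1
    return renamed
```

Copying rather than renaming in place keeps the original download intact, so the step can be repeated or undone.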

Carol-lyh commented 1 year ago

Thank you for replying! Should I change the extension in the JSON, or directly change the image files' extension from '.gif' to '.jpg'? Another question: about 170 images in the Visual Genome dataset I downloaded, specifically in VG_100K_2, are empty, so I can't open them. I don't know where the problem lies.

HireTheHero commented 1 year ago

How exactly did you download the dataset? And did you try several times? I once hit a download error, but it was just a temporary HTTP connection issue. Training giant models always forces us to download an enormous number of files... Maybe you should also try downloading manually via the browser to see where the problem lies.

lxysl commented 6 months ago

> Thank you for replying! Should I change the extension in the JSON, or directly change the image files' extension from '.gif' to '.jpg'? Another question: about 170 images in the Visual Genome dataset I downloaded, specifically in VG_100K_2, are empty, so I can't open them. I don't know where the problem lies.

Yes, I have the same question. VG_100K_2 contains 170 empty images. I searched for several of them in llava_v1_5_mix665k.json and none of them appear there, so they should not cause problems.

The empty images in VG_100K_2:

```
Can't open file: VG_100K_2/2416776.jpg
Can't open file: VG_100K_2/2355831.jpg
Can't open file: VG_100K_2/2370333.jpg
Can't open file: VG_100K_2/2407337.jpg
Can't open file: VG_100K_2/2408825.jpg
Can't open file: VG_100K_2/2331541.jpg
Can't open file: VG_100K_2/2357389.jpg
Can't open file: VG_100K_2/2392406.jpg
Can't open file: VG_100K_2/2414809.jpg
Can't open file: VG_100K_2/2344903.jpg
Can't open file: VG_100K_2/2357002.jpg
Can't open file: VG_100K_2/2339452.jpg
Can't open file: VG_100K_2/2316472.jpg
Can't open file: VG_100K_2/2397804.jpg
Can't open file: VG_100K_2/2369516.jpg
Can't open file: VG_100K_2/2329481.jpg
Can't open file: VG_100K_2/2400248.jpg
Can't open file: VG_100K_2/2403972.jpg
Can't open file: VG_100K_2/2416204.jpg
Can't open file: VG_100K_2/2401793.jpg
Can't open file: VG_100K_2/2354293.jpg
Can't open file: VG_100K_2/2360334.jpg
Can't open file: VG_100K_2/2360452.jpg
Can't open file: VG_100K_2/2324980.jpg
Can't open file: VG_100K_2/2338403.jpg
Can't open file: VG_100K_2/2415090.jpg
Can't open file: VG_100K_2/2405254.jpg
Can't open file: VG_100K_2/2335721.jpg
Can't open file: VG_100K_2/2384303.jpg
Can't open file: VG_100K_2/2383685.jpg
Can't open file: VG_100K_2/2370916.jpg
Can't open file: VG_100K_2/2410274.jpg
Can't open file: VG_100K_2/2379927.jpg
Can't open file: VG_100K_2/2336771.jpg
Can't open file: VG_100K_2/2345238.jpg
Can't open file: VG_100K_2/2397424.jpg
Can't open file: VG_100K_2/2350190.jpg
Can't open file: VG_100K_2/2369711.jpg
Can't open file: VG_100K_2/2320098.jpg
Can't open file: VG_100K_2/2376553.jpg
Can't open file: VG_100K_2/2394503.jpg
Can't open file: VG_100K_2/2328613.jpg
Can't open file: VG_100K_2/2345400.jpg
Can't open file: VG_100K_2/2386339.jpg
Can't open file: VG_100K_2/2354486.jpg
Can't open file: VG_100K_2/2385744.jpg
Can't open file: VG_100K_2/2378352.jpg
Can't open file: VG_100K_2/2404941.jpg
Can't open file: VG_100K_2/2407333.jpg
Can't open file: VG_100K_2/2336852.jpg
Can't open file: VG_100K_2/2347067.jpg
Can't open file: VG_100K_2/2394940.jpg
Can't open file: VG_100K_2/2354437.jpg
Can't open file: VG_100K_2/2410589.jpg
Can't open file: VG_100K_2/2407778.jpg
Can't open file: VG_100K_2/2407779.jpg
Can't open file: VG_100K_2/2391568.jpg
Can't open file: VG_100K_2/2339625.jpg
Can't open file: VG_100K_2/2350687.jpg
Can't open file: VG_100K_2/2344194.jpg
Can't open file: VG_100K_2/2415989.jpg
Can't open file: VG_100K_2/2316003.jpg
Can't open file: VG_100K_2/2390058.jpg
Can't open file: VG_100K_2/2356194.jpg
Can't open file: VG_100K_2/2402944.jpg
Can't open file: VG_100K_2/2397721.jpg
Can't open file: VG_100K_2/2408461.jpg
Can't open file: VG_100K_2/2325435.jpg
Can't open file: VG_100K_2/2374336.jpg
Can't open file: VG_100K_2/2335434.jpg
Can't open file: VG_100K_2/2328471.jpg
Can't open file: VG_100K_2/2395669.jpg
Can't open file: VG_100K_2/2335839.jpg
Can't open file: VG_100K_2/2337155.jpg
Can't open file: VG_100K_2/2321051.jpg
Can't open file: VG_100K_2/2370544.jpg
Can't open file: VG_100K_2/2350125.jpg
Can't open file: VG_100K_2/2400944.jpg
Can't open file: VG_100K_2/2359672.jpg
Can't open file: VG_100K_2/2413483.jpg
Can't open file: VG_100K_2/2405591.jpg
Can't open file: VG_100K_2/2393186.jpg
Can't open file: VG_100K_2/2346745.jpg
Can't open file: VG_100K_2/2354396.jpg
Can't open file: VG_100K_2/2379210.jpg
Can't open file: VG_100K_2/2389915.jpg
Can't open file: VG_100K_2/2417802.jpg
Can't open file: VG_100K_2/2318215.jpg
Can't open file: VG_100K_2/2338884.jpg
Can't open file: VG_100K_2/2325625.jpg
Can't open file: VG_100K_2/2363899.jpg
Can't open file: VG_100K_2/2344765.jpg
Can't open file: VG_100K_2/2387074.jpg
Can't open file: VG_100K_2/2320630.jpg
Can't open file: VG_100K_2/2346562.jpg
Can't open file: VG_100K_2/2315674.jpg
Can't open file: VG_100K_2/2414917.jpg
Can't open file: VG_100K_2/2388568.jpg
Can't open file: VG_100K_2/2319311.jpg
Can't open file: VG_100K_2/2402595.jpg
Can't open file: VG_100K_2/2414491.jpg
Can't open file: VG_100K_2/2405946.jpg
Can't open file: VG_100K_2/2336131.jpg
Can't open file: VG_100K_2/2337005.jpg
Can't open file: VG_100K_2/2382669.jpg
Can't open file: VG_100K_2/2394180.jpg
Can't open file: VG_100K_2/2380334.jpg
Can't open file: VG_100K_2/2361256.jpg
Can't open file: VG_100K_2/2327203.jpg
Can't open file: VG_100K_2/2405004.jpg
Can't open file: VG_100K_2/2333919.jpg
Can't open file: VG_100K_2/2372979.jpg
Can't open file: VG_100K_2/2340698.jpg
Can't open file: VG_100K_2/2388989.jpg
Can't open file: VG_100K_2/2379781.jpg
Can't open file: VG_100K_2/2346773.jpg
Can't open file: VG_100K_2/2383215.jpg
Can't open file: VG_100K_2/2339038.jpg
Can't open file: VG_100K_2/2396395.jpg
Can't open file: VG_100K_2/2395307.jpg
Can't open file: VG_100K_2/2386421.jpg
Can't open file: VG_100K_2/2330142.jpg
Can't open file: VG_100K_2/2340300.jpg
Can't open file: VG_100K_2/2412350.jpg
Can't open file: VG_100K_2/2390322.jpg
Can't open file: VG_100K_2/2335991.jpg
Can't open file: VG_100K_2/2392657.jpg
Can't open file: VG_100K_2/2378996.jpg
Can't open file: VG_100K_2/2386269.jpg
Can't open file: VG_100K_2/2388085.jpg
Can't open file: VG_100K_2/2321571.jpg
Can't open file: VG_100K_2/2319788.jpg
Can't open file: VG_100K_2/2405111.jpg
Can't open file: VG_100K_2/2390387.jpg
Can't open file: VG_100K_2/2417886.jpg
Can't open file: VG_100K_2/2326141.jpg
Can't open file: VG_100K_2/2370307.jpg
Can't open file: VG_100K_2/2406223.jpg
Can't open file: VG_100K_2/2370885.jpg
Can't open file: VG_100K_2/2351122.jpg
Can't open file: VG_100K_2/2400491.jpg
Can't open file: VG_100K_2/2416218.jpg
Can't open file: VG_100K_2/2388284.jpg
Can't open file: VG_100K_2/2380814.jpg
Can't open file: VG_100K_2/2407937.jpg
Can't open file: VG_100K_2/2392550.jpg
Can't open file: VG_100K_2/2369047.jpg
Can't open file: VG_100K_2/2329477.jpg
Can't open file: VG_100K_2/2387773.jpg
Can't open file: VG_100K_2/2385366.jpg
Can't open file: VG_100K_2/2348472.jpg
Can't open file: VG_100K_2/2393276.jpg
Can't open file: VG_100K_2/2368946.jpg
Can't open file: VG_100K_2/2325661.jpg
Can't open file: VG_100K_2/2357595.jpg
Can't open file: VG_100K_2/2323189.jpg
Can't open file: VG_100K_2/2396732.jpg
Can't open file: VG_100K_2/2373542.jpg
Can't open file: VG_100K_2/2399366.jpg
Can't open file: VG_100K_2/2374775.jpg
Can't open file: VG_100K_2/2408234.jpg
Can't open file: VG_100K_2/2343246.jpg
Can't open file: VG_100K_2/2401212.jpg
Can't open file: VG_100K_2/2344844.jpg
Can't open file: VG_100K_2/2357151.jpg
Can't open file: VG_100K_2/2343497.jpg
Can't open file: VG_100K_2/2356717.jpg
Can't open file: VG_100K_2/2388917.jpg
Can't open file: VG_100K_2/2323362.jpg
Can't open file: VG_100K_2/2379468.jpg
```
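Rather than spot-checking a few names by hand, the whole list can be cross-checked against the training JSON. A rough sketch, assuming "empty" means zero-byte files, that the JSON is a list of entries with an optional `image` key, and that Visual Genome paths are stored with a prefix such as `vg/VG_100K_2/` (the prefix and both function names are assumptions to verify against your own copy):

```python
import os


def find_empty_files(folder):
    """Return the sorted names of zero-byte files directly inside `folder`."""
    return sorted(
        entry.name
        for entry in os.scandir(folder)
        if entry.is_file() and entry.stat().st_size == 0
    )


def referenced_empty_images(records, folder, prefix):
    """Return the empty files in `folder` that `records` actually reference.

    `prefix` is whatever the JSON puts in front of the file name,
    for example "vg/VG_100K_2/" (an assumption; check your own JSON).
    """
    referenced = {record["image"] for record in records if "image" in record}
    return [name for name in find_empty_files(folder) if prefix + name in referenced]
```

An empty result from `referenced_empty_images` would confirm that none of the unreadable files are used during finetuning.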

haotian-liu / LLaVA

[Usage] No such file or directory when finetuning #601
