haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
19.28k stars 2.12k forks

[Usage] Faster OCR VQA download. #931

Open John-Ge opened 9 months ago

John-Ge commented 9 months ago

Describe the issue

Issue: Downloading the OCR-VQA images is slow, so I implemented a multi-threaded version of the download step.

Command:

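import concurrent.futures
import os
import urllib.request as ureq  # assumed alias, matching the `ureq` name used below

# `data` (dict of image records keyed by id) and `download` (0/1 flag)
# are expected to be defined earlier in the download script.

def download_image(k):
    ext = os.path.splitext(data[k]['imageURL'])[1]
    outputFile = 'images/%s%s' % (k, ext)

    # Only download the image if it doesn't exist
    if not os.path.exists(outputFile):
        ureq.urlretrieve(data[k]['imageURL'], outputFile)

if download == 1:
    # Create the directory if it doesn't exist
    os.makedirs('./images', exist_ok=True)

    # Create a thread pool and download the images in parallel.
    # list() consumes the results so that download errors are raised instead of silently dropped.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        list(executor.map(download_image, data.keys()))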

This may be useful.
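
One caveat with the snippet above: `urlretrieve` writes straight to the final path, so an interrupted download can leave a partial file that the "already exists" check will skip on the next run. A minimal sketch of one way around that is below. It is not part of the original script; the names `download_one` and `download_all`, the `.part` suffix, and the `retries` and `max_workers` defaults are all hypothetical choices for illustration.

```python
import concurrent.futures
import os
import urllib.request as ureq


def download_one(url, output_file, retries=3):
    """Download url to output_file; return True on success, False otherwise."""
    if os.path.exists(output_file):
        return True  # already downloaded in a previous run
    tmp_file = output_file + '.part'  # hypothetical suffix for in-progress files
    for _ in range(retries):
        try:
            ureq.urlretrieve(url, tmp_file)
            # Rename only after the download finished, so the final
            # path never holds a half-written file.
            os.replace(tmp_file, output_file)
            return True
        except Exception:
            # Network error, HTTP error, or short read: discard and retry
            if os.path.exists(tmp_file):
                os.remove(tmp_file)
    return False


def download_all(jobs, max_workers=16):
    """jobs maps output path -> URL. Returns the list of paths that failed."""
    failed = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(download_one, url, path): path
                   for path, url in jobs.items()}
        for future in concurrent.futures.as_completed(futures):
            if not future.result():
                failed.append(futures[future])
    return failed
```

With the dataset dict from the original script, `jobs` could be built from each key and its `imageURL`, and the returned list shows which images to retry later.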

xulinui commented 8 months ago

https://blog.csdn.net/weixin_46460463/article/details/135537718?spm=1001.2014.3001.5501

master-chou commented 4 months ago

thanks !!!!

cooleel commented 4 months ago

https://blog.csdn.net/weixin_46460463/article/details/135537718?spm=1001.2014.3001.5501

Thanks for posting the method, but some of the images I downloaded are corrupted. Has anyone else run into the same issue?
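
For anyone who wants to find such files, here is a rough sketch of a check that uses only the standard library. It is a heuristic, not a full decoder: it only looks at the start and end markers of JPEG, GIF, and PNG files, so it can miss damage in the middle of a file and may flag valid files that carry extra trailing bytes. The function names `looks_damaged` and `find_damaged` are hypothetical.

```python
import os


def looks_damaged(path):
    """Heuristic: True if the file is empty, truncated, or not a JPEG/GIF/PNG."""
    with open(path, 'rb') as f:
        content = f.read()
    if content.startswith(b'\xff\xd8'):              # JPEG start-of-image marker
        return not content.endswith(b'\xff\xd9')     # JPEG end-of-image marker
    if content.startswith((b'GIF87a', b'GIF89a')):   # GIF signatures
        return not content.endswith(b';')            # GIF trailer byte (0x3B)
    if content.startswith(b'\x89PNG\r\n\x1a\n'):     # PNG signature
        return b'IEND' not in content[-12:]          # final PNG chunk
    return True  # empty file, HTML error page, or unknown format


def find_damaged(directory):
    """Return a sorted list of files in directory that fail the check."""
    damaged = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and looks_damaged(path):
            damaged.append(path)
    return damaged
```

Deleting the files returned by `find_damaged('images')` and rerunning the download would fetch them again, since the script skips only files that already exist. If Pillow is installed, opening each file and calling `Image.verify()` is a stricter alternative.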