wkentaro / gdown

Google Drive Public File Downloader when Curl/Wget Fails
MIT License
4.06k stars · 342 forks

Should also support cached downloading for gdrive folders #238

Closed rautnikhil closed 1 year ago

rautnikhil commented 1 year ago

I want to skip already-downloaded files when downloading a Google Drive folder, but cached_download is not available for download_folder(), so it always re-downloads the entire contents of the given folder. I would like to request that download_folder support cached downloading.

I think this can be achieved by having download_folder call the cached_download function instead of the download function when non-MD5-based file verification is requested.
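For reference, the kind of check a cached download performs can be sketched as a small standalone helper (this is a simplified illustration, not gdown's actual implementation; the name `should_skip_download` is made up for this example):

```python
import hashlib
import os.path as osp


def should_skip_download(path, md5=None, chunk_size=65536):
    """Return True if `path` already exists and (optionally) matches `md5`."""
    if not osp.exists(path):
        return False
    if md5 is None:
        # Non-MD5-based verification: existence alone is enough to skip.
        return True
    # MD5-based verification: hash the file in chunks and compare.
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == md5
```

A folder downloader could call this before each file and skip the network request when it returns True.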

wkentaro commented 1 year ago

@rautnikhil Something like below?

download_folder(
    url,
    ...
    md5={
       "textfile.txt": "xxxxxxxxx",
       "dir1/textfile2.txt": "yyyyyyyyyyyyyy",
    }
)
rautnikhil commented 1 year ago

Hi @wkentaro, I am proposing the following solution for non-MD5-based verification: skip the download if the file already exists at the download destination, by simply checking for the file path, e.g. "if osp.exists(path)".
For MD5-based verification, something like the above could be implemented.

Code snippet for reference:

def download_folder(
    url=None,
    id=None,
    output=None,
    quiet=False,
    proxy=None,
    speed=None,
    use_cookies=True,
    remaining_ok=False,
    exist_ok=False,  # new parameter
):
    ...

    filenames = []
    for file_id, file_path in directory_structure:
        if file_id is None:  # folder
            if not osp.exists(file_path):
                os.makedirs(file_path)
            continue

        if exist_ok:
            filename = cached_download(
                url="https://drive.google.com/uc?id=" + file_id,
                path=file_path,  # cached_download takes `path`, not `output`
                quiet=quiet,
                postprocess=None,
            )
        else:
            filename = download(
                "https://drive.google.com/uc?id=" + file_id,
                output=file_path,
                quiet=quiet,
                proxy=proxy,
                speed=speed,
                use_cookies=use_cookies,
            )