wkentaro / gdown

Google Drive Public File Downloader when Curl/Wget Fails
MIT License

Cannot download docx files from google drive link #253

Closed saireddy12 closed 1 year ago

saireddy12 commented 1 year ago

Provide environment information

Python 3.10.10, gdown 4.6.4

What OS are you using?

macOS 12.6.3

Describe the Bug

I called gdown.download from Python:

import gdown

url = "https://docs.google.com/document/d/1HOzb__2DdfS1fMDn9_EpoItgeamwRGMD/edit?usp=sharing&ouid=114613300604928585962&rtpof=true&sd=true"
output = '/content/'
gdown.download(url=url, output=output, quiet=False, fuzzy=True)

This is the output I get. The file is not downloaded; only the HTML page is saved, and the file name is also wrong:

From: https://docs.google.com/document/d/1HOzb__2DdfS1fMDn9_EpoItgeamwRGMD/edit?usp=sharing&ouid=114613300604928585962&rtpof=true&sd=true
To: /content/edit?usp=sharing&ouid=114613300604928585962&rtpof=true&sd=true
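The wrong file name in the log above looks like the last path segment of the URL, query string included. The snippet below only illustrates that observation with the standard library; that gdown falls back to the URL's basename when `output` is a directory is an assumption here, not something confirmed in this thread.

```python
import posixpath

url = (
    "https://docs.google.com/document/d/1HOzb__2DdfS1fMDn9_EpoItgeamwRGMD/"
    "edit?usp=sharing&ouid=114613300604928585962&rtpof=true&sd=true"
)
output = "/content/"

# Everything after the final "/" in the URL, including the query string.
fallback_name = posixpath.basename(url)

# Joining it onto the output directory gives the same path as the "To:" line in the log.
saved_path = posixpath.join(output, fallback_name)
print(saved_path)
```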

The file downloads correctly when I pass the file id instead:

url = "https://docs.google.com/document/d/1HOzb__2DdfS1fMDn9_EpoItgeamwRGMD/edit?usp=sharing&ouid=114613300604928585962&rtpof=true&sd=true"
id = "1HOzb__2DdfS1fMDn9_EpoItgeamwRGMD"
output = '/content/'
gdown.download(id=id, output=output, quiet=False, fuzzy=True)

This works fine, so for docs.google.com file links we need to extract the id from the URL and call download with the id instead. I see you added code to handle docs.google.com links, but I am not sure why it is not working.

Anyway, I created a function that extracts the id from a file link (file links only):

from urllib.parse import urlparse


def extract_id_from_drive_link(file_link):
    # Extracts the id from a given Google Drive or Google Docs file link.
    # Sample link: https://docs.google.com/document/d/1HOzb__2DdfS1fMDn9_EpoItgeamwRGMD/edit?usp=sharing&ouid=114613300604928585962&rtpof=true&sd=true
    # Returns -1 if the id cannot be extracted.
    file_id = -1
    try:
        parsed = urlparse(file_link)
        # Only docs.google.com and drive.google.com links are handled.
        if parsed.hostname in ["docs.google.com", "drive.google.com"]:
            # e.g. /document/d/1HOzb__2DdfS1fMDn9_EpoItgeamwRGMD/edit
            link_path = parsed.path
            # The id is the second-to-last path segment, so the link
            # must end with a trailing segment such as /edit or /view.
            file_id = link_path.split('/')[-2]
        else:
            print("Please check the file link; only Google Drive file links are supported.")
    except Exception as er:
        print(f"Error occurred while trying to extract the id from the drive link: {er}")

    return file_id

You can pass the file link to the function above and call download with the returned id; this works for both docs and drive links. For example:

url = "https://docs.google.com/document/d/1HOzb__2DdfS1fMDn9_EpoItgeamwRGMD/edit?usp=sharing&ouid=114613300604928585962&rtpof=true&sd=true"
output = '/content/'
id = extract_id_from_drive_link(file_link=url)
gdown.download(id=id, output=output, quiet=False, fuzzy=True)
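Taking the second-to-last path segment only works when the link ends in something like /edit or /view. A slightly more defensive variant might look like the sketch below. The name `extract_file_id`, the `/d/<id>` pattern, and the `id=` query fallback are illustrative assumptions based on common link shapes; this is not gdown's own parsing logic.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts accepted by this sketch (same two hosts as the function above).
_ALLOWED_HOSTS = {"docs.google.com", "drive.google.com"}


def extract_file_id(file_link: str) -> Optional[str]:
    """Return the file id from a Docs/Drive file link, or None if not found."""
    parsed = urlparse(file_link)
    if parsed.hostname not in _ALLOWED_HOSTS:
        return None

    # Path form, e.g. /document/d/<id>/edit or /file/d/<id>/view.
    # The trailing segment is optional here.
    match = re.search(r"/d/([^/]+)", parsed.path)
    if match:
        return match.group(1)

    # Query form, e.g. /uc?id=<id>.
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None
```

Returning None instead of -1 keeps the "not found" case from being passed on as an id by accident, so the caller can check the result before calling gdown.download(id=...).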

FYI, I am using gdown 4.6.4.

Expected Behavior

No response

To Reproduce

No response

wkentaro commented 1 year ago

Closed via https://github.com/wkentaro/gdown/releases/tag/v4.7.1