Closed simonw closed 2 years ago
May need to do two API requests here - one to figure out the filename, extension and content type, the other to retrieve the contents.
I'll use an httpx
streaming download: https://www.python-httpx.org/quickstart/#streaming-responses
>>> with httpx.stream("GET", "https://www.example.com") as r:
... for data in r.iter_bytes():
... print(data)
If you provide a -o
option I'll use that as the filename without running a separate request.
Here's the API call to get the filename:
% google-drive-to-sqlite get 'https://www.googleapis.com/drive/v3/files/0B32uDVNZfiEKOUliZ3NMNC0yV3RiYkZabjhlSHhXZ2UxOEZn'
{
"kind": "drive#file",
"id": "0B32uDVNZfiEKOUliZ3NMNC0yV3RiYkZabjhlSHhXZ2UxOEZn",
"name": "Items recovered #3.jpg",
"mimeType": "image/jpeg",
"resourceKey": "0-ZZIGVxj4zbk5mf3HltMvsg"
}
The problem with using the filename from that API is that the user will run the command and then have no idea where the file was saved.
wget simonwillison.net
solves that by showing you in the output to stderr
:
...
2022-02-16 19:34:57 (44.7 MB/s) - ‘index.html’ saved [93300]
Run that a second time:
2022-02-16 19:35:30 (3.25 MB/s) - ‘index.html.1’ saved [93300]
Which seems right - it's definitely bad to over-write an existing file with the same name.
I'm going to support multiple file IDs.
If you use the -o
option you can only use a single ID.
-o -
will write to standard output.
Design for this feature then:
# Look up FILE_ID's name, download it to that filename on disk
google-drive-to-sqlite download FILE_ID
# If that filename already exists, adds .1 or .2 etc to the end
# This shows information to stderr, unless you pass `-s` or `--silent`
# Control the output filename (and save one API request)
google-drive-to-sqlite download FILE_ID -o out.jpg
# This will over-write out.jpg if it already exists
# Download multiple files
google-drive-to-sqlite download FILE1 FILE2 FILE3
# This saves them with their filenames fetched by the other API request
Maybe it would be good to have a --use-file-id-as-filename
option (but something a bit snappier than that)? I'd still need to do an API call to figure out the file extension to use... or that option could skip file extensions entirely.
Excellent news: the HTTP headers that come back with the ?alt=media
response include the content-type, which I can use to derive the correct file extension!
{
"x-guploader-uploadid": "ADPycdshfScGL5dQyKL_NvivgOFaC4gGfekqs5Gq0sAIVZQQUdU0hLheFGAUjFMs6Bf-P7GMLZxIVVZnSVWPVg8J6ic",
"content-type": "image/jpeg",
"content-disposition": "attachment",
"vary": "Origin, X-Origin",
"expires": "Thu, 17 Feb 2022 03:58:21 GMT",
"date": "Thu, 17 Feb 2022 03:58:21 GMT",
"cache-control": "private, max-age=0, must-revalidate",
"content-length": "45237",
"x-goog-hash": "crc32c=Lpascw==",
"server": "UploadServer",
"alt-svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\""
}
For a PDF it's: content-type": "application/pdf"
I'm going to split on /
and use the last token as the file extension.
Since it returns content-length
too I could even have a progress bar.
Rough prototype with working progress bar (for files that return content-length - a video did, two PDFs did not):
@cli.command()
@click.argument("file_id")
@click.option(
"-a",
"--auth",
type=click.Path(file_okay=True, dir_okay=False, allow_dash=True),
default="auth.json",
help="Path to auth.json token file",
)
def download(file_id, auth):
access_token = load_token(auth)
with httpx.stream(
"GET",
"https://www.googleapis.com/drive/v3/files/{}?alt=media".format(file_id),
headers={"Authorization": "Bearer {}".format(access_token)},
) as r:
click.echo(json.dumps(dict(r.headers), indent=4), err=True)
click.echo("Writing to FILE.bin")
fp = open("FILE.bin", "wb")
length = int(r.headers.get("content-length", "0"))
if length:
with click.progressbar(
length=int(r.headers["content-length"]), label="FILENAME"
) as bar:
for data in r.iter_bytes():
fp.write(data)
bar.update(len(data))
else:
for data in r.iter_bytes():
fp.write(data)
Actually I'm not going to fetch the filename at all unless the user asks me to. The first version of this will use OAENTUHAOTNUHAN.gif
- file ID plus extension.
Originally posted by @simonw in https://github.com/simonw/google-drive-to-sqlite/issues/1#issuecomment-1041026849