googleapis / python-storage

Apache License 2.0

Allow tracking upload progress. #27

Open bachirelkhoury opened 5 years ago

bachirelkhoury commented 5 years ago

This is related to googleapis/google-cloud-python#1830. Reopening here, as that issue seems to have been closed many years ago.

We are looking for this feature because we need to monitor large files being uploaded to Google Storage buckets. I am surprised not many people are after this essential feature, which makes me feel we haven't done our research properly or that the solution is very obvious or trivial.

Can someone please share an example of how we could track progress during upload?

Update: Should we be looking at google-resumable-media? Will try that out and report back.


tseaver commented 5 years ago

@frankyn Please help prioritize this feature. See the discussion in googleapis/google-cloud-python#1830 / googleapis/google-cloud-python#1077 for the tradeoffs involved.

pdex commented 4 years ago

Here's a workaround using tqdm.wrapattr:

import os

from google.cloud import storage
from tqdm import tqdm

def upload_blob(client, bucket_name, source, dest, content_type=None):
  bucket = client.bucket(bucket_name)
  blob = bucket.blob(dest)
  with open(source, "rb") as in_file:
    total_bytes = os.fstat(in_file.fileno()).st_size
    # Wrap read() so every chunk the client reads advances the progress bar.
    with tqdm.wrapattr(in_file, "read", total=total_bytes, miniters=1, desc="upload to %s" % bucket_name) as file_obj:
      blob.upload_from_file(file_obj, content_type=content_type, size=total_bytes)
      return blob

if __name__ == "__main__":
  upload_blob(storage.Client(), "bucket", "/etc/motd", "/path/to/blob.txt", "text/plain")

zLupa commented 3 years ago

It has been a year since this was opened. Any updates?

Shreeyak commented 3 years ago

This is an essential feature for large file uploads/downloads. I resorted to using gsutil via subprocess call just for the download progress bar.

kamal94 commented 3 years ago

To add to @pdex's submission: I am generating upload URLs via blob.generate_signed_url and passing them to my application's client to upload a user-generated file. Here is what worked for me:

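import os
import uuid
import requests
from tqdm import tqdm

def upload_file(filename):  # enclosing function assumed; the original snippet started mid-function
    object_address = str(uuid.uuid4())
    upload_url, upload_method = get_upload_url(object_address)  # app-specific helper (not shown) that fetches the signed upload URL
    size = os.path.getsize(filename)
    with open(filename, "rb") as in_file:
        total_bytes = os.fstat(in_file.fileno()).st_size
        with tqdm.wrapattr(
            in_file, "read", total=total_bytes, miniters=1,
            desc="Uploading to my bucket",
        ) as file_obj:
            response = requests.request(
                method=upload_method, url=upload_url, data=file_obj,
                headers={"Content-Type": "application/octet-stream"},
            )
    return object_address, size

Mohab25 commented 2 years ago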

@frankyn any updates on this? It has been open since 2019.

frankyn commented 2 years ago

Thanks for the ping. @andrewsg this has 17+ upvotes. Could you please take a look when you have a moment?

andrewsg commented 2 years ago

We have some long-term plans around async code and transport mechanisms that may make fully integrated support for a progress meter feasible in the future, but until then, there are two main options: chunk media operations and report status in between chunks, or use a file object wrapper that tracks how much data is written or read.

As it happens, large uploads are already chunked by default using the resumable upload API. However, upload functions in the Python client library are agnostic as to the upload strategy and so we can't easily add callback functionality to upload functions in a way that will work for all uploads - they would only work for resumable uploads, and communicating that to the user would be awkward. At any rate, they will only report completed chunks, so they're inferior to the file object wrapper method.
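
For illustration, the first option (chunk the transfer and report status between chunks) could look roughly like the sketch below. It uses only the standard library. `upload_in_chunks`, `send_chunk`, `on_progress`, and the chunk size are all hypothetical names and values chosen for this example, not part of google-cloud-storage; `send_chunk` stands in for whatever call actually transmits a chunk. As noted above, progress is only reported once a chunk has completed.

```python
import io
from typing import BinaryIO, Callable

CHUNK_SIZE = 256 * 1024  # illustrative chunk size; an assumption, not a library default


def upload_in_chunks(
    stream: BinaryIO,
    total_bytes: int,
    send_chunk: Callable[[bytes], None],
    on_progress: Callable[[int, int], None],
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Send `stream` chunk by chunk, reporting progress after each completed chunk."""
    sent = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        send_chunk(chunk)  # hypothetical transport step; replace with a real upload call
        sent += len(chunk)
        on_progress(sent, total_bytes)  # called only between chunks
    return sent


if __name__ == "__main__":
    payload = b"x" * (CHUNK_SIZE * 2 + 100)
    received = []  # stands in for the remote side
    upload_in_chunks(
        io.BytesIO(payload),
        len(payload),
        send_chunk=received.append,
        on_progress=lambda done, total: print(f"{done}/{total} bytes"),
    )
```

The granularity of the reported progress is tied to the chunk size, which is the limitation that makes this approach less attractive than wrapping the file object.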

I'll look into implementing a good first-party turnkey solution for the file object wrapper strategy. Until then, I recommend using the tqdm attribute wrapper as shown in the comments above.
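
For anyone who would rather not depend on tqdm, the file object wrapper strategy can be sketched in a few lines of standard-library Python. `ProgressReader` and its callback signature are hypothetical and written for this example only; this is not the first-party solution mentioned above, and it has not been tested against a real upload here.

```python
import io
from typing import BinaryIO, Callable


class ProgressReader:
    """Wrap a binary file object and report cumulative bytes read.

    Hypothetical helper for illustration; not part of google-cloud-storage.
    """

    def __init__(self, fileobj: BinaryIO, callback: Callable[[int], None]) -> None:
        self._fileobj = fileobj
        self._callback = callback
        self.bytes_read = 0

    def read(self, size: int = -1) -> bytes:
        data = self._fileobj.read(size)
        self.bytes_read += len(data)
        self._callback(self.bytes_read)  # report after every read, however small
        return data

    def __getattr__(self, name: str):
        # Delegate everything else (seek, tell, fileno, ...) to the wrapped file.
        return getattr(self._fileobj, name)


if __name__ == "__main__":
    source = io.BytesIO(b"hello world")
    reader = ProgressReader(source, lambda n: print(f"{n} bytes read"))
    while reader.read(4):
        pass
```

In principle such a wrapper can be handed to anything that reads from a file object, for example in place of `file_obj` in the `blob.upload_from_file` call from the workaround earlier in this thread. One caveat: the counter only ever increases, so if the consumer seeks backwards and re-reads data (for instance on a retry), the reported total will overshoot.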

nom commented 6 months ago

+1 to this