googleapis / google-cloud-python

Google Cloud Client Library for Python
https://googleapis.github.io/google-cloud-python/
Apache License 2.0

Storage: useful to add method to load object into BytesIO buffer? #6062

Closed dkapitan closed 6 years ago

dkapitan commented 6 years ago

Use-case

We use Google Cloud Storage to store .parquet files as part of our data processing. We often want to load a .parquet file into memory and read it directly into a pandas DataFrame, without first downloading it to disk.

The code below does the trick. My question is: would it be useful to include a download_as_buffer method in storage.blob?

from io import BytesIO
from google.oauth2.service_account import Credentials
from google.cloud.storage import Client
import pandas as pd

SERVICE_ACCOUNT = '/some/path/to/service-account.json'
credentials = Credentials.from_service_account_file(SERVICE_ACCOUNT)

bucket = Client(credentials=credentials).bucket('mediquest-closed-data')
f = BytesIO()
bucket.get_blob(blob_name='some_file.parquet').download_to_file(f)
f.seek(0)  # rewind the buffer before handing it to pandas
df = pd.read_parquet(f)

Feature request

Add a new method, or give download_as_string an option to return the BytesIO buffer itself rather than the result of getvalue().

    def download_as_string(self, client=None):
        """Download the contents of this blob as a string.

        :type client: :class:`~google.cloud.storage.client.Client` or
                      ``NoneType``
        :param client: Optional. The client to use.  If not passed, falls back
                       to the ``client`` stored on the blob's bucket.

        :rtype: bytes
        :returns: The data stored in this blob.
        :raises: :class:`google.cloud.exceptions.NotFound`
        """
        string_buffer = BytesIO()
        self.download_to_file(string_buffer, client=client)
        return string_buffer.getvalue()

Or am I overlooking a similar method that is already included elsewhere in the API?
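
For illustration, a minimal sketch of what such a helper could look like. The name download_as_buffer is the hypothetical one proposed above and is not part of google-cloud-storage. FakeBlob is a made-up stand-in, so the sketch runs without credentials. It assumes only that the blob exposes download_to_file(file_obj, client=None), as in the source quoted above.

```python
from io import BytesIO


def download_as_buffer(blob, client=None):
    """Download a blob into an in-memory buffer, rewound to the start.

    Hypothetical helper, not part of google-cloud-storage. ``blob`` is any
    object exposing ``download_to_file(file_obj, client=None)``.
    """
    buffer = BytesIO()
    blob.download_to_file(buffer, client=client)
    buffer.seek(0)  # rewind so callers can read from the beginning
    return buffer


class FakeBlob:
    """Made-up stand-in for a real Blob; it just writes a fixed payload."""

    def __init__(self, payload):
        self._payload = payload

    def download_to_file(self, file_obj, client=None):
        file_obj.write(self._payload)


buf = download_as_buffer(FakeBlob(b"PAR1"))
first_bytes = buf.read()  # reads from position 0 because of the seek above
```

With a real blob, the returned buffer could presumably be passed straight to pd.read_parquet in place of f in the snippet above.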

Environment

tseaver commented 6 years ago

@dkapitan Blob.download_to_file does what you want (it takes a file object, versus the filename taken by Blob.download_to_filename).

import io
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-bucket-name')
blob = bucket.get_blob('my-blob-name')
buffer = io.BytesIO()
blob.download_to_file(buffer)

kondela commented 4 years ago

For anyone working with this later, don't forget to call buffer.seek(0) before reading it.
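
A small sketch of why the rewind matters, using only the standard library. The buffer.write call is a stand-in for blob.download_to_file(buffer), which likewise leaves the stream position at the end of the written data.

```python
import io

buffer = io.BytesIO()
# Stand-in for blob.download_to_file(buffer): writing advances the position.
buffer.write(b"example payload")

at_end = buffer.read()       # position is at the end, so this is empty
buffer.seek(0)               # rewind to the start of the buffer
from_start = buffer.read()   # now the full payload comes back
```

Note that buffer.getvalue() returns the whole contents regardless of the current position, which is why download_as_string does not need the rewind.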