unytics / bigfunctions

Supercharge BigQuery with BigFunctions
https://unytics.io/bigfunctions/
MIT License

[new]: `list_bucket_files( bucket_url [, encrypted_token] )` #163

Open AntoineGiraud opened 2 weeks ago

AntoineGiraud commented 2 weeks ago

- Check the idea has not already been suggested
- Edit the title above with self-explanatory function name and argument names

BigFunction Description as it would appear in the documentation

`list_bucket_files` returns the list of files in a bucket, with some metadata (name, size, last_modified, ...).

At first, we'd support GCS (cf. `storage_list_files`).

Then we'll add S3, Azure, etc.

The same function exists in Snowflake: `LIST`.

Examples of (arguments, expected output) as they would appear in the documentation

select bigfunctions.eu.list_bucket_files( 'xx:/xxxx/xxxx' [, encrypted_token] )

cf. ❄️ Snowflake's `list @my_stage` (screenshot of example output omitted)
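
As a rough sketch of the proposal (not a final implementation), the function could dispatch on the bucket URL scheme; `list_gcs_files`, `list_s3_files` and `list_azure_files` are hypothetical helpers named here only for illustration:

from urllib.parse import urlparse

def list_bucket_files(bucket_url, encrypted_token=None):
    """Dispatches to a cloud-specific lister based on the URL scheme (sketch)."""
    scheme = urlparse(bucket_url).scheme
    if scheme == 'gs':
        return list_gcs_files(bucket_url, encrypted_token)    # GCS first
    elif scheme == 's3':
        return list_s3_files(bucket_url, encrypted_token)     # then S3
    elif scheme in ('az', 'azure'):
        return list_azure_files(bucket_url, encrypted_token)  # then Azure
    raise ValueError(f'unsupported bucket url scheme: {scheme}')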

AntoineGiraud commented 2 weeks ago

Code for GCS:

from google.cloud import storage

def list_blobs(bucket_name):
    """Lists all the blobs in the bucket."""
    # bucket_name = "your-bucket-name"

    storage_client = storage.Client()

    # Note: Client.list_blobs requires at least package version 1.17.0.
    blobs = storage_client.list_blobs(bucket_name)

    # Note: The call returns a response only when the iterator is consumed.
    for blob in blobs:
        print(blob.name)
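
Since the function should return metadata (name, size, last_modified), the listing above could collect those fields instead of printing names; `blob.size` and `blob.updated` are standard attributes of google-cloud-storage `Blob` objects, while the dict output shape is just a sketch:

from google.cloud import storage

def list_blobs_with_metadata(bucket_name):
    """Lists blobs with the metadata the proposed function would return."""
    storage_client = storage.Client()
    return [
        {
            'name': blob.name,
            'size': blob.size,             # size in bytes
            'last_modified': blob.updated, # datetime of last update
        }
        for blob in storage_client.list_blobs(bucket_name)
    ]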
AntoineGiraud commented 2 weeks ago

Code for S3 (cf. boto3):

import boto3
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)

s3 = session.resource('s3')

my_bucket = s3.Bucket('stackvidhya')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
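
The same metadata is available on boto3 `ObjectSummary` objects (`key`, `size`, `last_modified`); a sketch mirroring the GCS version, using default credentials instead of explicit keys:

import boto3

def list_s3_objects_with_metadata(bucket_name):
    """Lists S3 objects with name, size and last_modified."""
    s3 = boto3.resource('s3')  # uses default credentials
    return [
        {
            'name': obj.key,
            'size': obj.size,                   # size in bytes
            'last_modified': obj.last_modified, # datetime of last modification
        }
        for obj in s3.Bucket(bucket_name).objects.all()
    ]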