Open soutobias opened 2 months ago
Quick comment: I would recommend one sentence introducing all the methods of access, something like "Several methods can be employed to access cloud data, depending on the storage server". Otherwise, we jump into the 'why' section without much idea of where we are going. This comment applies to any 'why' section of the GitHub issues.
What:
Adding features to work with images in the cloud and implementing lazy loading involves handling image data stored in cloud storage systems efficiently. This means enabling access to images without needing to download all data at once, optimizing performance, and managing large-scale image datasets effectively.
Why:
Cloud storage offers scalability and remote access to large image datasets but can present challenges in terms of data transfer and local storage. Lazy loading and cloud integration techniques help mitigate these issues by enabling on-demand access to images and reducing unnecessary data transfers. This is crucial for maintaining efficient processing workflows and managing resources effectively.
How:
Cloud Storage Access:
Using `boto3` for AWS S3: The `boto3` library allows interaction with AWS S3, enabling efficient access to images stored in the cloud.
Using `google-cloud-storage` for Google Cloud Storage: For Google Cloud Storage, the `google-cloud-storage` library provides similar functionality.
Lazy Loading with Cloud Storage:
Using `smart_open` for Efficient Streaming: The `smart_open` library supports streaming data from cloud storage, allowing for lazy loading of images.
Using `PIL` for Lazy Loading and Processing: Combining `smart_open` with `PIL` (Pillow) enables lazy image processing.
Handling Large Image Datasets:
Using `dask` for Parallel Processing: Dask can be used for parallel processing of images in the cloud.
What to expect:
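For the `boto3` item under How, a minimal sketch of partial access to an S3 object might look like the following. It assumes that reading a byte range is enough for the use case (for example, an image header). The helper names `make_s3_client` and `read_s3_bytes` are hypothetical, and so are the bucket and key in the usage comment.

```python
def make_s3_client():
    """Create an S3 client. boto3 is imported here so the sketch loads without it."""
    import boto3  # third-party: pip install boto3
    return boto3.client("s3")


def read_s3_bytes(client, bucket, key, start, end):
    """Fetch only bytes start..end (inclusive) of an object via a Range request."""
    response = client.get_object(
        Bucket=bucket,
        Key=key,
        Range=f"bytes={start}-{end}",
    )
    # "Body" is a streaming object; read() pulls just the requested range.
    return response["Body"].read()


# Hypothetical bucket and key, for illustration only:
# client = make_s3_client()
# header = read_s3_bytes(client, "example-bucket", "images/tile_0001.tif", 0, 1023)
```

Passing the client in as an argument keeps credentials and region configuration outside the helper, and makes it easy to substitute a stub in tests.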
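The equivalent for the `google-cloud-storage` item could be sketched as below, again assuming a byte-range read is what is needed. `make_gcs_blob` and `read_gcs_bytes` are hypothetical helper names, and the bucket and blob names in the usage comment are placeholders.

```python
def make_gcs_blob(bucket_name, blob_name):
    """Return a Blob handle. No object data is downloaded at this point."""
    from google.cloud import storage  # third-party: pip install google-cloud-storage
    client = storage.Client()  # uses the default credentials of the environment
    return client.bucket(bucket_name).blob(blob_name)


def read_gcs_bytes(blob, start, end):
    """Download only bytes start..end (inclusive) of a blob."""
    return blob.download_as_bytes(start=start, end=end)


# Hypothetical names, for illustration only:
# blob = make_gcs_blob("example-bucket", "images/tile_0001.tif")
# header = read_gcs_bytes(blob, 0, 1023)
```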
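For the `smart_open` and `PIL` items, one possible shape for lazy loading is a small wrapper that stores only the URI and opens the object the first time metadata is requested. This is a sketch under assumptions, not a settled design: the class name `LazyCloudImage` and its `opener` argument are hypothetical, and it relies on the stream returned by `smart_open` being seekable, which Pillow needs.

```python
from functools import cached_property


class LazyCloudImage:
    """Remembers a URI and touches cloud storage only on first metadata access."""

    def __init__(self, uri, opener=None):
        self.uri = uri
        # `opener` is a hypothetical hook for tests; smart_open is used by default.
        self._opener = opener

    def _open(self):
        if self._opener is not None:
            return self._opener(self.uri)
        from smart_open import open as cloud_open  # pip install "smart_open[s3]"
        return cloud_open(self.uri, "rb")

    @cached_property
    def size(self):
        """(width, height), read from the image header on first access."""
        from PIL import Image  # third-party: pip install Pillow
        with self._open() as stream:
            # Image.open identifies the file and parses the header only;
            # pixel data is not decoded until load() or a processing call.
            with Image.open(stream) as img:
                return img.size


# Hypothetical URI, for illustration only:
# image = LazyCloudImage("s3://example-bucket/images/tile_0001.png")
# image.size  # the first access opens the object; later accesses are cached
```

How many bytes are actually transferred on that first access depends on the buffering of the underlying stream, so this should be measured rather than assumed.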
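For the `dask` item, a rough sketch of parallel processing over many image URIs could use `dask.delayed` to build one task per batch. The function names `batched` and `process_in_parallel`, the default batch size, and the `process_batch` callback are all assumptions made for illustration.

```python
def batched(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_in_parallel(uris, process_batch, batch_size=16):
    """Run process_batch over batches of URIs as parallel Dask tasks."""
    import dask  # third-party: pip install dask
    tasks = [dask.delayed(process_batch)(batch) for batch in batched(uris, batch_size)]
    # Nothing runs until compute() is called; Dask then schedules the tasks.
    return dask.compute(*tasks)


# Hypothetical usage: count the images in each batch.
# results = process_in_parallel(list_of_uris, lambda batch: len(batch))
```

Batching keeps the number of tasks manageable when the dataset holds many small images; the right batch size depends on image size and network latency.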
What makes it difficult:
Success Metrics: