googleapis / python-storage


OSError occurred while downloading files using transfer_manager.download_many_to_path #1332

Closed · hrishijainak closed this 2 weeks ago

hrishijainak commented 1 month ago

Hi, I got the following error in my logs when using the transfer_manager.download_many_to_path function from the google-cloud-storage Python library. The input is a list of files under a prefix in a bucket, which I download in bulk into a directory on a GCP VM. I have been using this library for 2-3 months and the code runs more than 1000 times a day. This is the first time I have hit this issue, and I am unable to reproduce it. This is a production deployment, so I can't risk it happening consistently. Previously I was using gsutil without parallel downloads and it never produced this kind of error; I switched to this library 2-3 months ago to download files from the bucket.

Do I need to upgrade the library, or is there some other parameter I am missing that would resolve this? Or is it a bug?

File "/home/test.py", line 84, in download_bucket_with_transfer_manager results = transfer_manager.download_many_to_path( File "/home.local/lib/python3.9/site-packages/google/cloud/storage/transfer_manager.py", line 99, in convert_threads_or_raise return func(*args, *kwargs) File "/home.local/lib/python3.9/site-packages/google/cloud/storage/transfer_manager.py", line 764, in download_many_to_path return download_many( File "/home/.local/lib/python3.9/site-packages/google/cloud/storage/transfer_manager.py", line 99, in convert_threads_or_raise return func(args, **kwargs) File "/home/.local/lib/python3.9/site-packages/google/cloud/storage/transfer_manager.py", line 394, in download_many concurrent.futures.wait( File "/usr/lib/python3.9/concurrent/futures/_base.py", line 637, in exit self.shutdown(wait=True) File "/usr/lib/python3.9/concurrent/futures/process.py", line 773, in shutdown self._result_queue.close() File "/usr/lib/python3.9/multiprocessing/queues.py", line 349, in close self._reader.close() File "/usr/lib/python3.9/multiprocessing/connection.py", line 177, in close self._close() File "/usr/lib/python3.9/multiprocessing/connection.py", line 361, in _close _close(self._handle) OSError: [Errno 9] Bad file descriptor

Environment details

Code example

from google.cloud.storage import Client, transfer_manager

storage_client = Client()
# download_many_to_path expects a Bucket object, not a bucket name string.
bucket = storage_client.bucket('some_bucket')

blob_names = [...]  # list of blob names (elided in the original report)
destination_directory = 'local_directory_path'
blob_name_prefix = 'file1.csv'
workers = 4

results = transfer_manager.download_many_to_path(
    bucket,
    blob_names,
    destination_directory=destination_directory,
    blob_name_prefix=blob_name_prefix,
    max_workers=workers,
    create_directories=True,
)
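A side note on the call above: with the default raise_exception=False, download_many_to_path returns a list of per-blob results in input order, where each entry is None on success or the exception that occurred for that blob, so individual failures can be logged or retried. A minimal sketch, reusing the names from the example:

# Each result is None on success or the exception raised for that blob
# (default raise_exception=False).
for name, result in zip(blob_names, results):
    if isinstance(result, Exception):
        print(f"download failed for {name!r}: {result}")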

Thanks!

andrewsg commented 1 month ago

Hello, thanks for your bug report. This is the first time we've seen an issue like this. Is it possible that this error output occurred while the computer or virtual machine it was running on was shutting down for an unrelated reason?

hrishijainak commented 1 month ago

The VM was never shut down. As a production machine, it is continuously on and performs the same operation on different data sets, so the code was running without any issue on other data in the same bucket when we got this error.

andrewsg commented 1 month ago

Understood. How frequently does it happen? Are there any other strange behaviors associated with it? Is the machine under a remarkable amount of load or memory pressure when it occurs?

hrishijainak commented 1 month ago

Hi, sorry for the delayed response. The VM was not under stress; its memory and CPU usage were below 50%. We have not observed any other major strange behavior, but we did occasionally see gsutil download calls fail with errors like the following, which is why we switched to this library (it also gives fast downloads and achieves the same thing):

Command '['gsutil', '-o', 'GSUtil:parallel_composite_upload_threshold=150M', 'cp', 'gs://bucket_name/blob.pickle', 'tmp/blob_1234.pickle']' returned non-zero exit status 1.
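For anyone debugging the same thing: the message above is the tail of a subprocess.CalledProcessError, which hides gsutil's actual stderr. A minimal sketch (assuming the command is launched from Python, as the error format suggests) that captures stderr so a non-zero exit status comes with a diagnosable reason:

import subprocess

# Sketch: run the same gsutil command quoted above, but capture stderr so
# a failure carries gsutil's actual error message instead of just status 1.
cmd = [
    "gsutil", "-o", "GSUtil:parallel_composite_upload_threshold=150M",
    "cp", "gs://bucket_name/blob.pickle", "tmp/blob_1234.pickle",
]
proc = subprocess.run(cmd, capture_output=True, text=True)
if proc.returncode != 0:
    print(f"gsutil failed with status {proc.returncode}: {proc.stderr.strip()}")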

andrewsg commented 1 month ago

Has this particular error with the client library happened more than once? Do you have any clues as to any other changes in the environment that may help with a repro?

andrewsg commented 2 weeks ago

Can't repro without more info, closing for now.