move-coop / parsons

A python library of connectors for the progressive community.
https://www.parsonsproject.org/

GoogleCloudStorage - Handle zip / gzip files flexibly #937

Closed by IanRFerguson 10 months ago

IanRFerguson commented 10 months ago

The current gcs.unzip_blob() implementation only handles gzip files. We're updating it to handle other forms of compression.
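
For illustration only, here is a minimal sketch of what "handling other forms of compression" could look like. It is not the code in this PR: `decompress_payload` is a hypothetical helper, and it assumes the blob's bytes are already in memory and that a zip archive holds exactly one file.

```python
import gzip
import io
import zipfile

# gzip streams start with these two magic bytes
GZIP_MAGIC = b"\x1f\x8b"


def decompress_payload(data: bytes) -> bytes:
    """Hypothetical helper: decompress gzip or single-member zip bytes."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)

    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            names = archive.namelist()
            # Assumption: one file per archive; anything else is ambiguous
            if len(names) != 1:
                raise ValueError(f"Expected 1 file in archive, found {len(names)}")
            return archive.read(names[0])

    raise ValueError("Unrecognized compression format")
```

Sniffing the content rather than trusting the file extension means a mislabeled blob still gets routed to the right decompressor.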

IanRFerguson commented 10 months ago

Ok, one issue to address @Jason94 ... for "really big files" (e.g., the New York Catalist voterfile) I'm getting a timeout when trying to stream the data into GCS (requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out')))

Do you think it's worth trying to handle this (either by checking the size of the file or with a try / except), or should we just stick with the local file -> blob method that exists in the current function?
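
To make the two options concrete, here is a rough sketch that combines them: skip streaming above a size cutoff, and fall back to the local file path if streaming fails. Everything here is hypothetical (the function name, the 1 GiB cutoff, and the two callables standing in for the streaming and local file -> blob uploads). It relies on requests.exceptions.ConnectionError deriving from OSError, so catching OSError covers the error above.

```python
from typing import Callable, Tuple, Type

# Hypothetical cutoff; the right value would need to be found by testing
STREAM_LIMIT_BYTES = 1024**3


def upload_with_fallback(
    size_bytes: int,
    stream_upload: Callable[[], None],
    local_upload: Callable[[], None],
    stream_limit: int = STREAM_LIMIT_BYTES,
    retryable: Tuple[Type[BaseException], ...] = (OSError,),
) -> str:
    """Try streaming for small files; otherwise use the local file path.

    Returns which method ended up performing the upload.
    """
    if size_bytes <= stream_limit:
        try:
            stream_upload()
            return "stream"
        except retryable:
            # Streaming timed out or dropped; fall through to the local method
            pass

    local_upload()
    return "local"
```

The size check avoids paying for a doomed streaming attempt on files known to be huge, while the try / except still covers mid-sized files that fail on a slow connection.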

IanRFerguson commented 10 months ago

There IS a timeout param in the Google API, so maybe I'll just mess with that.
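
For reference, a minimal sketch of passing that timeout through, assuming the google-cloud-storage `Blob.upload_from_filename` method and its `timeout` keyword. The wrapper name and the 600-second default are made up for illustration, not values from this PR.

```python
def upload_large_file(blob, local_path: str, timeout_seconds: float = 600.0) -> None:
    """Hypothetical wrapper: upload a local file with a longer timeout.

    `blob` is expected to be a google.cloud.storage.Blob (or anything with a
    compatible upload_from_filename method).
    """
    blob.upload_from_filename(local_path, timeout=timeout_seconds)


# Usage sketch (bucket and object names are placeholders):
#
#   from google.cloud import storage
#   client = storage.Client()
#   blob = client.bucket("my-bucket").blob("voterfile.csv")
#   upload_large_file(blob, "/tmp/voterfile.csv", timeout_seconds=900)
```

A longer timeout alone may not be enough on a slow link, so it could still be worth pairing with the local file fallback.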