talkiq / gcloud-aio

(Asyncio OR Threadsafe) Google Cloud Client Library for Python
https://talkiq.github.io/gcloud-aio
265 stars 90 forks source link

auto_decompress not supported as keyword argument for aiohttp<3.9 #716

Open sebastianmika opened 3 months ago

sebastianmika commented 3 months ago

With the latest release of gcloud-aio-auth/storage (5.3.0) the following breaks when using aiohttp<3.9, which is allowed since the minimal required version is currently aiohttp>=3.3.0.

Potential fixes:

Code to reproduce with aiohttp==3.8.6:

from aiohttp import ClientSession
from gcloud.aio.storage import Storage

async with ClientSession() as session:
    async with Storage(session=session) as client:
         await client.download("test", "test")

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 6
      4 async with ClientSession() as session:
      5     async with Storage(session=session) as client:
----> 6          await client.download("test", "test")

File ~/venvs/lib/python3.11/site-packages/gcloud/aio/storage/storage.py:329, in Storage.download(self, bucket, object_name, headers, timeout, session)
    323 async def download(
    324     self, bucket: str, object_name: str, *,
    325     headers: Optional[Dict[str, Any]] = None,
    326     timeout: int = DEFAULT_TIMEOUT,
    327     session: Optional[Session] = None,
    328 ) -> bytes:
--> 329     return await self._download(
    330         bucket, object_name, headers=headers,
    331         timeout=timeout, params={'alt': 'media'},
    332         session=session,
    333     )

File ~/venvs/lib/python3.11/site-packages/gcloud/aio/storage/storage.py:557, in Storage._download(self, bucket, object_name, params, headers, timeout, session)
    554 headers.update(await self._headers())
    556 s = AioSession(session) if session else self.session
--> 557 response = await s.get(
    558     url, headers=headers, params=params or {},
    559     timeout=timeout,
    560 )
    562 # N.B. the GCS API sometimes returns 'application/octet-stream' when a
    563 # string was uploaded. To avoid potential weirdness, always return a
    564 # bytes object.
    565 try:

File ~/venvs/lib/python3.11/site-packages/gcloud/aio/auth/session.py:214, in AioSession.get(self, url, headers, timeout, params, stream, auto_decompress)
    208 if stream is not None:
    209     log.warning(
    210         'passed unused argument stream=%s to AioSession: '
    211         'this argument is only used by SyncSession',
    212         stream,
    213     )
--> 214 resp = await self.session.get(
    215     url, headers=headers,
    216     timeout=timeout, params=params,
    217     auto_decompress=auto_decompress,
    218 )
    219 await _raise_for_status(resp)
    220 return resp

File ~/venvs/lib/python3.11/site-packages/aiohttp/client.py:922, in ClientSession.get(self, url, allow_redirects, **kwargs)
    917 def get(
    918     self, url: StrOrURL, *, allow_redirects: bool = True, **kwargs: Any
    919 ) -> "_RequestContextManager":
    920     """Perform HTTP GET request."""
    921     return _RequestContextManager(
--> 922         self._request(hdrs.METH_GET, url, allow_redirects=allow_redirects, **kwargs)
    923     )

TypeError: ClientSession._request() got an unexpected keyword argument 'auto_decompress'
fbalboaDialpad commented 3 months ago

Hi @sebastianmika, thanks for bringing this up. Previous to the addition of the auto_decompress param in aiohttp 3.9, there's no way of supporting downloading gzipped files without decompressing them. I consider this to be a valid use case, as you may want to download a gzip file from GCS and serve it from your own server, or save it as-is to disk for later processing or whatever you may want to do with it. Automatically decompressing it just to compress it again makes no sense, so that's why we added the option to fetch it compressed in #714. @TheKevJames thoughts on forcing aiohttp>=3.9?

sebastianmika commented 3 months ago

I am seeing your point and the advantage. My challenge (and I am likely not alone) is that I have plenty of aiohttp<3.9 dependencies (internal and external), why forcing 3.9 would come at quite some cost.

Looking at the recent PR I was wondering whether a very similar result couldn't have been achieved by making the user pass in a ClientSession(auto_decompress=False) session into storage = Storage(session=session) explicitly (i.e. the PR maybe wasn't needed?). The only scenario where that would be annoying is when I need to change this behaviour on a call by call basis. But I would not find that very likely to happen in reality.

fbalboaDialpad commented 3 months ago

Yes, that would be a valid option too. I was trying to achieve it on call-by-call basis as we usually have a single storage object for the whole application, and you could have different use cases within it. Still, I see that we do accept a session parameter on both Storage.download and Blob.download, so maybe there's a possible workaround that's not super painful. I also have to take into account how this will work with the requests.Session object, as the codebase for gcloud-aio/rest modules is shared. I'll take a look at this next week, for the moment I suggest pinning the version to 9.2.0

TheKevJames commented 3 months ago

Ah, good catch on this. I would rather avoid upgrading the aiohttp pin too much, if it's not necessary -- it looks like 3.9 only added the auto_decompress parameter to the request method, but it was supported in the client itself before that as early as 3.3.0 (original commit), so we can avoid needing to bump dependency ranges by doing some patching.

I plan to do the following:

Proposed long-term fixes at #717 and #718.

TheKevJames commented 3 months ago

Looking at the recent PR I was wondering whether a very similar result couldn't have been achieved by making the user pass in a ClientSession(auto_decompress=False) session into storage = Storage(session=session) explicitly (i.e. the PR maybe wasn't needed?).

That would have worked for aio (barring the call-by-call changes you called out), but unfortunately not for rest, since our internal usage of the stream parameter also needs to change based on that setting. I don't think we can fully offload this while mataining feature parity between the async and sync builds of this library, though if anyone can think of a way to do so I'd love to be proven wrong!