stevearc / pypicloud

S3-backed pypi server implementation
MIT License
507 stars 141 forks source link

Package upload fails with "AttributeError: can't set attribute" when using pypicloud[gcs] 1.3.9 or later at GAE #317

Closed vanguard737 closed 1 year ago

vanguard737 commented 2 years ago

Hi,

I'm running pypicloud within Google App Engine, using uWSGI and with GCS as the storage backend. On pypicloud 1.3.8 and earlier, this works fine. However, after upgrading to pypicloud 1.3.9 or later, I can no longer upload packages (either via poetry publish --repository <my_repo_name>, or via the web UI). I observed the errors below in /var/log/pypicloud.log.

Given that this behavior manifested in 1.3.9, and that the errors are coming up from smart_open, PR #304 seems possibly relevant. In meantime, workaround was just to roll back to 1.3.8.

Thanks, and let me know if you need any further debugging info - C.J.

ERROR 2022-09-25 19:36:35,783 [pypicloud.views] can't set attribute
Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pyramid/tweens.py", line 41, in excview_tween
    response = handler(request)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pyramid/router.py", line 143, in handle_request
    response = _call_view(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pyramid/view.py", line 674, in _call_view
    response = view_callable(context, request)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pyramid/config/views.py", line 151, in __call__
    return view(context, request)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pyramid/config/views.py", line 170, in attr_view
    return view(context, request)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pyramid/config/views.py", line 196, in predicate_wrapper
    return view(context, request)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pyramid/viewderivers.py", line 427, in rendered_view
    result = view(context, request)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pyramid_duh/params.py", line 306, in param_twiddler
    return fxn(**scope)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pypicloud/views/simple.py", line 39, in upload
    return request.db.upload(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pypicloud/cache/base.py", line 151, in upload
    self.storage.upload(new_pkg, data)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/pypicloud/storage/gcs.py", line 207, in upload
    with _open(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/smart_open/smart_open_lib.py", line 224, in open
    binary = _open_binary_stream(uri, binary_mode, transport_params)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/smart_open/smart_open_lib.py", line 400, in _open_binary_stream
    fobj = submodule.open_uri(uri, mode, transport_params)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/smart_open/gcs.py", line 108, in open_uri
    return open(parsed_uri['bucket_id'], parsed_uri['blob_id'], mode, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/smart_open/gcs.py", line 149, in open
    fileobj = Writer(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/smart_open/gcs.py", line 425, in __init__
    setattr(self._blob, k, v)
AttributeError: can't set attribute
nivintw commented 2 years ago

hmmm interesting. I did some light triage because this caught my eye.

BLUF: believe the issue is that pypicloud is trying to set properties that do not have setters for gcs using transport_method.

# from https://github.com/stevearc/pypicloud/blob/master/pypicloud/storage/gcs.py starting at line 207
with _open(
            self.get_uri(package),
            "wb",
            compression="disable",
            transport_params={
                "client": self.bucket.client,
                "blob_properties": {
                    "metadata": metadata,
                    "acl": self.object_acl,
                    "storage_class": self.storage_class,
                },
            },
        ) as fp:
            for chunk in stream_file(datastream):
                fp.write(chunk)  # multipart upload

and after reviewing the following:

  1. https://github.com/RaRe-Technologies/smart_open/blob/develop/smart_open/smart_open_lib.py
  2. https://github.com/RaRe-Technologies/smart_open/blob/4c6bc38d6b35aa75e5e9b8041bb80f0f956ac486/smart_open/smart_open_lib.py#L366
  3. https://github.com/RaRe-Technologies/smart_open/blob/develop/smart_open/gcs.py
  4. https://github.com/RaRe-Technologies/smart_open/blob/4c6bc38d6b35aa75e5e9b8041bb80f0f956ac486/smart_open/gcs.py#L395
  5. https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob#google_cloud_storage_blob_Blob_storage_class
  6. https://docs.python.org/3/library/functions.html#setattr
  7. https://stackoverflow.com/questions/27396339/attributeerror-cant-set-attribute

Appears to be straight forward: pypicloud is passing "acl", "metadata", and "storage_class" which is resulting in these (ultimately) being passed to setattr

From https://github.com/RaRe-Technologies/smart_open/blob/4c6bc38d6b35aa75e5e9b8041bb80f0f956ac486/smart_open/gcs.py#L395

if blob_properties:
            for k, v in blob_properties.items():
                setattr(self._blob, k, v)

Reviewing the docs for GCS at https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob#google_cloud_storage_blob_Blob_storage_class What jumps out to me:

  1. acl appears to be a dynamically generated property, and not settable.
    • Essentially, it appears that pypicloud is not using the SDK for gcs properly; there's appears to be no reason to expect setting the object/blob ACL this way should work.
  2. storage_class seems to be same situation as for acl; setting this attribute is not the correct usage of the SDK; pypicloud should directly set this value for the blob via the proper SDK functions rather than passing down to smart_open.
  3. metadata appears to actually support the usage that pypicloud implements currently, so is likely ok; recommend validating with additional testing.

I'm just a passer-by whose attention was caught, and don't have bandwidth or use-case (I don't use the gcs backend at this time) to implement the changes. From my very brief perusal of the SDK docs though, it seems that the needed updates would be relatively minor (and in fact I suspect the correct usage is what's in v1.3.8 pre-smart_open, but haven't checked)

nivintw commented 2 years ago

ok, I was interested to see what 1.3.8 had, and yep it's 90% of the way there.

def upload(self, package, datastream):
        """Upload the package to GCS"""
        metadata = {"name": package.name, "version": package.version}
        metadata.update(package.get_metadata())

        blob = self._get_gcs_blob(package)

        blob.metadata = metadata

        blob.upload_from_file(datastream, predefined_acl=self.object_acl)

        if self.storage_class is not None:
            blob.update_storage_class(self.storage_class)

so recommended changes:

  1. update smart_open.storage.gcs to update the storage class method using the approach from 1.3.8
  2. For setting the object acl, here's where we hit the wall for my familiarity with gcs (I really don't use it) and the end of my mid-week late night curiosity. That said, i'm confident that whatever the solution is, it won't be that hard to implement.

P.S. did some more looking at the acl, and appears like using argname "predefined_acl" instead of "acl" might be all that's needed?

stevearc commented 2 years ago

Thanks @nivintw for the investigation! The root cause you mentioned sounds plausible. It'll unfortunately be a while before I can try to fix this. I'm on the road for the next couple weeks, but I'll get to it when I can.

cc @ddelange

ddelange commented 2 years ago

Thanks for the cc, and sorry that this slipped through. I think this needs upstream PRs.

stevearc commented 2 years ago

Thanks @ddelange for jumping on this so quickly!

ddelange commented 2 years ago

Hi πŸ‘‹

Quick update: we're looking at a major version bump from smart_open from a PR opened 2 days after mine: https://github.com/RaRe-Technologies/smart_open/pull/729

As it changes internals significantly, my original PR at gcs became unnecessary for this issue. Also, instead of updating our mocks to account for new smart_open internals, I switched to fake-gcs-server analogous to switching to azurite in #304.

And that is working like a charm (as we can now actually assert ACL being returned on an uploaded blob, instead of mocking which caused this issue in the first place): I've tested the fix here using unreleased smart_open, CI is green :)

So now we wait for smart_open to release 7.0.0! πŸŽ‰

Heineken - Now we wait