paylogic / pip-accel

pip-accel: Accelerator for pip, the Python package manager
https://pypi.python.org/pypi/pip-accel
MIT License
308 stars 35 forks source link

Allow anonymous connections to the S3 cache backend #45

Closed jzoldak closed 9 years ago

jzoldak commented 9 years ago

S3 has the ability to set ACLs for anonymous access to buckets. This is wired into boto's s3.connection.S3Connection method via the anon parameter.

Modify the S3 cache backend code so that if the user does not have credentials, it will connect as an anonymous user rather than erroring with "No handler was ready to authenticate... Check your credentials" and disabling the S3CacheBackend.

xolox commented 9 years ago

Hi Jesse!

Modify the S3 cache backend code so that if the user does not have credentials, it will connect as an anonymous user rather than erroring with "No handler was ready to authenticate... Check your credentials" and disabling the S3CacheBackend.

Solutions I can think of:

  1. Detect whether credentials are available. But how do you suggest that pip-accel detects whether the user has credentials? The S3 credentials can be set in environment variables, they can be set in a Boto configuration file, they can come from the AWS EC2 metadata service, etc. So far I was able to hide this complexity from pip-accel. The risk is that if I re implement all these checks I may get them wrong. Or a new release of Boto adds yet another type of providing credentials and pip-accel doesn't know about it and so forces anonymous connections.
  2. Maybe pip-accel can catch the No handler was ready to authenticate exception and retry with anon=True when that happens. But then pip-accel will always need an HTTP response to find out that no credentials are available, and only then can it respond by issuing anonymous requests. That feels somewhat sub optimal :-).
  3. We can add another pip-accel setting which explicitly enables anonymous connections to S3.
jzoldak commented 9 years ago

Hey Peter!

Yeah, I agree:

  1. We definitely don't want to replicate the code paths.
  2. This one is sub-optimal but was actually my first inclination because
  3. This one would be sub-optimal for my use case. I want to enable pip-accel on an OpenSource repo for which I allow anybody to use my S3 buckets (read-only). And I want us to be able to use the same contents in the pip-accel.conf. I.e. forks will be using the file from my master, and they should be able to run on travis without changing the config in this file, or setting up a AWS user and configuring boto. I could override with environment variables for pip-accel configuration but I'd rather not have to do that.

Perhaps another option might be to:

  1. Use the boto Config object (or better - use another downstream method, I just looked really quickly) to determine if credentials exist.
>>> from boto import Config
>>> Config().dump()

This shows my credentials if they are configured and shows no credentials if not.

xolox commented 9 years ago

Hi Jesse,

I believe pip-accel version 0.22 resolves your problem.

I went with option two in the end because I realized this past weekend that if I want to allow people to use Boto's "automagic" support for the EC2 metadata service there is no way to avoid the HTTP request so I should just accept it and move on. Anyway, the EC2 metadata service uses the IP address 169.254.169.254 which is a link-local IPv4 address range so I don't think this will cause problems in practice (e.g. hanging on a timeout).

Can you confirm whether pip-accel 0.22 resolves your problem? Thanks for the feature request!

jzoldak commented 9 years ago

Peter - I think it seems to work, but with a few caveats.

We would probably catch these and any other corner case issues or bugs by writing unit tests for the S3CacheBackend. I could totally help do that if you'd like.

xolox commented 9 years ago

boto "Caught exception reading instance data" error messages get output when trying to read the metadata. See this travis build where I turned on verbose mode so you can see the flow (but the errors get printed out regardless of verbose mode).

This is now fixed: I silenced the Boto logger. There is an escape hatch in case someone ever does need to debug the usage of Boto inside pip-accel (for details refer to ec426eaa9ef3369bc0392d560c7f8b0c60212328).

If you do not set s3-readonly = yes then when it fails uploading a file, the S3 cache backend is turned off. See this travis build

I'll see if I can improve on this as well.

We would probably catch these and any other corner case issues or bugs by writing unit tests for the S3CacheBackend. I could totally help do that if you'd like.

There are unit tests for the S3 backend, or rather the test suite runs twice for each supported version of Python: Once with the S3 cache backend disabled and once with the S3 cache backend enabled. This uses FakeS3 so I don't need to create and pay for an actual Amazon S3 bucket.

xolox commented 9 years ago

If you do not set s3-readonly = yes then when it fails uploading a file, the S3 cache backend is turned off.

I believe this is now fixed as well. As far as I can tell FakeS3 doesn't support simulating read only bucket access so I had to figure out how to create a read only IAM policy in Amazon S3 in order to test this manually. For previous manual testing I had used the following IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}

For this testing I switched to the following read only policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:Get*", "s3:List*"],
      "Resource": "*"
    }
  ]
}

In my tests the latest version correctly falls back to read only Amazon S3 access when a PUT fails. Note that this last "feature" is pretty specific to your scenario @jzoldak, so while I didn't mind implementing it, do note that it's rather suboptimal... In my testing Boto would retry the PUT quite a few times before giving up, so by not setting the bucket to read only explicitly some time is wasted by having Boto and pip-accel figure this out for themselves.

Can you confirm whether these changes resolve the caveats you mentioned?

xolox commented 9 years ago

I believe the issue you reported is fixed and the caveats you mentioned are also resolved, so I'm going to go ahead and close this issue now - I'm trying to keep an overview of active/relevant issues on GitHub and the long list isn't helping :-). If you believe there are still issues with the current implementation, feel free to reopen this issue or report a new one. Thanks for suggesting the feature!

jzoldak commented 9 years ago

Yes, I'm all set with this now. Thanks for all your work on this really great project.