liormizr / s3path

s3path is a pathlib extension for AWS S3 Service
Apache License 2.0
208 stars 39 forks source link

Help with moving bandersnatch to python3.12 #172

Closed cooperlees closed 1 week ago

cooperlees commented 5 months ago

Hi,

I am the maintainer of bandersnatch, but I have very little s3 skills in general - so would love help moving to >= 3.12 + S3Path and cleaning up our usage of s3path which I feel we hacked around issues rather than fixing bugs here when it was contributed.

Code

https://github.com/pypa/bandersnatch/issues/1710

Feel free to close if you feel this is an abuse of using issues. I'd also love a list of how you'd approach the move if you don't have time to do it, as I am confused in the order I should potentially take and if there is any hope maintaining backwards compatibility or if I should give up with that and just move to >= 3.12 only for S3 support?

liormizr commented 5 months ago

Hi @cooperlees

As long that we don't have a lot of open issues to manage I'm OK with keeping this issue as a platform for helping you guys plan the refactor.

The beauty in s3path is that you don't need to know S3, you just need to think about it like a normal file system and use the regular pathlib API. In this way you can also switch between S3Path and PosixPath when you want to do a local test.

Now about your code, in general I don't see any reason to keep your our S3Path object. The glob method is optimize now and you are not really using the mkdir method that you write.

Except for that all looks greate You are not using s3path for copy, this makes sense (it's more optimize then the API that we have from pathlib). The only think that I change except for the local s3path inherent is not to use s3path privets For example you have this code:

        resource, _ = path._accessor.configuration_map.get_configuration(path)

In python 3.12 it will brake I recommend changing to this:

        import s3path
        resource, _ = s3path.configuration_map.get_configuration(path)
cooperlees commented 4 months ago

not to use s3path privets

I tried to do this like you've said (Please correct on PR if I misuderstood) in regards to configuration_map but it seems to not hold authentication settings when using it this way. PR: https://github.com/pypa/bandersnatch/pull/1724/files Example CI Fails: https://github.com/pypa/bandersnatch/actions/runs/9055904268/job/24877694652?pr=1724

botocore.exceptions.NoCredentialsError: Unable to locate credentials

Do we need a public API to do this with auth settings correct? Or does bandersnatch's CI need to set auth settings more than ENV variables and in the test initialization: https://github.com/pypa/bandersnatch/blob/main/src/bandersnatch/tests/conftest.py#L195

liormizr commented 4 months ago

Hi @cooperlees I deployed a new version so you will have s3path.configuration_map in older Python versions (s3path version 0.5.7)

About the issue that you have with boto credentials, can you give me a unit test (without your ecosystem) to understand that is the issue here? I'm using s3 in integration tests and moto in unit tests and I don't see this issue

cooperlees commented 4 months ago

Thanks for the response. I guess moto mocks more than we do here. I don't know how to simplify the error situation, but I feel it's pretty clear it's something that works when we use the private instances that have been initialized with the auth settings we need vs. your public methods that do not. We just setup auth for minio and hit it via s3 APIs, the integration testing isn't that complex I feel for someone who understands s3 and s3path (which is not me).

I'll go try update to 0.5.7 and see if that helps, but I don't think it will.

All I want to do is move bandersnatch to 3.12, and s3path has made that difficult :(

cooperlees commented 4 months ago

Sadly same errors with 0.5.7 - I'm not sure what I am meant to do differently with it

liormizr commented 4 months ago

@cooperlees Maybe I'm missing something but it looks like something is weird with your setup See the code here The "public" PI that I added is the privet one, it's not really different Maybe you changed your boto3 version? looks like something in the setup...

cooperlees commented 4 months ago

Ok, good news. I sat down a read your PRs for 0.5.7 and saw what I misunderstood and we got further! I was able to move to the new public API everywhere (missed a few but they are in the PR linked below).

I now even have CI "passing" in a 3.12 environment, so I thank you for all your help.

I think we only have 1 remaining big item (apart from other unitest private usages I'll look into separately after working this out) tho. I'm sure you don't want us to be using from s3path.accessor import _generate_prefix, so what is your recommendation here for this code?