laminlabs / lamindb-setup

Setup & configure LaminDB.
Apache License 2.0
4 stars 1 forks source link

Lower and upper bound aws dependencies rigorously #729

Open falexwolf opened 4 months ago

falexwolf commented 4 months ago

I just ran this with the following output:

ln.UPath("s3://lamindata").fs.call_s3("head_bucket", Bucket="lamindata")
image

When @sunnyosun runs it she gets the below, hence, she doesn't get any BucketRegion return value:

image

falexwolf commented 4 months ago

This might be related:

So while the API behaves as documented, unfortunately the API model from which the SDK is generated from is lacking the desired output shape.

sunnyosun commented 4 months ago

I upgraded botocore from 1.31.17 to 1.34.84 and now it works:

image

falexwolf commented 4 months ago

Given the issues that Tim ran into and given this (https://github.com/laminlabs/lamindb/issues/1590), we should start to lower-bound and upper-bound rigorously everything that relates to aws.

falexwolf commented 4 months ago

The issue is that the version that works is pretty recent: 1.34.84

It'd be a rather strict requirement to depend on it.

When I look through botocore, even today, it seems like they test this response element

https://github.com/boto/botocore/blob/7e2794d045d15befc51b384440399d3c5fb404ab/tests/integration/test_s3.py#L1409

falexwolf commented 4 months ago

Ok, so, I think it's safer to use the old way of calling it.

falexwolf commented 4 months ago

Hence, I refactored and added these tests:

Koncopd commented 4 months ago

Sorry, i overlooked this, but it is not about pinning or lower bounds, using x-amz-bucket-region is the correct way (that is how s3fs gets regions and so on) and BucketRegion is just not. I haven't checked the PR properly at that time. But it should be tested, i agree, thank you for fixing.

falexwolf commented 4 months ago

and BucketRegion is just not

Judging from the issue on aws-sdk, I think BucketRegion is the new correct way of doing it.

The tests don't catch it because they run on the latest versions of botocore.

Koncopd commented 4 months ago

Also about pinning, what is your version of botocore that gives BucketRegion, @falexwolf?

Koncopd commented 4 months ago

The problem is that we actually pin botocore through aiobotocore - botocore>=1.34.41,<1.34.70 in lamindb, so we should not use or test with the latest version. And 1.34.84 is actually incompatible with our pins. And for our pin x-amz-bucket-region is the correct way.

falexwolf commented 4 months ago

Also about pinning, what is your version of botocore that gives BucketRegion, @falexwolf?

The 1.34.84 - I asked Sunny to try that.

The problem is that we actually pin botocore through aiobotocore - botocore>=1.34.41,<1.34.70 in lamindb,

Great! Then we're all good on this front!

falexwolf commented 4 months ago

Hence, the only thing left is to lower and upper bound s3fs rather than pinning it.

https://github.com/laminlabs/lamindb/blob/13a5662debdbdc20e88ca20738b65995e9bd8932/pyproject.toml#L46

And: we should be able to get rid of one of these two lines, shouldn't we?

https://github.com/laminlabs/lamindb/blob/13a5662debdbdc20e88ca20738b65995e9bd8932/pyproject.toml#L46-L47

Koncopd commented 4 months ago

Yes, i will work on this soon.

Also note that botocore 1.34.84 is incompatible with aiobotocore, so you might have different problems due to that, better to downgrade.

falexwolf commented 4 months ago

Now running into this issue here - https://github.com/laminlabs/lamindb-setup/actions/runs/8835925517/job/24261282965?pr=732

image

Because we don't define the aws extra on the lamindb-setup level, we might have inconsistent installation across lamindb-setup CI and lamindb CI.

To keep things in sync, we should likely define the aws extra dependencies on the lamindb-setup level and then port them to lamindb.

falexwolf commented 4 months ago

Making the first step in this direction here:

Koncopd commented 4 months ago

This is harder to manage if we have the pins in both lamindb-setup and lamindb.

Koncopd commented 4 months ago

And we also have gcp extras, then we have to move it here.

Koncopd commented 4 months ago

This is really harder to manage and confusing if we scatter this extras across different packages. Can we keep it in lamindb?

falexwolf commented 4 months ago

Can we keep it in lamindb?

I don't see how. If I don't have them here, the tests are failing. And it makes sense because a lot of storage functionality is already here.

if we have the pins in both lamindb-setup and lamindb.

We only have pins here. lamindb delegates to lamindb-setup.

falexwolf commented 4 months ago

See

Koncopd commented 4 months ago

Ok, could you move gcp also here then?

Koncopd commented 4 months ago

I still think it is a bit confusing to scatter extras, but i agree that it is more convenient for testing.

falexwolf commented 4 months ago

Ok, could you move gcp also here then?

Will do!

I still think it is a bit confusing to scatter extras, but i agree that it is more convenient for testing.

Right now I think it's a hard requirement for testing. If we find a different solution or eventually can have a mono-repo, then things will be different.

Koncopd commented 4 months ago

Ok, then i will try upgrading fsspec and replacing pins with lower and upper bounds on the weekend.

falexwolf commented 4 months ago

Sounds great!