jszwec / s3fs

S3 FileSystem (fs.FS) implementation
MIT License
177 stars 20 forks source link

Switch to v2 of the aws sdk #2

Closed guenhter closed 1 year ago

guenhter commented 2 years ago

Amazon has a version 2 of the api which is already the official predecessor of v1.

The main advantages of the new API are performance improvements, easier error handling, easier credentials handling, ...

https://aws.amazon.com/blogs/developer/client-updates-in-the-preview-version-of-the-aws-sdk-for-go-v2/

guenhter commented 2 years ago

This is of course a breaking change.

jszwec commented 2 years ago

According to pkg.go.dev, as of today v1 is imported by 4672 pkgs while v2 by only 462. S3FS in particular won't benefit from the updated lib that much so I don't think it makes sense to break current users code just to update the lib version. It's too early, we can revisit this once the adoption of v2 is higher, or v1 of s3 becomes deprecated (currently it's still maintained)

guenhter commented 2 years ago

Fair enough. I'll keep this MR open if you don't mind ... just in case.

thomasf commented 2 years ago

This can either just be published as a new major version (not breaking any current users) or be published as a separate package, maybe by @guenhter just like aws-go-sdk-v2 is.

jszwec commented 2 years ago

Yes it could be published as a v2, but then v1 is far from being deprecated due to the fact how popular aws sdk v1 is. This also means that there will be two versions of the library to maintain. Considering the current popularity of aws sdk v1, is it really worth it?

guenhter commented 2 years ago

@jszwec The v2 of the api has some big benefits. It has a nicer configuration handling (not in regard to the go api but v2 allows e.g. to define the region in ~/.aws/config). So having v2 is usually a good way. I still see your point that v1 is still heavily used. Do you want me just to start off a own library (like @thomasf suggested) named s3fs-v2?

thomasf commented 2 years ago

@guenhter Since this is a breaking change I would suggest adding a context argument to s3fs.New while you are at it so we get better canceling.

guenhter commented 2 years ago

@thomasf Good idea. Done. Could you have another look if you like it that way.

jszwec commented 2 years ago

@guenhter

The v2 of the api has some big benefits. It has a nicer configuration handling (not in regard to the go api but v2 allows e.g. to define the region in ~/.aws/config).

You can already do it in v1: https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html

So having v2 is usually a good way.

I am not saying that we shouldn't use it, I am saying that we don't need to rush it, since it's not popular yet

@thomasf @guenhter What is the point of providing context to New? FS is supposed to be a long lived var, so it's not like you are going to be recreating FS for every request to have cancelations, and fs.FS doesn't support context per Read. Could you please elaborate?

thomasf commented 2 years ago

@thomasf @guenhter What is the point of providing context to New? FS is supposed to be a long lived var, so it's not like you are going to be recreating FS for every request to have cancellations, and fs.FS doesn't support context per Read. Could you please elaborate?

It doesn't have to be a long lived variable, a struct like this is cheap and can be instantiated every time a request is handled if you want all AWS client communication to cancel if an HTTP request is closed by a client in a request handler (or whatever). You are correct that it is a kind of a blunt tool because it doesn't work on a per file system request basis but it is still useful.

thomasf commented 2 years ago

Adding the context at FS creation was suggested as a good solution during the io/fs proposal https://github.com/golang/go/issues/41190#issuecomment-690819876

thomasf commented 2 years ago

I think you might to check the context for errors before new calls. Not sure if this is the best way to do it.

diff --git a/fs.go b/fs.go
index fe74df5..7ca2179 100644
--- a/fs.go
+++ b/fs.go
@@ -119,6 +119,13 @@ func (f *S3FS) Open(name string) (fs.File, error) {

 // Stat implements fs.StatFS.
 func (f *S3FS) Stat(name string) (fs.FileInfo, error) {
+       if err := f.context.Err(); err != nil {
+               return nil, &fs.PathError{
+                       Op:   "stat",
+                       Path: name,
+                       Err:  err,
+               }
+       }
        fi, err := stat(f.context, f.s3Api, f.bucket, name)
        if err != nil {
                return nil, &fs.PathError{
diff --git a/fs_integration_test.go b/fs_integration_test.go
index b0d0f49..ce43d6e 100644
--- a/fs_integration_test.go
+++ b/fs_integration_test.go
@@ -567,6 +567,32 @@ func TestFS(t *testing.T) {
                        testFn(t, f.s3fs)
                })
        }
+
+       t.Run("test cancel context", func(t *testing.T) {
+               ctx, cancel := context.WithCancel(context.Background())
+               fsys := s3fs.New(ctx, s3cl, *bucket)
+               {
+                       fi, err := fsys.Stat(".")
+                       if err != nil {
+                               t.Fatal(err)
+                       }
+                       if fi == nil {
+                               t.Fatal("file info is nil")
+                       }
+               }
+               cancel()
+               {
+                       fi, err := fsys.Stat(".")
+                       if err == nil {
+                               t.Fatal("expected error")
+                       }
+                       if fi != nil {
+                               t.Fatal(fi)
+                       }
+               }
+
+       })
+
 }

 func newClient(t *testing.T) s3fs.S3Api {
jszwec commented 2 years ago

fair enough

context errors dont need to be checked like you propose. S3 client should be responsible for doing that.

thomasf commented 2 years ago

Yeah I am not sure why the test doesn't fail unless I add it explicitly, maybe there is a problem somewhere else. It is probably a bug in the aws sdk or it is intended to work like that for some weird reason.

thomasf commented 2 years ago

DOH!. I arbitrarily tested a stat call for . which is the only hard coded return of a dir entry struct in fs.go. So no issue at all, works as expected when the code actually uses AWS API calls.

diff --git a/fs_integration_test.go b/fs_integration_test.go
index b0d0f49..d7356c0 100644
--- a/fs_integration_test.go
+++ b/fs_integration_test.go
@@ -567,6 +567,34 @@ func TestFS(t *testing.T) {
                        testFn(t, f.s3fs)
                })
        }
+
+       t.Run("test cancel context", func(t *testing.T) {
+               ctx, cancel := context.WithCancel(context.Background())
+               fsys := s3fs.New(ctx, s3cl, *bucket)
+               {
+                       fi, err := fsys.ReadDir(".")
+                       if err != nil {
+                               t.Fatal(err)
+                       }
+                       if fi == nil {
+                               t.Fatal("file info is nil")
+                       }
+               }
+               cancel()
+               {
+                       fi, err := fsys.ReadDir(".")
+                       if err == nil {
+                               t.Fatal("expected error")
+                       }
+                       if fi != nil {
+                               t.Fatal(fi)
+                       }
+               }
+
+       })
+
 }