ipfs / go-ds-s3

An s3 datastore implementation
MIT License
239 stars 66 forks source link

garbage collection seems to be broken #198

Open obo20 opened 3 years ago

obo20 commented 3 years ago

Garbage collection appears to be broken with this plugin.

I have a working implementation of things where I can add / pin content, list content, get the size of the repo etc.

However running ipfs repo gc returns nothing and shows this in the daemon logs: image

No content is deleted from the s3 bucket when this happens.

Oddly I can run ipfs block rm CID just fine and the content is removed from the s3 bucket

welcome[bot] commented 3 years ago

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

Finally, remember to use https://discuss.ipfs.io if you just need general support.

Stebalien commented 3 years ago

It looks like the query logic isn't properly parsing keys. The query logic is known to have a bunch of issues, fixed in https://github.com/ipfs/go-ds-s3/pull/39. However, that implements naive query logic (in some edge cases) which make the test-cases take way too long.

olgahaha commented 1 year ago

The ipfs refs local and ipfs repo gc commands don't work, and they don't display any output even after applying the change. It's possible that there's still an issue with the query function.

olgahaha commented 1 year ago

I've successfully merged release v0.8.0 into the #39 and made adjustments to certain code logic to ensure the query functions correctly.

The changes made are as follows:

  1. The S3 prefix has been modified to consist of the root directory path followed by the original prefix, without the initial '/'.
  2. The root directory has been eliminated from the key in the S3 list result.

Here's the update I applied:

diff --git a/s3.go b/s3.go
index 6205220..02f6269 100644
--- a/s3.go
+++ b/s3.go
@@ -215,6 +215,18 @@ func (s *S3Bucket) Query(ctx context.Context, q dsq.Query) (dsq.Results, error)
    // Normalize the path and strip the leading / as S3 stores values
    // without the leading /.
    prefix := ds.NewKey(q.Prefix).String()[1:]
+   prefix = s.s3Path(prefix)
+   if strings.HasPrefix(prefix, "/") {
+       prefix = prefix[1:]
+   }
+   dirPrefix := s.s3Path("") + "/"
+   if strings.HasPrefix(dirPrefix, "/") {
+       dirPrefix = dirPrefix[1:]
+   }
+   dirPrefixLen := len(dirPrefix)
+   if len(prefix) < dirPrefixLen {
+       prefix = dirPrefix
+   }

    sent := 0
    queryLimit := func() int64 {
@@ -226,7 +238,7 @@ func (s *S3Bucket) Query(ctx context.Context, q dsq.Query) (dsq.Results, error)

    resp, err := s.S3.ListObjectsV2WithContext(ctx, &s3.ListObjectsV2Input{
        Bucket:    aws.String(s.Bucket),
-       Prefix:    aws.String(s.s3Path(prefix)),
+       Prefix:    aws.String(prefix),
        Delimiter: aws.String("/"),
        MaxKeys:   aws.Int64(queryLimit()),
    })
@@ -250,7 +262,7 @@ func (s *S3Bucket) Query(ctx context.Context, q dsq.Query) (dsq.Results, error)

            resp, err = s.S3.ListObjectsV2WithContext(ctx, &s3.ListObjectsV2Input{
                Bucket:            aws.String(s.Bucket),
-               Prefix:            aws.String(s.s3Path(prefix)),
+               Prefix:            aws.String(prefix),
                Delimiter:         aws.String("/"),
                MaxKeys:           aws.Int64(queryLimit()),
                ContinuationToken: resp.NextContinuationToken,
@@ -260,8 +272,9 @@ func (s *S3Bucket) Query(ctx context.Context, q dsq.Query) (dsq.Results, error)
            }
        }

+       key := (*resp.Contents[index].Key)[dirPrefixLen:]
        entry := dsq.Entry{
-           Key:  ds.NewKey(*resp.Contents[index].Key).String(),
+           Key:  ds.NewKey(key).String(),
            Size: int(*resp.Contents[index].Size),
        }
        if !q.KeysOnly {

This change has enabled both ipfs refs local and ipfs repo gc commands to function as expected and generate the respective outputs. However, I haven't conducted extensive testing.