googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0

storage: RawPath weird behaviour when retrieving bucket's blob #10911

Closed GuyAfik closed 1 month ago

GuyAfik commented 1 month ago

Client

Storage

Environment

Linux

 % go version
go version go1.22.0 darwin/arm64

Code and Dependencies

r, err = s.bucketGCP.NewReader(ctx, storagePath, nil)

Problem

We have a large server code base that reads storage content from GCP. The structure of it is the following:

/name1-name2-name3/./folderA/folderB/FolderC/FolderD/some-file.zip

To read the object we use the storage module:

ctx := context.Background()
r, err = s.bucketGCP.NewReader(ctx, storagePath, nil)

We currently use an old version (1.30.1). We tried upgrading to google storage 1.36.0, and since then our storage objects can no longer be found (in 1.30.1 it works just fine).

The error I am getting is:

blob (key "folderA/folderB/FolderC/FolderD/some-file.zip") (code=NotFound): storage: object doesn't exist

From my investigation, it's related to the RawPath that is set when making the HTTP request (in version 1.30.1, RawPath isn't used): https://github.com/googleapis/google-cloud-go/blob/main/storage/http_client.go#L839

If I remove it manually then our storage folders are found just fine. It might be related to URL escaping.

I would appreciate help determining whether this is caused by the URL escaping (and whether that escaping is done properly) or by something else.
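To make the escaping question concrete, here is a minimal sketch of how Go's net/url treats a URL with and without RawPath. It is not the storage library's code: the host, the sample key, and the use of url.PathEscape to build RawPath are assumptions chosen for illustration, and the helper names pathOnly and pathAndRawPath are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// pathOnly builds an object URL that sets only Path and returns the
// escaped path that net/url derives from it.
func pathOnly(bucket, key string) string {
	u := url.URL{
		Scheme: "https",
		Host:   "storage.googleapis.com", // illustrative host
		Path:   "/" + bucket + "/" + key,
	}
	return u.EscapedPath()
}

// pathAndRawPath additionally sets RawPath. Escaping the key with
// url.PathEscape is an assumption for this sketch, not necessarily
// what the storage library does.
func pathAndRawPath(bucket, key string) string {
	u := url.URL{
		Scheme:  "https",
		Host:    "storage.googleapis.com", // illustrative host
		Path:    "/" + bucket + "/" + key,
		RawPath: "/" + bucket + "/" + url.PathEscape(key),
	}
	// EscapedPath returns RawPath when it is a valid encoding of Path.
	return u.EscapedPath()
}

func main() {
	// Hypothetical key containing a "./" segment and a space.
	bucket, key := "name1-name2-name3", "./folderA/some file.zip"
	fmt.Println(pathOnly(bucket, key))
	fmt.Println(pathAndRawPath(bucket, key))
}
```

With only Path set, the slashes inside the key stay literal; with RawPath set this way, they are sent as %2F. Comparing the two outputs for a real key may help narrow down whether the request path differs between the two library versions.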

Expected behavior

Request should be successful and storage zip should be found

Additional context

Started when upgrading to v1.36.0.

tritone commented 1 month ago

Can you provide a fuller repro? In particular I'm curious if your application code is manually escaping the path before passing it in somehow, and what options you are passing when creating the client.

When you view the object in the GCP console, what does the object name look like there?

It also might be worth trying https://pkg.go.dev/cloud.google.com/go/storage#WithJSONReads
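One quick way to check the manual-escaping question is to test whether a key changes when it is percent-decoded. The sketch below uses only the standard library; looksEscaped is a hypothetical debugging helper, not part of this package, and the sample keys are made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// looksEscaped reports whether key changes when percent-decoded, which
// suggests it was escaped before being handed to the reader.
// Hypothetical helper for debugging only.
func looksEscaped(key string) bool {
	decoded, err := url.PathUnescape(key)
	// An invalid escape sequence means the key is not percent-encoded.
	return err == nil && decoded != key
}

func main() {
	for _, key := range []string{
		"folderA/folderB/some-file.zip",
		"folderA/folderB/some%20file.zip",
	} {
		fmt.Printf("%q looksEscaped=%v\n", key, looksEscaped(key))
	}
}
```

This is only a heuristic: an object whose real name contains a literal sequence such as %20 would also be flagged, so the result should be compared against the name shown in the console.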

GuyAfik commented 1 month ago

Hi,

Thanks for the quick response, I will check it out and come back to you with the details.

At the moment I would be happy to try WithJSONReads.

Can you provide a code sample?

Here is a slightly more extended version of our code (we do not pass any special configs):

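type ServiceMarketplace struct {
    bucketGCP *blob.Bucket
}

// downloadSomething opens a reader for the object stored under key.
func (s *ServiceMarketplace) downloadSomething(ctx context.Context, key string) (*blob.Reader, error) {
    return s.bucketGCP.NewReader(ctx, key, nil)
}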

It seems the NewReader signature is:

func (b *Bucket) NewReader(ctx context.Context, key string, opts *ReaderOptions) (*Reader, error)

where ReaderOptions is a struct:

// ReaderOptions sets options for NewReader and NewRangeReader.
type ReaderOptions struct {
    // BeforeRead is a callback that will be called exactly once, before
    // any data is read (unless NewReader returns an error before then, in which
    // case it may not be called at all).
    //
    // asFunc converts its argument to driver-specific types.
    // See https://gocloud.dev/concepts/as/ for background information.
    BeforeRead func(asFunc func(interface{}) bool) error
}

How can I configure it to use JSON instead of XML?

GuyAfik commented 1 month ago

@tritone Can you please help with the query above? Thanks

tritone commented 1 month ago

Hey, I'm not sure what code you are referencing above, maybe it's something internal to your codebase? Here is the signature for NewReader in this library: https://pkg.go.dev/cloud.google.com/go/storage@main#ObjectHandle.NewReader

To use WithJSONReads you supply it as an option to NewClient: https://pkg.go.dev/cloud.google.com/go/storage@main#NewClient

client, err := storage.NewClient(context.Background(), storage.WithJSONReads()) // supply other opts as needed

GuyAfik commented 1 month ago

@tritone The Bucket object has the NewReader method I am referring to. To retrieve the bucket we use

blob.OpenBucket

and not storage.NewClient, so unfortunately I do not have the option to set

storage.WithJSONReads()

We open the bucket like this:

func openGCPBucket(ctx context.Context, bucketName string) (*blob.Bucket, error) {
    bucket, err := blob.OpenBucket(ctx, "gs://"+bucketName)
    if err != nil {
        return nil, fmt.Errorf("failed to open GCP bucket [%s]: %w", bucketName, err)
    }

    return bucket, nil
}

tritone commented 1 month ago

Okay, it seems like you are using a library/codebase that is not this one (and I'm not sure exactly which, potentially something built on top of this package?). We don't have those methods in this library.