operator-framework / catalogd

On-cluster FBC catalog content server
Apache License 2.0

Serve locally stored fbc content via a server #113

Closed: everettraven closed this issue 1 year ago

everettraven commented 1 year ago

As part of implementing the RFC for catalogd's content storage and serving mechanism, we should import the RukPak storage library (dependent on https://github.com/operator-framework/rukpak/issues/656). Once imported, we will need to create a custom implementation of the Storage interface.

The custom Storage implementation should:

Note: If https://github.com/operator-framework/rukpak/issues/656 is not accepted by the RukPak maintainers, it would be acceptable to copy the Storage interface into a catalogd package. This is not ideal, but it is a last resort if the RukPak maintainers deny the request to externalize the storage package.
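
For illustration only, here is a rough sketch of what a filesystem-backed implementation of that Storage interface could look like. This is not a decided design; the LocalDirStorage name, its fields, and the URL layout are hypothetical placeholders.

package storage

import (
    "context"
    "fmt"
    "io/fs"
    "net/http"
    "os"
    "path/filepath"

    "sigs.k8s.io/controller-runtime/pkg/client"
)

// LocalDirStorage is a hypothetical filesystem-backed implementation of the
// rukpak Storage interface: Store copies the unpacked FBC onto disk, Load
// returns it as an fs.FS, and ServeHTTP/URLFor expose it over HTTP.
type LocalDirStorage struct {
    RootDir string // directory holding one sub-directory per catalog
    BaseURL string // externally reachable URL prefix of the content server
}

func (s *LocalDirStorage) Load(ctx context.Context, owner client.Object) (fs.FS, error) {
    return os.DirFS(filepath.Join(s.RootDir, owner.GetName())), nil
}

func (s *LocalDirStorage) Store(ctx context.Context, owner client.Object, catalog fs.FS) error {
    dest := filepath.Join(s.RootDir, owner.GetName())
    // Copy every file from the unpacked catalog filesystem into dest.
    return fs.WalkDir(catalog, ".", func(path string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }
        if d.IsDir() {
            return os.MkdirAll(filepath.Join(dest, path), 0700)
        }
        data, err := fs.ReadFile(catalog, path)
        if err != nil {
            return err
        }
        return os.WriteFile(filepath.Join(dest, path), data, 0600)
    })
}

func (s *LocalDirStorage) Delete(ctx context.Context, owner client.Object) error {
    return os.RemoveAll(filepath.Join(s.RootDir, owner.GetName()))
}

func (s *LocalDirStorage) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // Serve the stored catalog files straight from disk.
    http.StripPrefix("/catalogs/", http.FileServer(http.Dir(s.RootDir))).ServeHTTP(w, r)
}

func (s *LocalDirStorage) URLFor(ctx context.Context, owner client.Object) (string, error) {
    return fmt.Sprintf("%s/catalogs/%s/", s.BaseURL, owner.GetName()), nil
}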

Acceptance Criteria

everettraven commented 1 year ago

/assign @anik120

anik120 commented 1 year ago

@everettraven what's the motivation for implementing rukpak's storage interface again?

type Storage interface {
    Loader
    Storer
}

type Loader interface {
    Load(ctx context.Context, owner client.Object) (fs.FS, error)
}

type Storer interface {
    Store(ctx context.Context, owner client.Object, bundle fs.FS) error
    Delete(ctx context.Context, owner client.Object) error

    http.Handler
    URLFor(ctx context.Context, owner client.Object) (string, error)
}

Looks like rukpak's storage interface is specialized for storing/loading a filesystem (like an operator bundle, which has multiple files containing multiple Kubernetes CRs), but for catalogd, we want something like

type Storage interface {
    Loader
    Storer
}

type Loader interface {
    Load(ctx context.Context, owner client.Object) (*declarativeConfig, error)
}

type Storer interface {
    Store(ctx context.Context, owner client.Object, fbc *declarativeConfig) error
    Delete(ctx context.Context, owner client.Object) error

    http.Handler
    URLFor(ctx context.Context, owner client.Object) (string, error)
}

I.e. that ☝🏽 is an interface for loading and storing FBC files, whereas the first one is an interface for storing and loading bundles (two different interfaces with separate purposes).

If we were to store the catalog as a tar.gz file, it'd probably still make sense to implement the rukpak storage. But it looks like we decided to store the catalogs as JSON files, and not compress them like we have to for bundle content. It's doable if there's a very good reason why we must implement rukpak's interface, but thinking through how storing would be implemented, it's evident that it's not exactly a natural fit:

func Store(ctx context.Context, owner client.Object, catalog fs.FS) error {
    fbc, err := declcfg.LoadFS(catalog) // extra step: load the fs.FS into a DeclarativeConfig
    if err != nil {
        return err
    }
    return declcfg.WriteJSON(*fbc, writer) // writer: placeholder for the storage destination
}

There's an extra step there of loading the fs.FS into a declcfg.DeclarativeConfig that's done just to fit into the rukpak bundle storing/loading interface:

declcfg.LoadFS

everettraven commented 1 year ago

what's the motivation for implementing rukpak's storage interface again?

type Storage interface {
  Loader
  Storer
}

type Loader interface {
  Load(ctx context.Context, owner client.Object) (fs.FS, error)
}

type Storer interface {
  Store(ctx context.Context, owner client.Object, bundle fs.FS) error
  Delete(ctx context.Context, owner client.Object) error

  http.Handler
  URLFor(ctx context.Context, owner client.Object) (string, error)
}

Looks like rukpak's storage interface is specialized for storing/loading a filesystem

This right here is the motivation. A file-based catalog is a filesystem of YAML/JSON files that represent catalog objects. Additionally, the unpacking process we currently use returns an fs.FS as part of the result. Both of these things led me to the conclusion that reusing rukpak's Storage interface would be sufficient for what we wanted to do.

type Storage interface {
  Loader
  Storer
}

type Loader interface {
  Load(ctx context.Context, owner client.Object) (*declarativeConfig, error)
}

type Storer interface {
  Store(ctx context.Context, owner client.Object, fbc *declarativeConfig) error
  Delete(ctx context.Context, owner client.Object) error

  http.Handler
  URLFor(ctx context.Context, owner client.Object) (string, error)
}

I.e. that ☝🏽 is an interface for loading and storing FBC files, whereas the first one is an interface for storing and loading bundles (two different interfaces with separate purposes).

I agree that this would be a better interface if we wanted to use the declcfg.DeclarativeConfig type, but doing so would require changes to the unpacking implementation, which ideally deviates minimally from rukpak so that we can coordinate a way to reuse rukpak's implementation.

If we were to store the catalog as a tar.gz file, it'd probably still make sense to implement the rukpak storage. But it looks like we decided to store the catalogs as JSON files, and not compress them like we have to for bundle content. It's doable if there's a very good reason why we must implement rukpak's interface, but thinking through how storing would be implemented, it's evident that it's not exactly a natural fit:

func Store(ctx context.Context, owner client.Object, catalog fs.FS) error {
    fbc, err := declcfg.LoadFS(catalog) // extra step: load the fs.FS into a DeclarativeConfig
    if err != nil {
        return err
    }
    return declcfg.WriteJSON(*fbc, writer) // writer: placeholder for the storage destination
}

There's an extra step there of loading the fs.FS into a declcfg.DeclarativeConfig that's done just to fit into the rukpak bundle storing/loading interface:

declcfg.LoadFS

I don't think we should use the declcfg.LoadFS function here. We could instead use the declcfg.WalkMetasFS function, which was created specifically to walk an fs.FS and parse out the declcfg.Meta type. We already use this in the syncCatalogMetadata function that currently exists in the catalog controller: https://github.com/operator-framework/catalogd/blob/64a9fe58c6fd08dc8328854a6644bf4eeac1293b/pkg/controllers/core/catalog_controller.go#L392-L442

Using declcfg.WalkMetasFS allows us to lower memory usage by not duplicating the entire fs.FS as a declcfg.DeclarativeConfig object, and instead just appending the marshalled JSON from each declcfg.Meta object to the all.json file that corresponds to the Catalog resource passed in as the "owner".
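
As a rough sketch of that approach (not the actual catalogd code, and assuming the older declcfg.WalkMetasFS(root, walkFn) signature; newer operator-registry releases also take a context.Context), appending each declcfg.Meta blob to an all.json file could look something like this:

import (
    "io/fs"
    "os"
    "path/filepath"

    "github.com/operator-framework/operator-registry/alpha/declcfg"
)

// storeCatalog is a hypothetical helper: it walks the unpacked FBC filesystem
// and appends each parsed declcfg.Meta blob to <dir>/all.json without ever
// building a full DeclarativeConfig in memory.
func storeCatalog(catalogFS fs.FS, dir string) error {
    if err := os.MkdirAll(dir, 0700); err != nil {
        return err
    }
    f, err := os.Create(filepath.Join(dir, "all.json"))
    if err != nil {
        return err
    }
    defer f.Close()

    return declcfg.WalkMetasFS(catalogFS, func(path string, meta *declcfg.Meta, err error) error {
        if err != nil {
            return err
        }
        // meta.Blob is the raw JSON of a single FBC object; write it through as-is.
        _, err = f.Write(meta.Blob)
        return err
    })
}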

anik120 commented 1 year ago

Looks like rukpak's storage interface is specialized for storing/loading a filesystem

This right here is the motivation. A file-based catalog is a filesystem of YAML/JSON files that represent catalog objects.

What I meant was: it looks like rukpak's storage interface is specialized for compressing/decompressing the content of a filesystem. Bundles can have an arbitrary number and type of contents, so it makes total sense to compress and store them. And it looks like we've reached the same conclusion in the RFC, that it makes more sense to store the FBCs as JSON files.

Additionally the unpacking process we currently use returns an fs.FS as part of the result.

IIRC, there have been discussions about reworking that unpacking interface; we just haven't had the chance to get to it. So if I'm not wrong, that interface is already up for debate in rukpak (due to the fidelity of pod logs). We may want to actually write an unpacker that returns a declcfg.DeclarativeConfig anyway.

We could instead use the declcfg.WalkMetasFS function which was created specifically to walk an fs.FS and parse out the declcfg.Meta type.

That's a good point. Although it's just another way that Store has to load the content into memory before it can actually append to any file. Passing around an fs.FS feels like tying the knot with rukpak, which again specializes in a different type of content.

Current path: Unpacker -> fs.FS -> controller -> Storer (which converts the fs.FS into a declcfg.DeclarativeConfig) -> stored in file.

Instead, if we do: Unpacker -> declcfg.DeclarativeConfig -> controller -> Storer -> stored in file,

there's only one chunk being loaded into memory (the declcfg.DeclarativeConfig), instead of two (the fs.FS and the declcfg.DeclarativeConfig).

At the very least, it feels like we should start with the simplest thing, which is a Storer that gets a declcfg.DeclarativeConfig. That way, we keep the option of deciding "nope, we'll just use the rukpak interface, so we'll change this implementation to be a rukpak storage", which is much easier than untying things.
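
For concreteness, a minimal sketch of that "simplest thing", assuming the caller already has a parsed FBC (the fbcStorer type and its fields are hypothetical; declcfg.WriteJSON is the operator-registry helper for serializing a DeclarativeConfig):

import (
    "context"
    "os"
    "path/filepath"

    "github.com/operator-framework/operator-registry/alpha/declcfg"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// fbcStorer stores catalogs that have already been parsed into a
// declcfg.DeclarativeConfig, so there is no fs.FS round trip.
type fbcStorer struct {
    rootDir string
}

func (s *fbcStorer) Store(ctx context.Context, owner client.Object, fbc *declcfg.DeclarativeConfig) error {
    dir := filepath.Join(s.rootDir, owner.GetName())
    if err := os.MkdirAll(dir, 0700); err != nil {
        return err
    }
    f, err := os.Create(filepath.Join(dir, "all.json"))
    if err != nil {
        return err
    }
    defer f.Close()
    return declcfg.WriteJSON(*fbc, f)
}

func (s *fbcStorer) Delete(ctx context.Context, owner client.Object) error {
    return os.RemoveAll(filepath.Join(s.rootDir, owner.GetName()))
}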

which ideally deviates minimally from rukpak so we can coordinate a way to reuse rukpak's implementation.

This is saying there are two components (catalogd and rukpak) that do the same thing but are located in different places. That's hard to reconcile when bundle content (unstructured, from the perspective of rukpak) is very different from catalog content (structured FBC).

anik120 commented 1 year ago

Further discussion in this slack thread: https://kubernetes.slack.com/archives/C0181L6JYQ2/p1691082152778479

Check back later for a TL;DR and next steps.

everettraven commented 1 year ago

I think this was incorrectly closed. The original intent of this issue was to implement the filesystem storage and the HTTP file server for serving the stored content. With the way things are split up, this issue will be appropriately closed once #148 is merged.