operator-framework / operator-registry

Operator Registry runs in a Kubernetes or OpenShift cluster to provide operator catalog data to Operator Lifecycle Manager.
Apache License 2.0
212 stars 247 forks source link

"opm index prune" leaves (what appear to be) unrelated operators #542

Closed magreenberg closed 3 years ago

magreenberg commented 3 years ago

Bug Report

The "opm index prune" leaves (what appear to be) unrelated operators.

What did you do?

$ opm index prune \
    -f registry.redhat.io/redhat/redhat-operator-index:v4.6 \
    -p jaeger-product \
    --generate

followed by:

$ sqlite3  database/index.db "select distinct(operatorbundle_name) from related_image order by 1;"

What did you expect to see?

A list of the "jaeger-product" operators included in the image index.

What did you see instead?

clusterkubedescheduleroperator.4.6.0-202012050130.p0
clusterresourceoverride-operator.4.6.0-202012050130.p0
fuse-console.v7.8.0
jaeger-operator.v1.13.2-1
jaeger-operator.v1.17.7
jaeger-operator.v1.20.1
jaeger-operator.v2.0.0-tp.1
local-storage-operator.4.6.0-202012050130.p0
metering-operator.4.6.0-202012050130.p0
nfd.4.6.0-202012050130.p0
ptp-operator.4.6.0-202012050130.p0
sriov-network-operator.4.6.0-202012050130.p0
verticalpodautoscaler.4.6.0-202012050130.p0

As can be seen, 4 operators match the requested "jaeger-operator" pattern as specified in the "prune" command. However, 9 operators appear to be unrelated to the "jaeger-operator".

Why are operators that do no match the "jaeger-operator" pattern specified, included in the database?

Environment

$ opm version
Version: version.Version{OpmVersion:"v1.15.3", GitCommit:"9e92474", BuildDate:"2020-12-03T18:34:29Z", GoOs:"linux", GoArch:"amd64"}
$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.3 (Ootpa)
magreenberg commented 3 years ago

Further investigate leads me to believe that the issue is that the relational database includes records that do not have expected relations.

It appears that the operatorbundle_name is a form of primary key that links related data in columns in the tables. When opm index prune starts, it looks up the operatorbundle_name based on the package_name in the channel_entry table:

SELECT DISTINCT channel_entry.operatorbundle_name
FROM channel_entry
WHERE channel_entry.package_name=?

The issue that we are seeing is that when an operatorbundle_name does not exist in the channel_entry table for a given package, the operator will not be pruned.

This can be seen as follows.

Obtain the full redhat-operator-index database index.db file as follows:

podman run --name redhat-operator-index -ti --rm -d registry.redhat.io/redhat/redhat-operator-index:v4.6
podman cp redhat-operator-index:/database/index.db index.db
podman rm -f redhat-operator-index

Search the database for the 'fuse-console' package:

sqlite3 index.db "select operatorbundle_name, package_name from channel_entry where package_name = 'fuse-console';"

The output is:

fuse-console.v7.7.1|fuse-console
fuse-console.v7.8.1|fuse-console

Note that fuse-console.v7.8.0, listed as unexpected above, is not included in this list of operators that opm would prune.

In order to obtain the full list, search the database for all operatorbundle_names that do not exist in the channel_entry table run:

sqlite3  index.db "select distinct(operatorbundle_name) from related_image where operatorbundle_name not in (select operatorbundle_name from channel_entry) order by 1;"

The outputput matches the list above of operators that should not have been included in the prune command above:

clusterkubedescheduleroperator.4.6.0-202012050130.p0
clusterresourceoverride-operator.4.6.0-202012050130.p0
fuse-console.v7.8.0
local-storage-operator.4.6.0-202012050130.p0
metering-operator.4.6.0-202012050130.p0
nfd.4.6.0-202012050130.p0
ptp-operator.4.6.0-202012050130.p0
sriov-network-operator.4.6.0-202012050130.p0
verticalpodautoscaler.4.6.0-202012050130.p0

The tables operatorbundle, api_provider and properties also include records that should have been pruned as can be seen from the following command:

sqlite3  index.db "select distinct(name) from operatorbundle where name not in (select operatorbundle_name from channel_entry) order by 1;"

sqlite3  index.db "select distinct(operatorbundle_name) from api_provider where operatorbundle_name not in (select operatorbundle_name from channel_entry) order by 1;"

sqlite3  index.db "select distinct(operatorbundle_name) from properties where operatorbundle_name not in (select operatorbundle_name from channel_entry) order by 1;"

I'm not sure whether the solution should be in the opm index prune command, or whether the tables should be verified as correct before being released.

dmc5179 commented 3 years ago

might explain why mirroring the RHACM operator comes back with 400 images. How could I use your operator-registry during the pruning process to fix this?

yuvalk commented 3 years ago

these are probably residuals of a previous run or the use of oc adm catalog mirror

a possible solution is to also use opm index prune-stranded on the newly generated image (tag) after the prune command

ecordell commented 3 years ago

556 will automatically prune-stranded during a normal prune, which should address this issue.