Open iainduncani opened 1 year ago
One additional thought that one of my team members pointed out to me this morning - I think even with updating the layer scanner the links in the dist_scanartifact
will persist so a change to this may require also deleting the incorrect dist_scanartifact links (maybe as part of a SQL migration? The incorrect dist
values would be easy to identify due to the different version number schemes so could just delete the dist
and let it cascade to other tables)
From what I can tell, I believe the Ubuntu distribution scanner may think it's Debian but not the other way around because of the following:
The Debian distribution scanner verifies the etc/os-release
file is for Debian via this check:
if kv[`ID`] != `debian` {
return nil, nil
}
The Ubuntu distribution scanner does no such check. I verified this happens by adding a test in ubuntu/distributionscanner_test.go
and adding the etc/os-release
for Debian 11 inside ubuntu/testdata
. I saw the following test pass:
{
Name: "11",
Want: claircore.Distribution{
Name: "Ubuntu",
DID: "ubuntu",
VersionID: "11",
PrettyName: "Ubuntu 11",
VersionCodeName: "bullseye",
Version: "11 (Bullseye)",
},
},
We'll see an equivalent test in debian/distributionscanner_test.go
will fail (as-is, it will panic due to how the test is setup)
I think the solution for this would to be to simply verify ID
or DISTRIB_ID
field actually represents Ubuntu.
As for potential migration, I'll defer that to someone else, but I believe as long as we update the Ubuntu distribution scanner and bump the version specified in the file, then Clair users should be able to recognize the indexer has been updated, so the index report must be regenerated. Maybe just by redoing the index report there is no need for a migration? @crozzy ?
I started a PR to attempt to resolve this: https://github.com/quay/claircore/pull/970
Yeah, I think iterating the version of the ubuntu
distribution scanner should work, the thing I'm not 100% clear on is whether we'd also need to iterate the debian
distribution scanner too, but I guess it'll be clearer during PR testing
@RTann thanks so much for getting the fix in for this so quickly. We've just been testing this in our dev environment. One thing that we have found is that because we have layers in the database that are linked to these incorrect "Ubuntu" entries via a row in the dist_scanartifact
when we rescan an image that has one of these layers because the old link is not removed it can still get the wrong result.
What would you advise is the right solution here? Our thoughts are that we will manually tidy up all of these wrong dist entries and dist_scanartifact
links prior to upgrading and then when the images are rescanned they will not get added back in due to this fix and therefore there will only be one dist to find in the coalesce phase.
Do you think that our outside-of-clair approach is the correct one or would you rather something went into Clair in addition that picked the right answer when two were found. I'd wondered about making the linux.Coalescer
be cleverer in picking the right dist and filter out Ubuntu versions that don't have a .
in them rather than just picking [0]
as it does at the moment but that feels like a bit of a hack for a problem that should not come back if we tidy the DB. Would love your opinion though.
The Problem
We have some Debian test images such and they occasionally report their
IndexReport.Distribution
as Ubuntu but with some information from the Debian release such as:We run Clair in a few different environments and sometimes this image will correctly scan with a distribution did of
debian
. Furthermore, looking into thedist
table we see there are lots of Debian releases but labelled as Ubuntu.Version
9
,10
,11
and12
are all Debian releases.The Cause
Looking at the code I think this is caused by the fact that both the Debian and Ubuntu
DistributionScanner
just look for aetc/os-release
file. If that file has anVERSION_ID
andVERSION_CODENAME
field then they will match and store their dist in the DB linked to this layer. Both hard code thedid
to either ubuntu or debian when they do so. When the scanner has finished the distribution they find get written to the database by the layerscanner explaining why we have lots ofdist
table entries for Ubuntu on Debian images. (note we don't have dist entries the other way for Ubuntu images marked as Debian because the Debian scanner also tries to conver the version to an int which fails for Ubuntu images)Looking at the
etc/os-release
file for our images (we've seen this with bothdebian:stable-slim
andgcr.io/distroless/base
based images) we can see that it will meet the criteria for both scanners:As predicted when this gets to the
dist_scanartifact
table we have two links to thedist
table, one for Ubuntu and one for Debian:After the two distributions have been linked to the layer and the layer scanning phase has completed the
Controller
goes into thecoalesce
phase. Here the two distributions will be loaded using theDistributionsByLayer
query. The coalescer fordpkg
will then be asked to coalesce this data. Thedpkg
ecosystem uses the Linux Coalescer to do this. The Linux Coalescer only allows a single dist per layer:So at this point the two distributions we see in the database will be collapsed into one that we see in the ScanReport. I think that the one that is picked will be semi-random based on the order that is returned from the
DistributionsByLayer
query. One thing that confuses me though is that we don't see this problem closer to 50% of the time because that is an unordered query but we do seem to get the right distribution for both Ubuntu and Debian images the majority of the time even though they have two distributions linked in thedist_scanartifact
table, I suspect it is purely because the dpkg Ecosystem returns the Debian scanner before the Ubuntu one and that PostgreSQL isn't fully random on unordered sets. Whatever the reason for it not being 50% it clearly isn't fully reliable though as we have seen this error several times.The solution?
I noticed that the RHEL
DistributionScanner
also checked for the presence of files but in addition did a regex match on them forRed Hat Enterprise Linux (?:Server)?\s*(?:release)?\s*(\d+)(?:\.\d)?
. Could this be added to the Ubuntu and Debian scanners to get the right answer?