Closed by westlifezs 8 years ago
I use Postman from a remote machine instead of using curl locally, and that gets rid of the 307 error. However, I am still seeing the error "could not extract the archive". The URL I provided accesses the binary data stream of the target layer. Is this the right URL? I could not find a download URL to retrieve a tarball directly, because layers can only be accessed as a binary data stream under Docker Registry v1. Shall I download the layers, store them somewhere in tar format, and then provide the URL to the tar file?
Hi,
As mentioned in #127, your 307 status code happens because you're calling /v1/layers/ instead of /v1/layers. Clair simply corrects you automatically.
This is because layers can only be accessed through binary data stream under docker registry v1. Shall I download the layers and store them into a tar format to somewhere and then provide the URL to the tar file?
You're trying to use Docker's registry API to integrate with Clair. Does that mean that you can't directly hook into your container registry?
I believe that GET /v1/images/(image_id)/layer
might work for public images, even though I never experimented with it. However, it's never going to work with private images.
You might be interested in looking at Dockyard integration (v0.1 though) and Hyperclair.
Thank you for your response. I am sure I can access the binary data stream of the target layer through GET /v1/images/(image_id)/layer on Clair's host, as I tested it. However, I still keep hitting the error mentioned in the previous post. I am just wondering: does Clair only accept a tarball or compressed-file URL, or does it also accept a URL that serves the binary data stream?
Specifically, if I run GET /v1/images/(image_id)/layer on the Clair server, I see a binary stream filling up my screen. Will this be acceptable from Clair's perspective, or do I need to convert/redirect the binary stream into a tar file for Clair to access?
thanks
Also, what do you mean by public or private images? When using GET /v1/images/(image_id)/json, I cannot find anything indicating whether an image is public or private. Do you mean a public image is stored in Docker Hub while a private image is stored in a private registry? What if Clair can access the private registry's images through the provided URL, since Clair is also within our corp network? Will this still work?
@westlifezs
I am just wondering, does clair only accept tar ball or compressed file url?
This would seem to be the case. The code for detecting whether relevant data is present does an http.Get() with the specified path for the layer, which is eventually passed into SelectivelyExtractArchive(), which expects a tar.gz.
will this be acceptable from clair's perspective? or I need to convert/redirect the binary stream into a tar file for clair to access?
The way we're planning our implementation is to have some sort of network storage that the Clair server can see, from which it can pull tar'd versions of the layers for our internal images.
also, what do you mean by saying public or private images?
Correct me if I'm wrong, @Quentin-M, but I think he was trying to point out that Clair doesn't support any extra auth checks besides downloading something via http/https. So, if you have some registry that is not able to be seen by the Clair server, it will not be able to grab the layers. Which brought on your next question...
What if Clair can access the private registry's images through the provided URL, since Clair is also within our corp network?
This would seem to be okay. It's exactly what we are planning. Both our Clair server and location where the images/layers are stored will not be able to be seen by the outside world. They will be contained within our Amazon infrastructure and restricted to internal access only.
@mattlorimor thanks for your response. Based on what you said, if I need to integrate our Docker registry with Clair right now, I will have to use intermediate storage to convert the binary data stream into a tar.gz and let Clair know its address, right? Is there an easier way to sort this out? If that's the case, it would not be any easier than downloading the images/layers directly to the Clair server and using the "analyze-local-images" tool to analyze the target images, right? I guess Docker Registry v2 might support tar format for layer download, but like I said, our environment only supports Registry v1 as of now, and I am trying to find the best way to integrate it with Clair. Thanks.
mattlorimor's answer given above is correct. Thanks for participating. Clair expects to be able to download layer filesystems as tarballs over HTTP or HTTPS (or to access them locally if a path is provided instead of a URL), optionally compressed with gzip / xz / bz2.
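As a small illustration of those accepted inputs (a sketch only; the file and directory names are made up), a layer filesystem can be handed to Clair as a plain tarball or a compressed one:

```shell
# Build a tiny fake layer filesystem (all paths here are hypothetical).
mkdir -p layerfs/etc
echo 'ID=debian' > layerfs/etc/os-release

# Either of these archives is an acceptable "Path" target for Clair
# (xz- and bzip2-compressed tarballs work the same way):
tar -cf  layer.tar    -C layerfs .   # plain tarball
tar -czf layer.tar.gz -C layerfs .   # gzip-compressed tarball

# The archive holds the layer's filesystem entries directly at its root.
tar -tf layer.tar.gz
```

The key point is that the archive's root is the layer's filesystem itself, not a directory or another archive wrapping it.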
A private container image is an image that requires following an authentication flow against the registry in order to be visible and pulled. Clair doesn't use any authentication logic when it comes to downloading an image. Thus, if your image is private, in other words, if your registry or file server requires credentials / a valid session to authorize the request that downloads a layer, it's not going to work. It's as simple as that.
Now, say that your layers are stored on S3. You could simply make your registry sign direct download links for the layers that it wants analyzed by Clair. Because the links are direct, authorized (for a determined duration) using query parameters contained in the links themselves, and because you talk to the Clair API over HTTPS (so that no eavesdropper can see the links), it works and it's perfectly safe.
@westlifezs:
Why don't you just expose an extra endpoint on your registry / API that would simply stream tar'd layers? You might as well submit a PR to let Clair read a binary stream, if that is even possible ;-)
Our Docker registry administrator indicated that Docker image layers are not stored in tar format in the system. Do you mean it would be easier if we compressed the layer binary data into tars and exposed them through an extra API endpoint on the same registry host?
thanks. will try to explore this function further.
@Quentin-M
Thanks for participating.
Absolutely. Thanks for maintaining this. We are pretty excited to get Clair integrated into our build processes.
I guess docker registry v2 might support tar format for layer download...
This is good news for me, if true, as we are not operating under the same v1-only constraints. @stuckshut and I were talking about this today, but I can't remember where our conversation landed.
@westlifezs: Yes. However, note that you don't necessarily need to compress the layers. A tarball archive doesn't involve compression by definition.
@mattlorimor @westlifezs We are also open to any suggestion and/or contribution that would make your integration easier.
Thanks! Just to confirm: do I need to add any metadata or files into the tarball for each layer, or can I just do "wget -O file URL" & "tar cvf file.tar file" and that's it?
I am asking because my current response (get layer) for created layers only contains "Name" and "IndexedByVersion", without any vulnerability information included.
@westlifezs
I am asking this question as my current response (get layer) for created layers only contains "name" and "IndexedByVersion" without having any vulnerability information included.
Your call to get the list of vulnerabilities for a layer should look something like this: GET http://localhost:6060/v1/layers/17675ec01494d651e1ccf81dc9cf63959ebfeed4f978fddb1666b6ead008ed52?features&vulnerabilities HTTP/1.1, where 17675ec01494d651e1ccf81dc9cf63959ebfeed4f978fddb1666b6ead008ed52 is replaced with the layer name you wish to get information about. If you forget to put ?features&vulnerabilities at the end of the URL, you will get no vulnerability information back.
There are a couple other reasons you may see no vulnerability information:
1) If the logs for Clair don't have updater: update finished in them, then the CVE updater has not finished pulling CVE information into Clair's database. This can take a while (and I do mean a while). Running Clair locally, pointed at an external PostgreSQL database on an Amazon RDS t2.micro instance (slow everything), it took about eight (8) hours for the CVE information to fully populate. I tried a couple of times to scan some layers before this process was complete, only to have Clair tell me it couldn't find any vulnerabilities.
2) You are attempting to scan a layer whose underlying distro Clair does not support. Currently, Clair pulls CVE information for Ubuntu, Debian, and Red Hat.
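As a sketch of the first check (the log file name and capture method are hypothetical; in practice you would capture Clair's output, e.g. via `docker logs`), you can simply grep for the updater's completion message:

```shell
# Check whether Clair's CVE updater has completed at least once by looking
# for the "updater: update finished" log line mentioned above.
# The log file path is a made-up example.
check_updater() {
    if grep -q "updater: update finished" "$1"; then
        echo "ready"
    else
        echo "still updating"
    fi
}

# Demo with a fabricated log file:
printf 'clair: starting updater\nupdater: update finished\n' > clair.log
check_updater clair.log
```

Until this reports ready, scans will come back without vulnerability data even for supported distros.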
@Quentin-M
Just FYI...
I stood up both a Docker v1 Registry and a Docker v2 Registry to test some behavior with Clair. I learned some things:
1) Clair does not appear to be able to pull layers directly from v1 of Docker Registry.
2) Clair can pull layers directly from v2 of Docker Registry, assuming Clair has unrestricted access to the registry. That can be accomplished by setting the "Path" in the layer POST to Clair to be something like this: "Path":"http://192.168.99.100:5000/v2/ubuntu/blobs/sha256:203137e8afd55ac373c62f47e6e7ed6c0f54ed2c7695b864c761242827f29a06"
3) I have not figured out why this does not work with v1 of Docker Registry. I may look into this further when I have some time (goodbye weekend!).

I can only confirm the above answer. It looks like you have solid knowledge about Clair!
Also note that you can use just ?features or ?vulnerabilities; you don't need to specify both.

Thanks for this valuable information about Docker v1 / v2 registry protocol compatibility. It might be worth making the change if it's possible to do so in an elegant manner.
As a side note, it is important to understand that the SHA should never be used to fill the Name field. As documented, layer names should be unique.
Let's say we use SHA as layer names. Consider a blob containing an empty filesystem, with ∅
being its SHA. Let's analyze a first image containing an empty layer in the middle:
Layer A --> Layer ∅ --> Layer C
And now a second image:
Layer D --> Layer ∅ --> Layer E --> Layer F
When analyzing the empty layer in the second image, in practice, Clair will consider that it has already been analyzed and won't do anything. The actual history for our second image would become:
Layer A --> Layer ∅ --> Layer E --> Layer F
Layer D (orphan)
Using unique layer names may limit the amount of layer analysis being re-used. In an early version of Clair, an indirection was present to preserve history trees while maximizing layer re-use, but we determined that the complexity it introduced (from both API and database standpoints) wasn't worth it.
In conclusion, layer names should be unique for each image tree. They should probably not be Docker/rkt IDs (especially because those could be forged...) but related to/composed of internal registry IDs.
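As a toy illustration of that last point (a sketch only; the registry path, tag, and helper name are all hypothetical), a unique layer name could be derived from internal registry coordinates plus the layer's position in the image, so that identical blobs used in different images still get distinct names:

```shell
# Derive a unique layer name from internal registry coordinates plus the
# layer's index within the image. All identifiers here are made-up examples.
layer_name() {
    # $1 = internal registry/repo path, $2 = image tag, $3 = layer index
    printf '%s:%s:%s' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
}

layer_name "corp-registry/team/app" "1.4.2" "0"
layer_name "corp-registry/team/app" "1.4.2" "1"   # distinct even if the blobs are identical
```

Because the name incorporates where the layer sits in a specific image tree, two images sharing an identical blob no longer collide in Clair's history, which is exactly the orphan problem described above.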
Thank you both for your valuable answers! It is very helpful for gaining a deep understanding of Clair. Back to my original question: do I need to add any metadata or files into the tar archive in addition to the layer binary data stream?
Also, I am currently using docker-compose to deploy Clair. I assume that the database has to be populated from scratch every time, right? Therefore I will need to wait at least half an hour every time after running "docker-compose up -d" in order to have a complete vulnerability database, right?
thanks,
back to my original question: do I need to add any meta data or file into the tar archive in addition to the layer binary data stream?
Nope ;-)
I am currently using docker-compose to deploy clair, I assume that the database will have to be populated from scratch every time, right? therefore I will need to wait at least more than half an hour every time after I run "docker-compose up -d" in order to have a complete vulnerability database, right?
Right. You should probably not use this, especially if you need to start/stop Clair often (i.e. if you want to hack it or modify its config). Instead, deploy a more permanent PostgreSQL server (i.e. container and mounted volume) and then deploy Clair.
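A minimal sketch of that setup, assuming a docker-compose deployment (the service names, image tags, volume name, and paths below are illustrative, not an official config): the PostgreSQL service keeps its data on a named volume, so recreating the Clair container no longer wipes the CVE database.

```yaml
# Hypothetical docker-compose sketch: persistent PostgreSQL, then Clair.
version: "2"
services:
  postgres:
    image: postgres:9.5
    volumes:
      - clair-pgdata:/var/lib/postgresql/data   # survives container recreation
    environment:
      POSTGRES_PASSWORD: password
  clair:
    image: quay.io/coreos/clair
    depends_on:
      - postgres
    ports:
      - "6060:6060"
    volumes:
      - ./clair_config:/config   # config must point Clair at the postgres service
volumes:
  clair-pgdata:
```

The design choice is simply to decouple the database's lifecycle from Clair's, so stopping or rebuilding Clair doesn't force an eight-hour re-population.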
Good night!
thanks for your clarification!
I tried to download the binary stream data for each target layer and convert it into tar files. I stored these tar files under the root folder of the clair host like "/root/layer_file.tar"
I put the path as "/root/layer_file.tar" and while evaluating the root layer, I use the following request:
POST /v1/layers HTTP/1.1
Host: CLAIR_URL:6060
Cache-Control: no-cache
Postman-Token: XXXXX
{ "Layer": { "Name": "XXXXXXX", "Path": "/root/XXXXXX.tar", "ParentName": "", "Format": "Docker" } }
However, it returns the following error saying that the layer could not be found.
{ "Error": { "Message": "could not find layer" } }
Am I missing something in the request? Shall I fill in the path to the target layer tar file within the Clair server as the path? By the way, since the layer is the root layer (with no parent), I left its parent field blank.
Any suggestions regarding my current issue?
thanks,
I verified that I can both "scp" and "ls" the tarball within the Clair system, therefore it is obviously not a permission issue. I am just wondering whether there is anything I need to add to the path to explicitly notify Clair that it should find the tarball within its own file system. Thanks.
I even copied the tar ball to /mnt/layers/LAYER_ID/layer.tar as indicated in the instruction. However, I still hit the same error.
However, it returns the following error saying that the layer could not be found.
Nothing seems to be missing in your request. As shown in https://github.com/coreos/clair/blob/e78d076/worker/detectors/data.go#L89, Clair simply tries to open the file and returns that error if it can't.
based on the function you pointed out just now:
    func DetectData(path string, format string, toExtract []string, maxFileSize int64) (data map[string][]byte, err error) {
        var layerReader io.ReadCloser
        if strings.HasPrefix(path, "http://") || strings.HasPrefix(path, "https://") {
            r, err := http.Get(path)
            if err != nil {
                log.Warningf("could not download layer: %s", err)
                return nil, ErrCouldNotFindLayer
            }
            if math.Floor(float64(r.StatusCode/100)) != 2 {
                log.Warningf("could not download layer: got status code %d, expected 2XX", r.StatusCode)
                return nil, ErrCouldNotFindLayer
            }
            layerReader = r.Body
        } else {
            layerReader, err = os.Open(path)
            if err != nil {
                return nil, ErrCouldNotFindLayer
            }
        }
        defer layerReader.Close()
It seems like only http/https links are accepted by Clair, right? But in the API example, a local file system path is used for the layer archive path. Could I still put the tar file in the local (within Clair) file system and have it recognized by the Clair server?
If local paths are no longer supported, then we may want to change the API example in the documentation to a more realistic use case.
thanks,
Also, if possible, could you please let me know if there is a publicly accessible layer with parent information (accessible through a URL) that is vulnerable by Clair's definition? The reason I am asking is that I've tested a number of layers so far; they are either "not found", "could not extract the archive", or come back without any vulnerabilities (with only Name and IndexedByVersion). I just want to validate that I can successfully detect vulnerabilities at the layer level through the Clair API (I can validate the server is running correctly with the "analyze_local_images" tool, though). If there were such a concrete example (with a concrete URL and parent info given), it would be a lot easier for users like me to validate the correctness of our Clair API setup.
Thanks
I confirm the Clair API works appropriately. The issue comes from Docker Registry v1. It won't work if I do "wget -O file URL" & "tar cvf file.tar file" (URL refers to https://REGISTRY_URL:8080/v1/images/LAYER_ID/layer). This is because the layer binary stream data (even after being archived as tar files) downloaded from a v1 registry is not compatible with Clair. We have to use "docker save" on the client side to re-format the layer data in order to make it compatible with Clair. I've just tested and verified this.
In conclusion, we probably do not want to say Clair is "registry agnostic". Clair may be fully compatible with Docker Registry v2 but not v1.
Clair takes tarballs of container layer's filesystems as an input, and gives you vulnerabilities that may affect the container (why and how to fix it if possible) as an output. It's as simple as that. Clair works with rkt images, Docker images, OpenVZ images, or any system tarballs. There is no relation with container registries, this is almost irrelevant. Saying that Clair is compatible (or not) with a container registry X or Y doesn't make sense. However, it is appropriate to say that Clair is compatible with a container image format W or Z (as I just did above actually), or that Clair is compatible with a distribution U or V depending on the vulnerability sources. Note that these are extendable.
I see your point. But like I said, I simply did "wget -O file URL" & "tar cvf file.tar file" (URL refers to https://REGISTRY_URL:8080/v1/images/LAYER_ID/layer), and it does not work. Maybe Clair is not compatible with a container image format created this way. The best approach for Registry v1 I have found so far is to use "docker pull & docker save" to ensure the format of Registry v1 images is compatible with Clair. Thanks for your clarification.
@westlifezs: I think the confusion lies in how you're handling the data coming out of https://REGISTRY_URL:8080/v1/images/LAYER_ID/layer. Any Docker V1-compliant registry will return tar or compressed-tar data from that endpoint; there is no need to pipe it through tar again. Doing so will, in fact, result in a tar of a tar, which Clair will not be able to process properly.
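The tar-of-a-tar problem is easy to reproduce locally. The sketch below uses made-up file names; it simulates what a layer endpoint serves and shows why re-archiving that response hides the layer's filesystem:

```shell
# Simulate a layer endpoint response: the served data is ALREADY a tarball.
mkdir -p rootfs/etc
echo 'ID=ubuntu' > rootfs/etc/os-release
tar -cf layer-from-endpoint -C rootfs .    # stand-in for the /layer response body

# Correct: hand this file (or its URL) to Clair as-is. Its listing shows
# the layer's filesystem entries.
tar -tf layer-from-endpoint

# Incorrect: wrapping it in another tar, as in `tar cvf file.tar file`.
# The outer archive now contains a single opaque file, not a filesystem.
tar -cf wrapped.tar layer-from-endpoint
tar -tf wrapped.tar
```

Clair extracting wrapped.tar would find one file named layer-from-endpoint and no /etc, /usr, etc., which matches the "could not extract the archive" symptom discussed in this thread.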
What error are you seeing if you pass the data coming from https://REGISTRY_URL:8080/v1/images/LAYER_ID/layer directly to Clair?
Also be aware: If you are hitting a private image, sending Clair the above URL will fail, as Clair won't have the necessary auth credentials.
@josephschorr Thanks for your attention. Could you please give me a source/justification for your statement: "Any Docker V1-compliant registry will return tar or compressed-tar data from that endpoint"? I've heard differently from other people, and I could not find any information online to support this statement.
If I use "https://REGISTRY_URL:8080/v1/images/LAYER_ID/layer" as the path of the target layer, then I will receive the following error:
{"Error":{"Message":"utils: could not extract the archive"}}
We are also planning to use Clair to scan our VMs/images. Shall I use it the same way as for individual layers? I can post archived VM/snapshot/images and then get the results, right? If it does not support img/iso for now, are there any plans for the future? I think this could be a very helpful feature.
@westlifezs: https://docs.docker.com/v1.6/reference/api/registry_api/ is the spec for the V1 registry API. As you can see in the /layer call, the response contains the layer binary data. Docker's layer format is almost universally a compressed tar (there are a few cases where it is not compressed, but I cannot recall any version after 0.6 where that has been the case). See: https://github.com/docker/docker/blob/master/image/spec/v1.md. Not sure who told you otherwise, but I believe they are mistaken, as if the /layer call were not serving a tar or compressed tar, Docker would fail to accept it.
As for VM snapshots or images, to use the standard format and endpoint, you'll have to extract the root filesystem in tar format for Clair to support it. Otherwise, you'd have to write a custom format handler.
@josephschorr thanks for the info. However, even if "https://REGISTRY_URL:8080/v1/images/LAYER_ID/layer" is serving a tar, it still does not work for Clair when the URL is set as the path. I tried a couple of different images in our environment, and this was also confirmed by @mattlorimor in his earlier post. Unfortunately, we have to accept the fact that Clair cannot pull layers directly from v1 of Docker Registry as of now. The easiest workaround I have found, to the best of my knowledge, is to use "docker pull & docker save" to ensure the format of Registry v1 images is compatible with Clair.
@westlifezs: Unless your V1 endpoint is returning data differently than all standard registries, I guarantee it is not a V1 vs V2 difference; the layer data served by both is exactly the same: a compressed tar of the root file system. Quay itself serves both V1 and V2, with the same layer data, and sends it directly to Clair, so I know it is not a V1 issue.
Your best approach is to enable debug logging in Clair and see why it is raising that error; I suspect it is due to a problem with the layer URLs you are sending, rather than the data format itself.
@josephschorr if that is the case, then the problem should come from the data rather than the URL itself. This is because if I change the URL to something else, Clair raises a different error: "{"Error":{"Message":"could not find layer"}}".
The content of the layers should be the same whether they are obtained through "wget -O layer.tar https://REGISTRY_URL:8080/v1/images/LAYER_ID/layer" or from the untarred layers of a saved image (via "docker pull & docker save"), right?
However, they are actually not the same (same number of layers, but their sizes vary significantly) based on my experience. This is why I suspect their formats are different. Could you confirm their content is the same in your environment?
BTW, our docker registry v1 is the same as standard ones.
thanks
If it hits "{"Error":{"Message":"utils: could not extract the archive"}}", then I assume Clair is able to retrieve data from the endpoint. The error suggests the data at the URL is not acceptable from Clair's perspective, right?
or untarred layers of a saved image (through "docker pull & docker save"), right?
docker save generates a tar with a different format than the raw image data found in a layer. /layer returns a single layer, while docker save returns all layers in an image, including its JSON metadata (which Clair does not use). If you want to compare, the command corresponding to /layer would be docker export.
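To make that difference concrete, here is a local sketch of the docker save bundle layout described in the image spec linked above (the layer ID and file contents are fabricated). It also shows the workaround used in this thread: untar the save bundle and feed Clair the inner per-layer layer.tar files, not the bundle itself.

```shell
# Mock a `docker save`-style bundle (v1 layout: a top-level repositories file
# plus one directory per layer holding VERSION, json, and that layer's own
# layer.tar). The layer ID "deadbeef0001" is made up.
mkdir -p bundle/deadbeef0001
echo '1.0' > bundle/deadbeef0001/VERSION
echo '{}'  > bundle/deadbeef0001/json
mkdir -p layer-rootfs/etc
echo 'ID=debian' > layer-rootfs/etc/os-release
tar -cf bundle/deadbeef0001/layer.tar -C layer-rootfs .
echo '{}' > bundle/repositories
tar -cf saved-image.tar -C bundle .

# Extract the bundle; each inner layer.tar is a plain layer filesystem
# tarball of the kind Clair accepts.
mkdir -p extracted
tar -xf saved-image.tar -C extracted
tar -tf extracted/deadbeef0001/layer.tar
```

This is why posting saved-image.tar itself fails while the extracted layer.tar files work: only the inner archives have a filesystem at their root.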
BTW, our docker registry v1 is the same as standard ones.
Then it is definitely returning tar layer data and the problem lies somewhere else. Perhaps Clair is not following a redirect or somesuch; more logging and debugging would help.
Actually, I was talking about the difference between the data returned by "/layer" and "LAYER_ID/layer.tar" (untarred from "docker save"). The reason I use "docker save" for comparison is that after I untar the saved image into a list of tars, they are successfully accepted by Clair. I assume that if "/layer" returned the same content as one of the layers in that list, its data should also be considered valid by Clair. But this is not the case as of now. By the way, I use the /ancestry API to obtain the list of target image layers under the Docker Registry v1 API.
Looks like I need to dig into the Clair logs. Do you happen to know an easy way to configure the logging level for Clair? I checked "https://github.com/coreos/clair/blob/master/config/config.go" but could not find anything I could modify to adjust the logging level.
All of the information above is super helpful. I am leaving the following two questions for people to confirm, because based on the previous discussion they are still debatable. At least based on my experiments, the answers to both of them are no. I am still working on them; hopefully they can be cleared up soon.
I checked with the dockyard implementation; the layer from 'v1/images/:id/layer' is the same as with 'docker save'.
@Quentin-M, @westlifezs, @josephschorr
I'm thinking I was mistaken about Clair not being able to pull data from a Docker Registry v1 server. I've been working on something and am able to have Clair complete a scan even when the path points to our v1 Docker Registry:
{
"Layer":{
"Name":"Test",
"Path":"https://[REGISTRY_URL]/v1/images/[IMAGE_ID]/layer",
"ParentName":"",
"Format":"Docker"
}
}
Kicks back a response:
{
"Layer":{
"Name":"Test",
"Path":"https://[REGISTRY_URL]/v1/images/[IMAGE_ID]/layer",
"Format":"Docker",
"IndexedByVersion":2
}
}
In the case above, there is nothing to report back.
Weird. I actually can do "wget https://[REGISTRY_URL]/v1/images/[IMAGE_ID]/layer" and get a file called "layer" downloaded. It is actually a tar file with the container file system included. However, if I put the same URL as the path, Clair complains "could not extract the archive" or "could not find layer" (I tested Docker in two environments, one on our private cloud VM and one on my local VM; they gave me different errors, but I can download the "layer" file correctly in both environments). On the other hand, if I save the file "layer" as "layer.tar" and put it under a downloadable link (assigned as the path), I can post the layer without any error. Any suggestions?
The only difference I can think of between the two locations (URLs) is that one uses a non-default port (8080) and the other uses a default port (80 or 443). The one with 8080 seems unreachable by Clair but reachable through wget. Does Clair put any restrictions on which port I can use in the path URL? I could not find any other reason why this is happening.
The one with 8080 seems unreachable by clair but reachable through wget
It doesn't make much sense, to be honest. You're probably looking in the wrong place.
does clair put any restrictions regarding which port I should use in the path url? I could not find any other reasons why this is happening.
No it doesn't. See data.go#L76.
I think this is probably caused by our local setup. Anyway, thanks all for the great answers.
Hi,
In #127, you mentioned that I should provide an HTTP URL for Clair to access the target layer archive. I used "GET /v1/images/(image_id)/layer" as described in the Docker Registry v1 API (indicating this is the right way of getting the layer binary data stream). I tested it and can successfully obtain the binary data of the target layer. Therefore I set "https://REGISTRY_URL:8080/v1/images/LAYER_ID/layer" as the path in my Clair POST. Am I right? The URL is exactly the same one I used to download the layer binary data.
However, I encountered the same error as described in the previous 307 issue (#127). Could you please let me know whether I am using the right path, or whether this is caused by something else?
The data I posted (post_layer.txt)
{ "Layer": { "Name": "LAYER_NAME", "Path": "https://REGISTRY_URL:8080/v1/images/LAYER_ID/layer", "ParentName": "PARENT_NAME", "Format": "Docker" } }
Curl request & response:
curl -v -X POST -d @post_layer.txt http://localhost:6060/v1/layers/
Shall I use http instead of https? Our environment only supports https right now. Could you let me know where/how I can debug this issue? Thanks.