Closed: airtonix closed this issue 7 years ago
Hi there!
We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.
The current status is as follows: [Tracker status badge]
This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.
Do your web nodes configure a peer URL and external URL?
Bingo. Been looking for this. I, too, am running through the tutorial, and got the same issue when trying to pass -i inputs. Running execute tasks without command-line inputs is fine, but it fails on this.
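For reference, the failing invocation is roughly the following; the target, task file, and input names are just placeholders from my setup:

```sh
# Works without -i; fails as soon as an input is passed on the command line.
fly -t lite execute -c task.yml -i my-input=./my-input
```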
Do your web nodes configure a peer URL and external URL?
I have it all running as Docker containers on a single node (Docker for Mac, FWIW); I just used the default docker-compose.yml from the "Installing" guide on http://concourse.ci. So, no, I didn't configure EXTERNAL_URL, because I didn't think it mattered with everything running locally. Does it?
Now I am more confused than ever. I did try setting CONCOURSE_EXTERNAL_URL to https://127.0.0.1:8080 (although the docs say not to), just to set it to something. Of course the job fails:
executing build 1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 127.0.0.1 port 8080: Connection refused
gzip: invalid magic
tar: Child returned status 1
tar: Error is not recoverable: exiting now
exit status 2
When I try unsetting it, all jobs hang on initializing.
To a large degree, the problem is that it is unclear what the purpose of CONCOURSE_EXTERNAL_URL is and how to work with it in various scenarios. Based on the config, it looks like workers connect via CONCOURSE_TSA_HOST, so it isn't for the worker. What exactly is its purpose, and how do you handle it in various scenarios? The docs say:
It can't be 127.0.0.1 or localhost as it has to also work in a separate network namespace, for execute to be able to receive your uploaded bits from the web node.
OK, so it isn't for external access, since execute is using it (from the worker node, to "receive uploaded bits"?). But sometimes the external URL is 127.0.0.1, e.g. a port-mapped docker-compose run.

I am very confused.
OK, ignore the initializing problem. That is something else, unable to connect to the docker registry. I will open a separate issue.
The URL would need to be the IP as seen from the fly client I suppose?
The URL would need to be the IP as seen from the fly client I suppose?
Based on the name, that would make sense. But what I don't get is:
@deitch It needs to be an address that the workers can reach. The way execute gets the inputs to the workers is to have fly upload them to the ATC's "pipe" API, and pass a download URL along to the workers. So the workers need to be able to reach it.

We default it to 127.0.0.1 mainly for development purposes, as it's also used for aspects of the web UI where it doesn't matter, as all traffic will be from the browser. That'll never work for execute though, so admittedly it's not a great first-time experience, but there's not really a sane default for it imo, beyond guessing your network stack.
You'll want to set it to your machine's LAN IP, usually 172.x.x.x for Docker or 10.x.x.x for e.g. a binary in a private network (e.g. an AWS VPC). Or, if you have a domain set up (maybe pointed to a load balancer), configure that as the external URL. Ours, for example, is https://ci.concourse.ci.
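For the docker-compose setup mentioned above, that could look something like the following. The IP is a made-up LAN address, and whether the compose file from the "Installing" guide reads the variable from the host environment depends on which version you grabbed, so treat this as a sketch:

```sh
# Made-up LAN IP; use the address of the machine running the web node,
# not 127.0.0.1. Adjust to match however your docker-compose.yml wires
# environment variables into the web container.
export CONCOURSE_EXTERNAL_URL="http://192.168.1.50:8080"
docker-compose up -d
```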
@vito thanks for the response. Let me see if I get this:

- I execute a job with some input context using fly
- the fly client zips up the input context and loads it via the API; it is now available on the ATC node at some URL

That URL in the last step is the one that is defined by CONCOURSE_EXTERNAL_URL. Is that right?
@deitch Yep. Technically the ATC and worker don't know the specifics of any of this; fly puts together a build plan that just has the read end of the pipe configured as the download URL for the get of the archive resource.
Got it. So CONCOURSE_EXTERNAL_URL really means "URL at which workers can access the ATC". And if I understand you correctly, it is fly that puts together the plan, which is just transmitted by the ATC to the workers.
May I raise a number of issues?
- The name is confusing; something like CONCOURSE_ATC_WORKER_URL might fit better. It has little to do with "external", but lots to do with internal (i.e. worker) access.
- FWIW, Kubernetes went through something similar with certain parts of kubectl. It had a lot of logic embedded in it, where it made the detailed API calls, which in turn meant things like stopped communications broke midstream, you couldn't "fire and forget", and changes got messy. They moved the logic into the server so it became a simple, standard API call (kubectl became a lighter API wrapper), and things were dramatically simpler.

Just $0.02...
No, it does not mean that; it's used for much more than just worker communication. It means precisely "URL at which anything can reach the ATC(s)".

It's named exactly what it should be for the multitude of things it's used for: resource metadata (so notifications can link to their own builds), the auth API (so fly can link to an auth endpoint without knowing particulars about the ATC API), the pagination API (Link headers), the pipes API (so fly can just blindly pass along a download URL). They all have the same generic need for the externally-reachable URL. This is also a pretty common thing to configure in other applications.

If you have an external load balancer, that's exactly what you should configure as the external URL.
Workers that are inside a private subnet should try to reach another endpoint in the same private subnet via an external load balancer in a public subnet that is accessible only from the Internet?
It's named exactly what it should be for the multitude of things it's used for: resource metadata (so notifications can link to their own builds), the auth API (so fly can link to an auth endpoint without knowing particulars about the ATC API), the pagination API (Link headers), the pipes API (so fly can just blindly pass along a download URL). They all have the same generic need for the externally-reachable URL.
That's the part I don't get. Why should a client need to know anything other than the single URL to access a service? If I use kubectl, all I need is the k8s master URL; if I use Rancher, all I use is the Rancher URL; if I connect to Salesforce, all I need is the Salesforce URL.
This is also a pretty common thing to configure in other applications

I haven't seen it (which may speak more to what I have seen). Usually I see a single API endpoint to which a CLI communicates over an API. Everything else is managed by the service endpoint.
Is there an architectural and services diagram that shows all of these components, their purposes, what they use and how they communicate?
Workers that are inside a private subnet should try to reach another endpoint in the same private subnet via an external load balancer in a public subnet that is accessible only from the Internet?
Yes, because otherwise you have to allow your containers to call into your private network, which is a terrible idea if you're running things like pull-requests (i.e. untrusted code) or are running a multi-tenant Concourse instance. It would also require more internal plumbing to not just use the external URL, as suddenly you may need to pass around all internal IPs of the cluster rather than just a single URL pointing to a load balancer. Given that we already need the external URL configured not just for this use case, I really don't see why we would want a second value to configure.
That's the part I don't get. Why should a client need to know anything other than the single URL to access a service? If I use kubectl, all I need is the k8s master URL; if I use Rancher, all I use is the Rancher URL; if I connect to Salesforce, all I need is the Salesforce URL.
I don't get what you mean. The external URL is the exact value the client would use to access the service. It is a single URL.
I haven't seen it (which may speak more to what I have seen). Usually I see a single API endpoint to which a CLI communicates over an API. Everything else is managed by the service endpoint.
Grafana, for example, requires you to configure the external URL for OAuth callbacks. Same goes for Concourse. The same value happens to be used in many scenarios, so we gave it a single config point with a generic name.
Is there an architectural and services diagram that shows all of these components, their purposes, what they use and how they communicate?
There's https://www.gliffy.com/go/publish/10463597, which hasn't been integrated into the docs yet. But this topic may be difficult to work into a general architecture diagram, as the components don't strictly know that they're talking to each other in the fly execute case. That may be better covered by its own document.
Yes, because otherwise you have to allow your containers to call into your private network, which is a terrible idea if you're running things like pull-requests
So the whole purpose here is the internal runc containers and their access to the context?
That is a good start. It looks like Garden runs the containers, and Baggageclaim downloads the context(s) and mounts them as volumes into the containers. So is it Baggageclaim that needs the API for access?
This is a good start, but it would be great to see a lot of the flows and how the parts interact.
So the whole purpose here is the internal runc containers and their access to the context?
Yep. All network traffic in Concourse happens in containers; Baggageclaim or Garden or ATC never call out to e.g. GitHub themselves.
That is a good start. It looks like Garden runs the containers, and Baggageclaim downloads the context(s) and mounts them as volumes into the containers. So is it Baggageclaim that needs the API for access?
Baggageclaim is about as semantically smart as Garden, that is to say it isn't - ATC is just driving them both. It'll e.g. talk to Baggageclaim to create a volume, and then talk to Garden to mount that volume into a container, and then use that container to download stuff into the volume.
That container is then being told, "run /opt/resource/in with this JSON resource request", which then tells the archive resource to download something, in this case <external url>/api/v1/pipes/sdfkjhadskjhdsf, as fly told it to (after creating the pipe and starting to upload to its write end).
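In shell terms that step is roughly the following. The destination directory and the "source.uri" key are illustrative, since the exact JSON the archive resource receives isn't spelled out here:

```sh
# Sketch of the resource interface: /opt/resource/in <dest dir> with a JSON
# request on stdin. "ci.example.com" stands in for the external URL, and the
# "source.uri" key is an assumption about the archive resource's config.
echo '{"source":{"uri":"https://ci.example.com/api/v1/pipes/sdfkjhadskjhdsf"}}' \
  | /opt/resource/in /tmp/build/get
```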
The architectural design here really is interesting. It is a pity there isn't a full write-up (well, wasn't, until you started describing it here :-) ).
So the process is something like:
1. fly tells ATC to create a job with a given context
2. fly zips up the context and uploads it to ATC
...
7. a runc container runs the job

I don't actually get why you need the volume, since the data is all ephemeral and will be thrown out after the task anyway, but it doesn't affect the meat of this discussion.
The part that still seems strange to me is the requirement for a fixed, external URL for the container to access the zipped context. Since it basically is downloading a single zip file (or perhaps several), using a known and controlled protocol (HTTP), and that is entirely within the contract between ATC <-> Garden <-> runc container, I can see several other ways to do it, e.g. POST or PUT chunks of data.

My main concern is that the current structure does not maintain separation between the external (fly, its config, and its communications channel with the API endpoint, which happens to be the ATC) and the internal (Garden, Baggageclaim, runc containers, volumes, and their communications channels with the ATC). That, in turn, makes it brittle, e.g. when your IP actually is 127.0.0.1, and requires extra config, which makes it hard to move around and creates barriers to adoption.
Think about it. If I have Concourse installed to run locally in a Docker container, I would port-map ATC so I know fly can always reach it at, say, http://127.0.0.1:8555/ or similar, and save that in my config. But then my CONCOURSE_EXTERNAL_URL will change every time I set up and tear down?
I actually think this is one of the best process-structured CI/CD systems I have seen in a long time. I came to it because I needed a system for a proof-of-concept for a new cloud migration. They wanted Jenkins, some people wanted CircleCI or Travis, etc. An architect friend recommended this, and I loved it. Pipelines are great, everything makes sense, and I love how you use runc containers, making it actually work even if the workers are Docker containers. Brilliant! Sure, it still needs a larger ecosystem, but give it time (and some marketing love).
But the whole comms flow that depends on this var, its setting, and its network sensitivity made running locally via compose v1, then compose v2, then Rancher, then k8s, with all of the various networking structures, difficult for them to swallow.
So, yeah, not critical, trying to be helpful. :-)
Yep, except for step 7 it'd actually be a second container that runs the task, using a copy-on-write volume of the volume the container in step 6 fetched into.
This is all in service of having everything speak one language: build plans. One way to look at it is that there are two consumers of the build plan API: fly execute and pipelines. There may be other use cases to have other tools generate build plans and submit them as one-offs to the ATC API. That is to say, it's an important architectural decision that fly execute is not a special case, and that the ATC API is expressive enough to support it among future potential clients.
To that end, part of the mental model for build plans is that every artifact provided to the build from an external source comes in via a get. The user's bits are an external source (they're on their laptop). All get steps fetch using resources, so fly uses the archive resource, which just does a simple curl and tar zxf of some URL it's given. A URL is a good generic starting point, but let's work backwards from there: how do we get a URL for the bits the user has on their local disk? We also want fancy things like progress bars so they're not just sitting there waiting.
The ATC provides a generic /api/v1/pipes endpoint for creating one-off endpoints for exactly this reason (basically what you said in point 1 in your suggestions). It's a simple API: create a pipe, get a GUID, PUT to one end, GET to the other, and it streams. So the archive resource is told to curl the read end, and you get a progress bar, and the bits end up in the container.
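Concretely, the two ends amount to something like this. The pipe ID is the placeholder from earlier, $ATC_URL stands in for wherever fly talks to the ATC, and the assumption that the write end is the same path with a PUT is mine, so take the exact URLs as illustrative:

```sh
PIPE="sdfkjhadskjhdsf"   # placeholder pipe ID

# What fly effectively does: stream the tarred-up inputs into the write end...
tar czf - ./my-input | curl -T - "$ATC_URL/api/v1/pipes/$PIPE"

# ...while the archive resource in the container curls the read end via the
# external URL and untars it into the volume:
curl "$CONCOURSE_EXTERNAL_URL/api/v1/pipes/$PIPE" | tar zxf -
```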
The key point here is that none of this is specialized for fly execute. Your steps 1-7 are the exact same steps we perform for any resource. Modeling it this way keeps the internals generic, the API consistent, and the client is pretty simple. It also works even if your workers live in a private network. Remember, there's no guarantee that your workers can necessarily reach the ATC by its internal IP. It may be a Mac Mini running on your office desk for running iOS testing, registered as a worker with a cluster running in AWS or GCP.
We could follow suggestion 3, but that would require some special-casing and some additional complexity in the build plans API, which I'd like to avoid if we can help it. IMO we can help it since, again, we already need the external URL for various other reasons.
Also no worries. I've been wanting to do a blog or something about "anatomy of fly execute". It's always fun to tell people that the progress indicator they see is actually the other end downloading from them, and technically not an upload progress bar. :)
Also no worries. I've been wanting to do a blog or something about "anatomy of fly execute". It's always fun to tell people that the progress indicator they see is actually the other end downloading from them, and technically not an upload progress bar. :)
Yes! I imagine half of my questions would disappear if that had existed (probably to be replaced by others...). I hope you do collate these along with the diagram and some flow diagrams to make such a solid series of posts.
To the subject, though...
Yep, except for step 7 it'd actually be a second container that runs the task, using a copy-on-write volume of the volume the container in step 6 fetched into.
OK, makes sense.
an important architectural decision that fly execute is not a special case, and that the ATC API is expressive enough to support it among future potential clients
No argument there. I'm a huge proponent of clean APIs, and the clients - even shipped with product - are nothing special.
It's a simple API: create a pipe, get a GUID, PUT to one end, GET to the other, and it streams.
So ATC's pipes API really is just a proxy, a streaming pipe? client -> ATC -> container (to volume) as a single stream? Or is it really 2-step: the client creates a .tar.gz and uploads it; the container pulls?
Remember, there's no guarantee that your workers can necessarily reach the ATC by its internal IP.
Got that. But there is a guarantee that the ATC can reach the workers (or it couldn't start jobs, get results, etc.), which is why I thought options 2 or 3 or any variation with push made sense.
We could follow suggestion 3, but that would require some special-casing and some additional complexity in the build plans API
Because the same API is used by the fly client and the pipeline, so we would need a way to tell it, "I am pushing to you"?
Still unclear about something. I think you are saying:

- fly initiates all of its communication with ATC because it is a client, using the build plan API
- pipelines, however, have to be told, "go execute this", so ATC instantiates that process via Garden's http API, telling a worker, "go start a pipeline"

Is that about right?
Going back to my earlier comment, flow diagrams with steps and API calls would be incredibly helpful. As you said, there is a "mental model" here, but it isn't instantly visible.
In any case, if so, would option 3 become something like the following?

- fly uploads to ATC
- ...

Call it pre-loading?
So ATC's pipes API really is just a proxy, a streaming pipe? client -> ATC -> container (to volume) as a single stream? Or is it really 2-step: the client creates a .tar.gz and uploads it; the container pulls?
Yep, the pipes API is a streaming pipe. There's no "upload then download", it's just bits flow straight through. So there's no state to it, either.
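If it helps to picture the "no state" part: start the read end first, then the write end, and the bytes go straight from one connection to the other (placeholder URLs again, auth omitted):

```sh
PIPE="sdfkjhadskjhdsf"   # placeholder pipe ID from the earlier example
curl -s "$ATC_URL/api/v1/pipes/$PIPE" > received.tgz &    # read end
tar czf - ./my-input | curl -T - "$ATC_URL/api/v1/pipes/$PIPE"   # write end
wait
```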
pipelines, however, have to be told, "go execute this", so ATC instantiates that process via Garden's http API, telling a worker, "go start a pipeline".
The workers don't know anything about pipelines or builds - they just run volumes and containers, and are generic components usable outside Concourse. Garden, for example, is used by Cloud Foundry for running web apps. ATC is the brains behind it all.
There's also no such thing as really "starting a pipeline" - a pipeline is just a set of jobs, which describe a plan for builds to run, describing their dependencies (get) and when to run them (trigger: true).
Technically, the ATC is talking to itself - it's the one also scheduling pipelines and creating those build plans. But the point is at the end of the day there's a concrete "build plan" to run, which is the same "build plan" that fly execute constructs and submits to the ATC API.
In any case, if so, would option 3 become something like: (snipped)
It'd probably entail introducing a new "get from pipe" step wherein the ATC would just create a volume and stream the pipe into it. But at that point we'd probably replace the 'pipe' API with something else that's more integrated with the flow.
it's just bits flow straight through
Hence the name, "pipes". :-)
The workers don't know anything about pipelines or builds - they just run volumes and containers ... ... Technically, the ATC is talking to itself ...
So what tells Garden to create a new container, or Baggageclaim to create a volume? How is it given the parameters of what tasks to run, what binaries it needs to run them? How is it told to download the data from the pipe?
So what tells Garden to create a new container, or Baggageclaim to create a volume? How is it given the parameters of what tasks to run, what binaries it needs to run them? How is it told to download the data from the pipe?
ATC. It's given them via the /api/v1/builds endpoint that Fly submits a build plan to. Fly's build plan includes a get of type archive with the URL as the read end of the pipe.
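As a loose sketch (the field names here are illustrative rather than the real build-plan schema, and auth is omitted), the submission is conceptually just:

```sh
# Conceptual only: the point is that the plan's get step of type "archive"
# carries the pipe's read URL (external URL + pipe path).
# "ci.example.com" and the pipe ID are placeholders.
curl -X POST "https://ci.example.com/api/v1/builds" \
  -H 'Content-Type: application/json' \
  -d '{
        "plan": {
          "get": "my-input",
          "type": "archive",
          "source": { "uri": "https://ci.example.com/api/v1/pipes/sdfkjhadskjhdsf" }
        }
      }'
```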
I think I need to try and find time to dig into it. I need to understand the parts (that diagram helps), the flows, and where that URL is used.
Moved to concourse/concourse#1102
Going through the tutorial, I assumed I could transpose my own concourse server (running on coreos via rancher):