Open binarylogic opened 4 years ago
@ktff just pinging you on this. If you could provide setup documentation I can start to work on this. Just a simple step-by-step instruction like you did for Kubernetes.
For ECS, the simplest and most general deployment that we currently support is a sidecar container with the splunk_hec source.
This is a guide on how to collect logs from containers in a single AWS ECS task on Fargate. This is achieved by adding Vector as a container to your task definition and redirecting the logs that Docker collects from your containers to Vector. To achieve that, we are going to transport the logs using the splunk protocol over the loopback network interface (localhost).
We can do that by adding configuration in two places:
In the Vector configuration file, add a splunk_hec source. This source will receive logs from your containers in the same task. So you will have:
[sources.my_source_id]
type = "splunk_hec"
In the ECS task definition, two things need to be achieved:
All your containers should have:
A logConfiguration parameter with the following content: logDriver should be splunk, and options should have at least the following content: splunk-url should be http://0.0.0.0:8088, and splunk-token can have any value. This will configure your containers to use Docker's splunk log driver, which will send the containers' logs to the Vector container.
A dependsOn parameter with the following content: containerName should be vector, and condition should be HEALTHY. This will postpone starting your containers until Vector is ready to accept logs.
As JSON this would look like:
"logConfiguration": {
  "logDriver": "splunk",
  "options": {
    "splunk-url": "http://0.0.0.0:8088",
    "splunk-token": ""
  }
},
"dependsOn": [
  {
    "containerName": "vector",
    "condition": "HEALTHY"
  }
]
The container with Vector should have:
name should be vector.
A healthCheck parameter with the following content: the command array should have two items in the given order: CMD-SHELL, then curl -f http://0.0.0.0:8088/services/collector/health || exit 1. This command will check whether the splunk_hec source is running.
As JSON this would look like:
"name": "vector",
"healthCheck": {
  "command": [
    "CMD-SHELL",
    "curl -f http://0.0.0.0:8088/services/collector/health || exit 1"
  ]
}
That's all of the necessary configuration.
The only thing remaining is to deploy your task how you see fit.
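Putting the two fragments above together, a containerDefinitions array might look like the following. This is only a sketch: the app container name (my-app), both image names, and the "any-value" token are hypothetical placeholders, and a real task definition needs more fields (task family, CPU, memory, and so on).

```json
{
  "containerDefinitions": [
    {
      "name": "vector",
      "image": "my-vector-with-config:latest",
      "essential": true,
      "healthCheck": {
        "command": [
          "CMD-SHELL",
          "curl -f http://0.0.0.0:8088/services/collector/health || exit 1"
        ]
      }
    },
    {
      "name": "my-app",
      "image": "my-app:latest",
      "essential": true,
      "logConfiguration": {
        "logDriver": "splunk",
        "options": {
          "splunk-url": "http://0.0.0.0:8088",
          "splunk-token": "any-value"
        }
      },
      "dependsOn": [
        {
          "containerName": "vector",
          "condition": "HEALTHY"
        }
      ]
    }
  ]
}
```

Every additional app container in the task would repeat the same logConfiguration and dependsOn entries.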
Vector's memory and processor usage and recommended limitations can be found at https://vector.dev/docs/setup/deployment/roles/agent#system-configuration.
The configuration file needs to be accessible to Vector. One of the easier ways to achieve that is to build an image with Vector and its configuration file on the default config path.
For debugging purposes, one way is to log Vector with the awslogs logDriver. With that, you will be able to debug configuration errors.
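As a rough illustration of that debugging setup, the Vector container could carry a logConfiguration like the one below. The log group name, region, and stream prefix are placeholder values, and the sketch assumes the log group already exists and the task has permission to write to it.

```json
{
  "name": "vector",
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/vector",
      "awslogs-region": "us-east-1",
      "awslogs-stream-prefix": "vector"
    }
  }
}
```

Vector's own output, including configuration errors at startup, would then show up in that CloudWatch log group.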
@binarylogic This guide covers both EC2 and Fargate, although it focuses only on what needs to be configured, as there are two main ways of deploying on ECS: through the console and through the website. Both of them could be covered in separate guides/sections. This guide is their shared part, and should be enough for those who already know how to deploy containers on ECS.
The guide requires timberio/vector#1784.
Thanks! At first glance, this looks good.
Container dependency configuration has been added.
This guide covers both EC2 and Fargate
Scratch that. Guides could be simpler if they are specialized. So the original one is for Fargate, and I'll add a separate one for EC2.
@ktff does vector get deployed as a sort of daemonset? How do we know that it will always exist on 0.0.0.0:8080?
@LucioFranco
does vector get deployed as a sort of daemonset?
Vector is deployed as a regular container.
How do we know that it will always exist on 0.0.0.0:8080?
User containers are configured to wait for the splunk_hec source to become available on 0.0.0.0:8080 before they start running.
@ktff Is there any networking mode that needs to be configured? I tested the setup you described (just used port 8080 instead of 8088), and I am seeing:
ResourceInitializationError: failed to validate logger args: Options http://0.0.0.0:8080/services/collector/event/1.0: dial tcp 0.0.0.0:8080: connect: connection refused : exit status 1
I am using timberio/vector:0.10.0-debian as the vector image.
@awangc did you add address = "0.0.0.0:8080" to the configuration of splunk_hec? If that address isn't specified, the source will use port 8088 by default.
Regarding networking mode, on Fargate the only supported mode is awsvpc, so the guide assumes that. EC2 is a slightly different story.
For EC2 with a different networking mode, you will need to ensure that the port of the Vector container is accessible from your containers, which would possibly require changing the contents of Vector's portMappings. So the container would end up with something like this:
"portMappings": [
  {
    "hostPort": 8080,
    "protocol": "tcp",
    "containerPort": 8080
  }
]
@ktff Yes, I have that part in the vector.toml file:
[sources.app_log]
type = "splunk_hec"
address = "0.0.0.0:8080"
token = "aabbccddeeff"
Also I have EXPOSE 8080 in my Dockerfile and I'm testing in Fargate, thanks
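For comparison, the container-side log options matching this vector.toml would presumably point at the same port and carry the same token. This is a sketch, assuming the splunk_hec source rejects requests whose token differs from the configured one; only the port and token values come from the configuration above.

```json
{
  "logConfiguration": {
    "logDriver": "splunk",
    "options": {
      "splunk-url": "http://0.0.0.0:8080",
      "splunk-token": "aabbccddeeff"
    }
  }
}
```

If the port or token on either side is changed, the other side would need the same change.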
@ktff I found out the setup does not work for Fargate platform 1.4 (which is the one I had been trying) but works for LATEST (1.3 if I'm not mistaken). Possible reason is that container start order is not being respected https://github.com/aws/containers-roadmap/issues/849 ?
Possible reason is that container start order is not being respected
@awangc Yes, that explains the error message. The app container's logger tries to establish a connection before the app container starts, and if the vector container isn't running at that time, there is nothing listening. It also seems that it tries to connect only once, at start time. In that case there are two options:
@ktff I found out the setup does not work for Fargate platform 1.4 (which is the one I had been trying) but works for LATEST (1.3 if I'm not mistaken). Possible reason is that container start order is not being respected aws/containers-roadmap#849 ?
Thank you for your insight, I have the same issue - did you manage to work around this on 1.4?
Any updates here? Thanks!
Found this coming from the internet while working on getting it set up. Not sure if everything works, but I was able to get Vector set up on the LATEST Fargate (1.4) without issue using File source and cloudwatch sink.
I'm trying to make it the container stops working
By adding "splunk-verify-connection": "false", it worked for me on AWS Fargate and EC2.
"logConfiguration": {
  "logDriver": "splunk",
  "options": {
    "splunk-url": "http://0.0.0.0:8088",
    "splunk-verify-connection": "false",
    "splunk-token": "abc1234567890"
  }
}
I'm opening this issue to represent a single, final, place for AWS ECS documentation. This issue will be used to build out the website, docs, and marketing pages.