Closed: robsonpeixoto closed this issue 5 years ago
Hi @robsonpeixoto,
We're only using marathon-lb as an L7 load balancer with HTTP/1. I've been meaning to try setting up an L4 load balancer but we haven't got there yet. So I'm not sure I can help that much, but I can maybe answer some of these questions...
Is everyone running marathon in a docker container?
Not marathon, no. Did you mean marathon-lb? marathon-lb we run in a container.
Is it possible to run it outside a Docker container?
Running marathon-lb outside a container should work but some functionality may break. The Lua scripts used for some of the API endpoints are designed with the assumption that only one process called "haproxy" is present on the system.
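As a rough illustration (this is not marathon-lb's actual Lua code, just a sketch of the assumption it makes), looking up "the" haproxy process by name becomes ambiguous as soon as a second process named `haproxy` exists on the host:

```shell
# Sketch of the single-process assumption: find "the" haproxy PID by exact
# process name. With more than one process named haproxy on the system,
# endpoints that rely on this lookup can misbehave.
count_haproxy() {
  # -x: match the process name exactly; -c: print the count (0 if none)
  pgrep -cx haproxy || true
}

n=$(count_haproxy)
if [ "$n" -gt 1 ]; then
  printf 'warning: %s haproxy processes found; expected exactly 1\n' "$n"
fi
```

This is why running marathon-lb on a host that also runs its own haproxy (or multiple marathon-lb instances uncontainerized on one host) is risky: the container normally provides the isolation that makes the name lookup unambiguous.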
What Docker version are you using? And which storage driver?
Currently have marathon-lb running on Docker 1.11.2 (overlay), 1.12.1 (aufs), and 1.13.1 (overlay2) on various versions of DC/OS and standalone Marathon/Mesos. Docker hasn't really been an issue for us with marathon-lb.
The only other thing I can point you to is this repo: https://github.com/praekeltfoundation/docker-marathon-lb where we override some of the default templates. Important changes include:
@JayH5 We were able to upgrade HAProxy's HTTP health checks from HTTP/1.0 to HTTP/1.1 by setting the following labels in our service definitions:
```json
...
"labels": {
  "HAPROXY_0_BACKEND_HTTP_HEALTHCHECK_OPTIONS": "  option httpchk GET {healthCheckPath} HTTP/1.1\\r\\nHost:\\ www\r\n",
  "HAPROXY_0_BACKEND_HTTP_OPTIONS": ""
},
...
```
A few notes regarding the specific formatting:
- `HAPROXY_0_BACKEND_HTTP_HEALTHCHECK_OPTIONS` has an intentional two leading spaces in its value.
- `HAPROXY_0_BACKEND_HTTP_OPTIONS` with an empty-string value is intentional, to fix a newline issue when `HAPROXY_0_BACKEND_HTTP_HEALTHCHECK_OPTIONS` is overridden.

Tested With:
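To make the escaping concrete, here's a small shell sketch (the `/health` path and `www` host are placeholders): the JSON escape `\\r\\n` decodes to the literal two-character sequences `\r` and `\n`, which is what haproxy's `option httpchk` expects as a header separator, and the two leading spaces survive into the generated config.

```shell
# What the JSON label value decodes to once JSON escaping is removed:
# literal backslash-r backslash-n (NOT a real CRLF), plus two leading
# spaces for haproxy.cfg indentation.
label_value='  option httpchk GET /health HTTP/1.1\r\nHost:\ www'

# The two intentional leading spaces should be present.
case "$label_value" in
  '  '*) printf 'leading two spaces: present\n' ;;
  *)     printf 'leading two spaces: MISSING\n' ;;
esac

# The header separator should be the literal characters \r\n.
case "$label_value" in
  *'\r\n'*) printf 'separator: literal backslash-r backslash-n\n' ;;
esac
```

haproxy itself turns the literal `\r\n` into a real CRLF when it sends the health-check request, which is why the label must carry the escaped form rather than actual newline characters.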
FWIW: I need to dig into the marathon-lb code and open a pull request; we have 80 services deployed and this has become a lame patch we add to every service that wants an HTTP/1.1 health check.
Thanks @JayH5
As the method of launching and restarting changed with v1.12, it should be better about random pauses and slow config updates. Although if a client holds a long-running connection open, that will still keep the old process around (I've seen instances where it takes up to an hour to drain all the connections from the old processes).
Hi guys! I'm using the Mesos + Marathon + marathon-lb stack, but we're running into some trouble :(
For some unknown reason, marathon-lb is showing this problem:
And I have some questions:
I'll try to reduce the number of old haproxy processes using the solution below. What's your opinion?
We are using marathon-lb as both an L4 and L7 load balancer. But for each created task (deploy/task failure/...), marathon-lb will create a new configuration version and reload haproxy, keeping the old process running until all of its connections close.
To avoid this problem, I'll create two marathon-lb instances: one for HTTP (L7) services and another for TCP (L4) services (Redis/Thrift/...).
On the L7 load balancer, I'll use this script in a cron job running every 5 minutes to kill old processes. That will ensure any haproxy process held open by a WebSocket connection gets killed.
As our TCP services are very stable and see few deploys, they won't be affected by other apps' problems.
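For reference, a minimal sketch of what such a cleanup cron job might look like (this is an assumption about the approach, not the script linked above; the 300-second grace period and the use of `pgrep`/`ps` are my choices):

```shell
#!/bin/sh
# Hedged sketch of an "old haproxy killer" cron job: keep the newest
# haproxy process and kill any sibling older than GRACE seconds, on the
# theory that lingering old processes only hold drained or long-lived
# (e.g. WebSocket) connections.
GRACE=300

# should_kill AGE GRACE -> true if the process has outlived the grace period
should_kill() {
  [ "$1" -gt "$2" ]
}

newest=$(pgrep -nx haproxy || true)   # -n: newest process named haproxy
for pid in $(pgrep -x haproxy || true); do
  [ "$pid" = "$newest" ] && continue
  age=$(ps -o etimes= -p "$pid" | tr -d ' ')  # elapsed seconds since start
  if should_kill "${age:-0}" "$GRACE"; then
    kill "$pid"
  fi
done
```

Scheduled from cron every 5 minutes, e.g. `*/5 * * * * /usr/local/bin/kill-old-haproxy.sh`. Note that `kill` sends SIGTERM, which terminates the old process immediately and drops any connections it was still holding, which is exactly the intent for stuck WebSocket holders.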
Any suggestions on how to make this work better?
Some of my server info:
/usr/sbin/mesos-master --cluster=jusbrasil-mesos-prod --log_dir=/var/log/mesos --logging_level=INFO --port=5050 --quorum=2 --work_dir=/tmp/mesos --zk=zk://zk-1:2181,zk-2:2181,zk-3:2181/mesos_20160425
/usr/sbin/mesos-slave --advertise_ip=10.0.0.1 --cgroups_enable_cfs --cgroups_hierarchy=/cgroup --containerizers=docker,mesos --docker_stop_timeout=50secs --executor_registration_timeout=10mins --executor_shutdown_grace_period=60secs --isolation=cgroups/cpu,cgroups/mem --log_dir=/var/log/mesos --logging_level=INFO --master=zk://zk-1:2181,zk-2:2181,zk-3:2181/mesos_20160425 --port=5051 --recover=reconnect --strict --no-switch_user --work_dir=/tmp/mesos
Marathon: marathon-1.1.7
java -Xms512m -Xmx2048m -server -jar /opt/marathon/marathon-1.1.7/target/scala-2.11/marathon-assembly-1.1.7.jar --enable_features task_killing --event_subscriber http_callback --master zk://zk-1:2181,zk-2:2181,zk-3:2181/mesos_20160425 --task_launch_timeout 600000 --task_lost_expunge_gc 75000 --task_lost_expunge_initial_delay 300000 --task_lost_expunge_interval 30000 --zk zk://zk-1:2181,zk-2:2181,zk-3:2181/marathon_20160425
Marathon-lb: v1.6.0
sse -m http://mesos-1:8080 http://mesos-2:8080 http://mesos-3:8080 --group external --group internal --syslog-socket /dev/log
Thanks