tendrilinc / marathon-autoscaler

A simple autoscaler for Marathon applications
https://hub.docker.com/r/tendril/marathon-autoscaler/
Apache License 2.0

'Latest' and 'fix_24' autoscalers are not working (DCOS 1.9.0 and Marathon 1.4.2) #36

Open fernandrone opened 7 years ago

fernandrone commented 7 years ago

Hello,

The latest and fix_24 versions of the marathon-autoscaler container were not able to identify applications carrying the use_marathon_autoscaler label on a DCOS 1.9.0 / Marathon 1.4.2 installation.
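To rule out the label itself being missing on the Marathon side, a check along these lines could be run against the JSON returned by Marathon's GET /v2/apps endpoint. This is only a minimal sketch: the function name labeled_app_ids, the sample payload, and the label value are made up for illustration, and it is not how the autoscaler itself discovers applications.

```python
import json

# Label the autoscaler is expected to look for (taken from this report).
LABEL = "use_marathon_autoscaler"


def labeled_app_ids(apps_response, label=LABEL):
    """Return the ids of apps whose labels include `label`, whatever its value.

    `apps_response` is the parsed JSON body of Marathon's GET /v2/apps,
    which has the shape {"apps": [{"id": ..., "labels": {...}}, ...]}.
    """
    return [
        app["id"]
        for app in apps_response.get("apps", [])
        if label in (app.get("labels") or {})
    ]


# Hypothetical sample payload; the label value "true" is a placeholder.
sample = json.loads("""
{
  "apps": [
    {"id": "/thumbor/core", "labels": {"use_marathon_autoscaler": "true"}},
    {"id": "/maintenance/lb-external", "labels": {}}
  ]
}
""")

print(labeled_app_ids(sample))
```

If the expected application ids show up here, the label is visible through the Marathon API and the problem is more likely in how the newer images read it.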

Unfortunately I'm having trouble retrieving some of the original logs, but at INFO level it would print only this:

| INFO | Stats differentials collected.
| INFO | Decision process beginning.
| INFO | Decisions are completed.

Enabling the DEBUG level showed that it could communicate with the nodes just fine:

2017-06-30 19:06:48,227 | DEBUG | (u'10.5.5.101', [{u'source': u'maintenance_lb_external.17b2ce46-4176-11e7-a377-92d74b0bec98', u'executor_id': u'maintenance_lb_external.17b2ce46-4176-11e7-a377-92d74b0bec98', u'statistics': {u'cpus_nr_throttled': 0, u'timestamp': 1498849608.21988, u'cpus_throttled_time_secs': 0.0, u'cpus_user_time_secs': 1609.07, u'mem_rss_bytes': 38350848, u'mem_limit_bytes': 1107296256, u'cpus_system_time_secs': 10820.05, u'cpus_nr_periods': 0, u'cpus_limit': 1.1}, u'framework_id': u'41da1a1e-5d43-4c01-9f60-6a2d9d9e9745-0000', u'executor_name': u'Command Executor (Task: maintenance_lb_external.17b2ce46-4176-11e7-a377-92d74b0bec98) (Command: NO EXECUTABLE)'}])

However, downgrading to fix_23 fixed the issue, and the autoscaler started evaluating the application:

2017-07-03 13:53:46,832 | INFO | Decision process beginning.
2017-07-03 13:53:46,833 | INFO | thumbor/core: metrics: {'mem': 10.408289292279411, 'cpu': 0.3179427788440855}
2017-07-03 13:53:46,833 | INFO | thumbor/core: last_triggered_rule set to: [{'ruleInfo': {'rulePart': None, 'ruleName': u'slowscaledown'}, 'ruleValue': {'scale_factor': u'-1', 'weight': 1.0, 'threshold': {'val': u'20', 'op': u'<'}, 'metric': u'cpu', 'tolerance': u'PT1M', 'backoff': u'PT1M'}}]
2017-07-03 13:53:46,833 | INFO | thumbor/core: vote: -1 ; scale_factor requested: -1
2017-07-03 13:53:46,834 | INFO | thumbor/core: application ready: True
2017-07-03 13:53:46,834 | INFO | thumbor/core: instances: min:1, running:1, max:16
2017-07-03 13:53:46,834 | INFO | thumbor_core: tolerance window filled: True / 13:52:46.834477
2017-07-03 13:53:46,835 | INFO | thumbor_core: tolerance reached: True / 13:52:46.834477 - 13:53:46.834477
2017-07-03 13:53:46,835 | INFO | thumbor_core: within backoff window: True / 13:52:46.835363 - 13:53:46.835363
2017-07-03 13:53:46,836 | INFO | Decisions are completed.

Note that the only change I made was downgrading the container.

Any idea what the issue could be? I'm fine with using an older version, but I thought you might want to look at this.

If you need any more information please let me know!