mesosphere / marathon

Deploy and manage containers (including Docker) on top of Apache Mesos at scale.
https://mesosphere.github.io/marathon/
Apache License 2.0

Better document scaling limits and impact of settings #2954

Closed: aameek closed this issue 7 years ago

aameek commented 8 years ago

Marathon should do a better job of documenting its scaling limits and the settings that affect them. We have hit two issues in the span of two months:

  1. [v0.9] ZooKeeper (ZK) node size limit: Marathon writes the entire task info into ZK, and everything goes into a single node, so depending on the size of the application JSON it can hit the limit at an arbitrary number of apps. Turning compression on buys more time, but the limit will eventually be hit again.
  2. [v0.11] Number of open file descriptors: If the number of tasks (with HTTP health checks) grows and the master node doesn't have a high enough ulimit, it can get into a bad state. We even observed OOM errors, though we are unsure whether they are related.
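
To make point 1 concrete, here is a rough sketch of how one might estimate when serialized app definitions approach the znode size limit. It assumes ZooKeeper's default `jute.maxbuffer` of 0xfffff bytes, uses a made-up app definition (`make_app` is hypothetical), and uses zlib only as a stand-in for whatever compression Marathon applies, so the numbers are illustrative rather than a statement of Marathon's actual storage format.

```python
import json
import zlib

# ZooKeeper's default jute.maxbuffer: 0xfffff bytes, just under 1 MiB.
ZNODE_LIMIT_BYTES = 0xFFFFF


def make_app(i):
    """Hypothetical app definition; real definitions vary widely in size."""
    return {
        "id": f"/service-{i}",
        "cmd": "python3 -m http.server $PORT0",
        "cpus": 0.1,
        "mem": 64,
        "instances": 2,
        "env": {"SERVICE_INDEX": str(i)},
        "healthChecks": [{"protocol": "HTTP", "path": "/", "portIndex": 0}],
    }


def serialized_sizes(apps):
    """Return (raw_bytes, compressed_bytes) for the apps stored as one blob."""
    raw = json.dumps(apps).encode("utf-8")
    # zlib is an assumption here, standing in for the real compression scheme.
    return len(raw), len(zlib.compress(raw))


def fits_in_znode(size_bytes, limit=ZNODE_LIMIT_BYTES):
    """True if a payload of size_bytes fits under the znode size limit."""
    return size_bytes <= limit


if __name__ == "__main__":
    for count in (100, 1000, 10000):
        raw, packed = serialized_sizes([make_app(i) for i in range(count)])
        print(count, raw, fits_in_znode(raw), packed, fits_in_znode(packed))
```

Because the result depends entirely on how large each app JSON is, the app count at which the limit is reached differs per installation, which is why a single documented number would not be enough.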

The Marathon documentation mentions, at a high level, support for 5K apps, but it should be more detailed. Both 1) and 2) above were observed with fewer than 1K apps.
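
For point 2, a quick check like the following could be run on the master node to compare the open-file limit against the expected number of health-checked tasks. This is only a sketch: it assumes each in-flight HTTP health check holds one socket, and `baseline_fds` is a hypothetical allowance for everything else the process keeps open. Neither figure comes from Marathon itself.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_fd_headroom(concurrent_checks, baseline_fds=512, soft_limit=None):
    """Rough check that the soft limit covers health checks plus a baseline.

    Assumes one descriptor per in-flight health check (an assumption, not a
    measured figure). baseline_fds is a hypothetical allowance for logs,
    ZooKeeper connections, and other sockets.
    """
    if soft_limit is None:
        soft_limit, _ = nofile_limits()
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return concurrent_checks + baseline_fds <= soft_limit


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("headroom for 900 checks:", has_fd_headroom(900))
```

With a soft limit of 1024, a common default, 900 concurrent checks plus the assumed baseline would not fit, which is consistent with running into trouble well below 1K apps.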

aquamatthias commented 8 years ago

@aameek I agree. Thanks for reporting this.

meichstedt commented 7 years ago

Note: This issue has been migrated to https://jira.mesosphere.com/browse/MARATHON-2982. For more information see https://groups.google.com/forum/#!topic/marathon-framework/khtvf-ifnp8.
