nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Flintrock slows down with > 100 hosts #192

Open douglaz opened 7 years ago

douglaz commented 7 years ago

2017-03-27 22:40:22,145 - flintrock.ec2 - INFO - Requesting 161 spot instances at a max price of $0.19...
2017-03-27 22:40:22,886 - flintrock.ec2 - INFO - 0 of 161 instances granted. Waiting...
2017-03-27 22:40:53,678 - flintrock.ec2 - INFO - All 161 instances granted.
2017-03-27 22:41:16,518 - flintrock.ec2 - DEBUG - 161 instances not in state 'running': 'i-00beffce9b2f34a83', 'i-09bf7302531a74bc0', 'i-0f8f257d76fba4d25', ...
2017-03-27 22:41:21,214 - flintrock.ec2 - DEBUG - 88 instances not in state 'running': 'i-00beffce9b2f34a83', 'i-0da34a925277b3ba6', 'i-0383e6663517100c3', ...
2017-03-27 22:41:25,778 - flintrock.ec2 - DEBUG - 79 instances not in state 'running': 'i-00beffce9b2f34a83', 'i-0da34a925277b3ba6', 'i-0383e6663517100c3', ...
2017-03-27 22:41:30,949 - flintrock.ec2 - DEBUG - 70 instances not in state 'running': 'i-00beffce9b2f34a83', 'i-0da34a925277b3ba6', 'i-0383e6663517100c3', ...
2017-03-27 22:41:36,750 - flintrock.ec2 - DEBUG - 35 instances not in state 'running': 'i-00beffce9b2f34a83', 'i-0da34a925277b3ba6', 'i-0d1f463dc38eb9a9a', ...
2017-03-27 22:41:41,404 - flintrock.ec2 - DEBUG - 8 instances not in state 'running': 'i-015d27ea45d473f22', 'i-0da34a925277b3ba6', 'i-097c0ffdcf25e8828', ...
2017-03-27 22:41:45,817 - flintrock.ec2 - DEBUG - 6 instances not in state 'running': 'i-097c0ffdcf25e8828', 'i-00dd830c5fdfee97f', 'i-032b1adcd7e1065d8', ...
2017-03-27 22:41:49,765 - flintrock.ec2 - DEBUG - 6 instances not in state 'running': 'i-097c0ffdcf25e8828', 'i-00dd830c5fdfee97f', 'i-032b1adcd7e1065d8', ...
2017-03-27 22:41:54,524 - flintrock.ec2 - DEBUG - 5 instances not in state 'running': 'i-097c0ffdcf25e8828', 'i-00dd830c5fdfee97f', 'i-032b1adcd7e1065d8', ...
2017-03-27 22:41:58,794 - flintrock.ec2 - DEBUG - 5 instances not in state 'running': 'i-097c0ffdcf25e8828', 'i-00dd830c5fdfee97f', 'i-032b1adcd7e1065d8', ...
2017-03-27 22:42:03,388 - flintrock.ec2 - DEBUG - 5 instances not in state 'running': 'i-097c0ffdcf25e8828', 'i-00dd830c5fdfee97f', 'i-032b1adcd7e1065d8', ...
2017-03-27 22:42:07,566 - flintrock.ec2 - DEBUG - 4 instances not in state 'running': 'i-097c0ffdcf25e8828', 'i-00dd830c5fdfee97f', 'i-032b1adcd7e1065d8', ...
2017-03-27 22:42:11,927 - flintrock.ec2 - DEBUG - 4 instances not in state 'running': 'i-00dd830c5fdfee97f', 'i-097c0ffdcf25e8828', 'i-032b1adcd7e1065d8', ...
2017-03-27 22:42:16,191 - flintrock.ec2 - DEBUG - 3 instances not in state 'running': 'i-097c0ffdcf25e8828', 'i-00dd830c5fdfee97f', 'i-032b1adcd7e1065d8', ...
2017-03-27 22:46:07,473 - flintrock.ssh - INFO - [52.91.3.244] SSH online.
2017-03-27 22:46:10,601 - flintrock.ssh - INFO - [34.203.212.197] SSH online.
2017-03-27 22:46:10,745 - flintrock.core - INFO - [52.91.3.244] Configuring ephemeral storage...
2017-03-27 22:46:12,289 - flintrock.ssh - INFO - [54.82.182.128] SSH online.
2017-03-27 22:46:12,748 - flintrock.ssh - INFO - [34.207.234.68] SSH online.
2017-03-27 22:46:12,867 - flintrock.services - INFO - [52.91.3.244] Installing HDFS...
2017-03-27 22:46:13,652 - flintrock.ssh - INFO - [54.204.113.16] SSH online.

Note the gap of almost 4 minutes between the instances launching and the first SSH connection coming online:

2017-03-27 22:42:16,191 - flintrock.ec2 - DEBUG - 3 instances not in state 'running': 'i-097c0ffdcf25e8828', 'i-00dd830c5fdfee97f', 'i-032b1adcd7e1065d8', ...
2017-03-27 22:46:07,473 - flintrock.ssh - INFO - [52.91.3.244] SSH online.
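For reference, the exact size of that gap can be worked out from the two timestamps quoted above. This is only a small sketch; the helper name `gap_seconds` is made up for illustration and is not part of Flintrock.

```python
from datetime import datetime

# Timestamp layout used in the log lines above: "YYYY-MM-DD HH:MM:SS,mmm".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def gap_seconds(earlier: str, later: str) -> float:
    """Return the number of seconds between two log timestamps."""
    start = datetime.strptime(earlier, LOG_TIME_FORMAT)
    end = datetime.strptime(later, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# Last "not in state 'running'" line vs. the first "SSH online" line.
gap = gap_seconds("2017-03-27 22:42:16,191", "2017-03-27 22:46:07,473")
print(f"{gap:.3f} seconds ({gap / 60:.2f} minutes)")
```

That works out to roughly 231 seconds, i.e. about 3 minutes 51 seconds with no log output at all.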

Using lsof -n, I see the number of open files/connections constantly increasing (reaching more than 50k), along with high CPU usage during this time.
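To make the growth easier to track over time, the lsof -n output can be tallied per process. The snippet below is a rough sketch, not anything Flintrock ships: `count_open_files_by_pid` is a hypothetical helper, and `SAMPLE_OUTPUT` is invented data that only mimics the usual lsof column layout (COMMAND, PID, USER, ...).

```python
from collections import Counter


def count_open_files_by_pid(lsof_output: str) -> Counter:
    """Count lsof entries per PID, assuming PID is the second column."""
    counts = Counter()
    for line in lsof_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counts[int(fields[1])] += 1
    return counts


# Made-up sample in the usual lsof layout; real output would come from
# something like: subprocess.run(["lsof", "-n"], capture_output=True, text=True).stdout
SAMPLE_OUTPUT = """\
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3  1234 user    3u  IPv4  11111      0t0  TCP 10.0.0.1:50000->10.0.0.2:22 (ESTABLISHED)
python3  1234 user    4u  IPv4  11112      0t0  TCP 10.0.0.1:50001->10.0.0.3:22 (SYN_SENT)
sshd      999 root    3u  IPv4  22222      0t0  TCP *:22 (LISTEN)
"""

counts = count_open_files_by_pid(SAMPLE_OUTPUT)
for pid, n in counts.most_common():
    print(pid, n)
```

Running this every few seconds against real output and watching the count for the flintrock process would show whether the number of open connections keeps climbing during the stall.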