Closed edwardcallahan closed 7 years ago
As mentioned on Slack, there are no resourceAvailable
records for the web
-roled agents. The CLI needs to handle that.
The fact that there are no resources available probably means that anything web
role based cannot be scheduled to run. However, resources should become available eventually.
Restarting two agents (service conductr-agent restart
) resolved the issue.
Reopening as a) the cli should be resilient to this and b) need to understand why this agents couldn't get their resource info through.
@edwardcallahan @huntc The following error occurs on ConductR agent several times when the bundle is scaled.
2017-06-22T22:11:19.651Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:19.651UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Bundle configuration used: None
2017-06-22T22:11:19.651Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:19.651UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Alternate configuration used: None and Some(/tmp/conductr-agent/10.100.6.23/2552/bundles/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5/execution-0-1543755063621017421/7d0de0840454316214a1a0a3f8b9d0f50ba56b9ed10a62b1d3249a910a8e9bc0/sherpa-desk-prod/bundle.conf)
2017-06-22T22:11:19.652Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-4, akkaTimestamp=22:11:19.652UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - About to start process [Vector(/usr/bin/env, bash, /tmp/conductr-agent/10.100.6.23/2552/bundles/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5/execution-0-1543755063621017421/start)]
2017-06-22T22:11:19.655Z ip-10-100-6-23 INFO BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-19, akkaTimestamp=22:11:19.655UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Process for bundle [0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5] started [18390]
2017-06-22T22:11:19.655Z ip-10-100-6-23 DEBUG StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-19, akkaTimestamp=22:11:19.655UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 has been successfully started.
2017-06-22T22:11:20.074Z ip-10-100-6-23 WARN Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-4, akkaTimestamp=22:11:20.074UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Invalid HTTP port: 10822 -Dhttp.address=10.100.6.23
2017-06-22T22:11:20.088Z ip-10-100-6-23 INFO Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:20.088UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Component sherpa-desk exited with 255
2017-06-22T22:11:20.088Z ip-10-100-6-23 INFO Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:20.088UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Stopping bundle
2017-06-22T22:11:20.134Z ip-10-100-6-23 INFO BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-4, akkaTimestamp=22:11:20.134UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Process for bundle [0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5] exited with exit value [143]
2017-06-22T22:11:20.135Z ip-10-100-6-23 DEBUG StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:20.134UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 has been stopped with exit code Some(143)
So the bundle fails frequently because of Invalid HTTP port: 10822 -Dhttp.address=10.100.6.23
. Is this sherpa-desk
related?
Additionally, I see that 2 hours before the conduct agents
command has been performed, the RunExecutor
is restarted due to inconsistent bundle state between core and agent:
2017-06-22T22:11:38.776Z ip-10-100-6-23 INFO StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:38.776UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Restarting RunExecutor and stopping all running bundles due to an StopAllBundles event from akka.tcp://conductr@10.100.6.104:9004/user/reaper/agent-monitor/conductr-agent@10.100.6.23:2552.
2017-06-22T22:11:38.777Z ip-10-100-6-23 INFO RunExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaTimestamp=22:11:38.777UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0, sourceActorSystem=conductr-agent] - Shutting down with 1 bundle(s).
2017-06-22T22:11:38.777Z ip-10-100-6-23 DEBUG RunExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaTimestamp=22:11:38.777UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0, sourceActorSystem=conductr-agent] - Stopping bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 on node akka.tcp://conductr-agent@10.100.6.23:2552.
2017-06-22T22:11:39.150Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-5, akkaTimestamp=22:11:39.150UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Bundle configuration used: None
2017-06-22T22:11:39.150Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-5, akkaTimestamp=22:11:39.150UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Alternate configuration used: None and Some(/tmp/conductr-agent/10.100.6.23/2552/bundles/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5/execution-0-4841436986406642223/7d0de0840454316214a1a0a3f8b9d0f50ba56b9ed10a62b1d3249a910a8e9bc0/sherpa-desk-prod/bundle.conf)
2017-06-22T22:11:39.151Z ip-10-100-6-23 DEBUG BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-5, akkaTimestamp=22:11:39.151UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - About to start process [Vector(/usr/bin/env, bash, /tmp/conductr-agent/10.100.6.23/2552/bundles/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5/execution-0-4841436986406642223/start)]
2017-06-22T22:11:39.153Z ip-10-100-6-23 INFO BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-22, akkaTimestamp=22:11:39.153UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Process for bundle [0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5] started [19962]
2017-06-22T22:11:39.153Z ip-10-100-6-23 DEBUG StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.153UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 has been successfully started.
2017-06-22T22:11:39.573Z ip-10-100-6-23 WARN Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.573UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Invalid HTTP port: 10822 -Dhttp.address=10.100.6.23
2017-06-22T22:11:39.586Z ip-10-100-6-23 DEBUG ElasticsearchCollector [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.586UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/ElasticsearchCollector, sourceActorSystem=conductr-agent] - Posted to Elasticsearch and receive a response failure: 503 Service Unavailable
2017-06-22T22:11:39.587Z ip-10-100-6-23 INFO Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-20, akkaTimestamp=22:11:39.587UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Component sherpa-desk exited with 255
2017-06-22T22:11:39.587Z ip-10-100-6-23 INFO Reaper [bundleId=0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceThread=conductr-agent-akka.actor.default-dispatcher-20, akkaTimestamp=22:11:39.587UTC, akkaSource=akka://conductr-agent/user/reaper, sourceActorSystem=conductr-agent] - Stopping bundle
2017-06-22T22:11:39.631Z ip-10-100-6-23 INFO BundleExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-4, akkaTimestamp=22:11:39.631UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0/0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5, sourceActorSystem=conductr-agent] - Process for bundle [0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5] exited with exit value [143]
2017-06-22T22:11:39.631Z ip-10-100-6-23 DEBUG StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.631UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - Bundle 0c4eaae7f57dd510e355eb5ef4c502f9-7d0de0840454316214a1a0a3f8b9d0f5 has been stopped with exit code Some(143)
2017-06-22T22:11:39.631Z ip-10-100-6-23 INFO RunExecutor [sourceThread=conductr-agent-akka.actor.default-dispatcher-3, akkaTimestamp=22:11:39.631UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/run-executor-0, sourceActorSystem=conductr-agent] - All bundles stopped.
2017-06-22T22:11:39.631Z ip-10-100-6-23 INFO StandaloneLauncherAgent [sourceThread=conductr-agent-akka.actor.default-dispatcher-2, akkaTimestamp=22:11:39.631UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent, sourceActorSystem=conductr-agent] - akka://conductr-agent/user/reaper/standalone-launcher-agent/run-executor-0 has been successfully terminated after stopping all the bundles.
After the restart of the RunExecutor
the following log message is logged every 15 seconds:
2017-06-22T22:11:53.774Z ip-10-100-6-23 DEBUG AgentInfo [sourceThread=conductr-agent-akka.actor.default-dispatcher-17, akkaTimestamp=22:11:53.774UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/agent-info, sourceActorSystem=conductr-agent] - Starting Resource poller
This log message indicates that the resource poller actor gets started. We don't log any message when the start was successful. Once a resource poller actor has been started, it will send every 15 seconds the resources to ConductR core. Also here, we do not log any message on the agent so I cannot tell based on the log output if something is wrong here. However, the fact that ConductR core has no resourceAvailable
information for this agent indicates that the resources are not successfully sent to the Core.
At the very beginning so before the above log messages I can see that agent connected successfully to the core and started sending resources:
2017-06-21T21:41:39.001Z ip-10-100-6-23 INFO CoreNodesWriter [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-nodes-writer, sourceActorSystem=conductr-agent, akkaTimestamp=21:41:38.995UTC] - Saving contact points [akka.tcp://conductr@10.100.2.17:9004, akka.tcp://conductr@10.100.6.104:9004]
2017-06-21T21:41:39.001Z ip-10-100-6-23 INFO CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaTimestamp=21:41:38.996UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent] - New contact point akka.tcp://conductr@10.100.6.104:9004/system/receptionist
2017-06-21T21:41:39.002Z ip-10-100-6-23 DEBUG CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaTimestamp=21:41:39.001UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent] - Terminating core services requestor for core-services-requestor-conductr@10.100.2.17:9004
2017-06-21T21:41:39.003Z ip-10-100-6-23 INFO CoreNodesWriter [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaTimestamp=21:41:39.003UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-nodes-writer, sourceActorSystem=conductr-agent] - Saving contact points [akka.tcp://conductr@10.100.2.17:9004, akka.tcp://conductr@10.100.6.104:9004, akka.tcp://conductr@10.100.4.28:9004]
2017-06-21T21:41:39.009Z ip-10-100-6-23 DEBUG CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaTimestamp=21:41:39.009UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent] - Starting core services requestor for conductr@10.100.6.104:9004
2017-06-21T21:41:39.011Z ip-10-100-6-23 DEBUG CoreServicesRequestor [sourceThread=conductr-agent-akka.actor.default-dispatcher-3, akkaTimestamp=21:41:39.011UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy/core-services-requestor-conductr@10.100.6.104:9004/supervised-core-services-requestor-conductr@10.100.6.104:9004, sourceActorSystem=conductr-agent] - Requesting info on core services from akka.tcp://conductr@10.100.6.104:9004/user/reaper/core-services
2017-06-21T21:41:39.013Z ip-10-100-6-23 INFO CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent, akkaTimestamp=21:41:39.012UTC] - New contact point akka.tcp://conductr@10.100.4.28:9004/system/receptionist
2017-06-21T21:41:39.027Z ip-10-100-6-23 INFO ClusterClient [sourceThread=conductr-agent-akka.actor.default-dispatcher-3, akkaTimestamp=21:41:39.026UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/cluster-client, sourceActorSystem=conductr-agent] - Connected to [akka.tcp://conductr@10.100.2.17:9004/system/receptionist]
2017-06-21T21:41:39.042Z ip-10-100-6-23 INFO CoreServicesProxy [sourceThread=conductr-agent-akka.actor.default-dispatcher-19, akkaTimestamp=21:41:39.041UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/core-services-proxy, sourceActorSystem=conductr-agent] - Received core services information CoreServicesInfo(http://10.100.6.104:9005/,http://10.100.6.104:9008/,http://10.100.6.104:9007/).
2017-06-21T21:41:39.050Z ip-10-100-6-23 DEBUG AgentInfo [sourceThread=conductr-agent-akka.actor.default-dispatcher-18, akkaTimestamp=21:41:39.044UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/standalone-launcher-agent/agent-info, sourceActorSystem=conductr-agent] - Starting Resource poller
2017-06-21T21:41:39.050Z ip-10-100-6-23 DEBUG ResourcePoller [sourceThread=conductr-agent-akka.actor.default-dispatcher-21, akkaTimestamp=21:41:39.045UTC, akkaSource=akka.tcp://conductr-agent@10.100.6.23:2552/user/reaper/resource-polling, sourceActorSystem=conductr-agent] - Start resource polling
Summary
Based on the ConductR agent log files I am not able to identify the issue why the resources are not updated on ConductR core. The ConductR agent was able to connect with a core initially and continues to send the resources. To further analyze the issue I'd need the ConductR core log files during the same time frame as well. @edwardcallahan Are you able to attach these log files as well?
@markusjura wrt https://github.com/typesafehub/conductr-cli/issues/504#issuecomment-311043448
The sherpa-desk bundle has only its name set from the config bundle
name = "prod-sherpa-desk"
The start command is from the sbt generated application bundle (it is in the .conductr/cache
folder on selkirk.lightbend.com)
start-command = ["sherpa-desk/bin/sherpa-desk", "-J-Xms134217728", "-J-Xmx134217728", "-Dhttp.port=$DESK_BIND_PORT -Dhttp.address=$DESK_BIND_IP"]
core_logs.tar.gz
All log files for the 22nd of the 2.17
core node observer for 6.23
as requested.
Note: When cluster was built it was over provisioned with agents. These private agent nodes are stopped before loading bundles to provide 'ready to go' spare agents. That might be the cause of the reconcile event you observed in the logs, @markusjura.
@edwardcallahan With stopped, do you mean a manual stop by a user?
I've tried to continue the investigation with the provided core log. Turns out that I need the log files of the day of all 3 core nodes. Tried to get them by ssh into selkirk and then on the respective nodes but the /var/log/conductr
folders only containes archives from 29th of June. Is there a change to get the logs of the remaing cores from 22nd?
@markusjura The nodes were stopped using AWS EC2 controls. Not terminated, but stopped, so that it can be resumed later, if needed, without incurring compute costs.
Regarding the logfile rotation, it seems that logratate is enabled and the logs were rotated away. Sorry. That said, the cluster is back in that state today. I'll assert that we can get it back into this state again with the bundles we have currently loaded.
@edwardcallahan If you can (and want) to reproduce the issue with the current bundles, please do and send me the logs of the failing agent and all core nodes. Otherwise, we can close the issue and open another one once we reproduced it.
Closing the issue because it cannot be reproduced.
version 1.2.4, conductr 2.1.1
After simultaneously loading multiple bundles, ie
None of the bundles could be started.
Although agents are present
The agents command cannot be used
Agents logs: agent6.23.tar.gz