BjoernT closed this issue 9 years ago.
Hi there @BjoernT,
These errors appear to be related to the monitoring agent itself, so I will need to pass this information on to the MaaS team for assistance.
I will keep you posted with what I find.
Thanks!
--Matt
Hi there @BjoernT,
There is a new cloud monitoring agent available (1.1.0-5). Can you please upgrade the affected nodes and let us know if the problem persists?
Kind Regards, Matt
@BjoernT Can you please provide feedback on @mattt416's comments?
@b3rnard0 We have no way of reproducing it. I'll check this with our next RPC 9.0.2 build and rpc-maas 9.0.2.
Hi @BjoernT, I'm going to go ahead and close this issue. If the problem persists, please let us know and we'll reach out to the MaaS team.
Thanks!
--Matt
I have this issue on a fresh 10.1.2rc1 installation, and this is the version that gets installed:
ii rackspace-monitoring-agent 1.1.0-41 amd64 Rackspace Cloud Monitoring Agent
apt-get update && apt-get upgrade rackspace-monitoring-agent
rackspace-monitoring-agent is already the newest version.
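A note on the versions involved: the installed package is 1.1.0-41, while the suggested fix was 1.1.0-5. Debian revisions compare numerically, so -41 is newer than -5, even though a plain string comparison says otherwise. On the node itself the authoritative check is `dpkg --compare-versions 1.1.0-41 ge 1.1.0-5`. The sketch below is a simplified, hypothetical helper (`parse_debian_version`, `is_at_least`) that assumes purely numeric `upstream-revision` strings with no epochs, tildes, or letters; it is not the full dpkg ordering algorithm.

```python
def parse_debian_version(version: str) -> tuple[tuple[int, ...], int]:
    """Split a simple 'upstream-revision' string such as '1.1.0-41'.

    Simplified assumption: numeric, dot-separated upstream parts and a
    numeric Debian revision. Epochs, '~' and letters are NOT handled.
    """
    upstream, sep, revision = version.rpartition("-")
    if not sep:  # no Debian revision present, treat it as revision 0
        upstream, revision = version, "0"
    return tuple(int(part) for part in upstream.split(".")), int(revision)


def is_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` is the same as or newer than `required`."""
    # Tuples compare element by element, so (1, 1, 0) ties and 41 > 5 decides.
    return parse_debian_version(installed) >= parse_debian_version(required)


print(is_at_least("1.1.0-41", "1.1.0-5"))  # True
print("1.1.0-41" >= "1.1.0-5")             # False: string comparison gets this wrong
```

In other words, the node in question already carries a newer build than the one the fix was announced in, which is consistent with the test against -5 that follows.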
In my case, the agent just sits there, saying it should retry in xxxxms, and never does, even after minutes. The process logs nothing further, and does not die. The site shows the agent as not connected until I manually restart the service.
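The symptom described here (a `SRV:... -> Retrying connection in NNNNNms` line followed by silence) can be detected from the agent log alone. The following is a minimal sketch, not part of the agent or of rpc-maas: the function name `overdue_retries` and the 30-second grace period are hypothetical, and it assumes the log line layout shown in the excerpts in this thread and that any log output after the announced retry time means the agent is still making progress.

```python
import re
from datetime import datetime, timedelta

# Assumed agent log layout: "<timestamp> <LVL>: <message>", e.g.
# "Mon Feb 16 19:45:33 2015 INF: SRV:... -> Retrying connection in 71685ms"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) [A-Z]{3}: (?P<msg>.*)$"
)
RETRY_RE = re.compile(r"^(?P<target>\S+) -> Retrying connection in (?P<ms>\d+)ms")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def overdue_retries(lines, now, grace=timedelta(seconds=30)):
    """Return targets whose announced retry time passed with no later log output.

    Hypothetical helper: `now` must be a naive datetime in the log's timezone.
    """
    deadlines = {}  # target -> time the agent said it would reconnect
    last_seen = None  # timestamp of the most recent log line of any kind
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip wrapped or unrelated lines
        ts = datetime.strptime(match["ts"], TS_FORMAT)
        last_seen = ts
        retry = RETRY_RE.match(match["msg"])
        if retry:
            deadlines[retry["target"]] = ts + timedelta(milliseconds=int(retry["ms"]))
    # Overdue: the deadline (plus grace) is in the past and nothing was logged since.
    return [
        target
        for target, deadline in deadlines.items()
        if now > deadline + grace and last_seen < deadline
    ]
```

Fed the two `Retrying connection` lines from the log excerpt below and a `now` of 19:50 on the same day, this sketch would report both the ord1 and lon3 targets, matching the "never retries" behavior; a watchdog built on it could then restart the service, as was done manually here.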
I have a few older versions of the debs that we've managed to dig out from previous installations. I'm going to give the -5 version a try to see whether it still works, or whether this is caused by some underlying system change (openssl updates).
Tried the version that was supposed to fix this (1.1.0-5). Same problem. This leads me to believe it's either something external and transient in nature (openssl), or something unexpected on certain pool members behind the endpoints.
Mon Feb 16 19:44:46 2015 INF: (plugin=swift-recon.py, id=ch6YOZCfbZ, iid=idYnpN3g30) -> agent.plugin (details=args="quarantine",file="swift-recon.py",id="ch6YOZCfbZ",period=60) scheduled for 60s
Mon Feb 16 19:45:33 2015 ERR: Connection: 2001:4801:7902:1:0:a:4323:52:443 (2001:4801:7902:1:0:a:4323:52:443) -> 140073835747200:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Mon Feb 16 19:45:33 2015 ERR: Connection: 2001:4801:7902:1:0:a:4323:52:443 (2001:4801:7902:1:0:a:4323:52:443) -> 140073835747200:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Mon Feb 16 19:45:33 2015 ERR: Connection: 2001:4801:7902:1:0:a:4323:52:443 (2001:4801:7902:1:0:a:4323:52:443) -> 140073835747200:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Mon Feb 16 19:45:33 2015 ERR: Connection: 2001:4801:7902:1:0:a:4323:52:443 (2001:4801:7902:1:0:a:4323:52:443) -> 140073835747200:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Mon Feb 16 19:45:33 2015 INF: SRV:_monitoringagent._tcp.ord1.prod.monitoring.api.rackspacecloud.com -> Retrying connection in 71685ms
Mon Feb 16 19:45:43 2015 ERR: Connection: 2a00:1a48:7902:1:0:a:432:388:443 (2a00:1a48:7902:1:0:a:432:388:443) -> 140073835747200:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Mon Feb 16 19:45:43 2015 ERR: Connection: 2a00:1a48:7902:1:0:a:432:388:443 (2a00:1a48:7902:1:0:a:432:388:443) -> 140073835747200:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Mon Feb 16 19:45:43 2015 ERR: Connection: 2a00:1a48:7902:1:0:a:432:388:443 (2a00:1a48:7902:1:0:a:432:388:443) -> 140073835747200:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Mon Feb 16 19:45:43 2015 ERR: Connection: 2a00:1a48:7902:1:0:a:432:388:443 (2a00:1a48:7902:1:0:a:432:388:443) -> 140073835747200:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Mon Feb 16 19:45:43 2015 ERR: Connection: 2a00:1a48:7902:1:0:a:432:388:443 (2a00:1a48:7902:1:0:a:432:388:443) -> 140073835747200:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Mon Feb 16 19:45:43 2015 INF: SRV:_monitoringagent._tcp.lon3.prod.monitoring.api.rackspacecloud.com -> Retrying connection in 68384ms
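To test the "certain pool members" theory, the errors can be tallied per endpoint address. The sketch below is illustrative only: `count_connection_errors` is a hypothetical helper, and it assumes the error text follows OpenSSL's `error:<code>:<library>:<function>:<reason>:` layout seen in these lines. It keys on the parenthesized address because the first address field can be `nil`, as in the 9.0.1 log at the end of this thread.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines above:
# "ERR: Connection: <addr> (<addr>) -> <thread>:error:<code>:<lib>:<func>:<reason>:..."
CONN_ERR_RE = re.compile(
    r"ERR: Connection: \S+ \((?P<addr>[^)]+)\) -> \d+:error:"
    r"(?P<code>[0-9A-Fa-f]+):(?P<lib>[^:]+):(?P<func>[^:]+):(?P<reason>[^:]+):"
)


def count_connection_errors(lines):
    """Count agent connection errors per (endpoint, OpenSSL function, reason)."""
    counts = Counter()
    for line in lines:
        match = CONN_ERR_RE.search(line)
        if match:  # lines that are not connection errors are ignored
            counts[(match["addr"], match["func"], match["reason"])] += 1
    return counts
```

On the excerpt above this would give 4 `BUF_MEM_grow_clean`/`malloc failure` errors for the ord1 address and 5 for the lon3 address. Because the same failure shows up against endpoints in different regions, a per-endpoint tally like this makes it easier to see whether the problem follows particular pool members or the local agent.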
Yes, I had similar trouble pinning it down. It appeared only during a POC install and disappeared once I removed all agents and reinstalled everything fresh from rpc-maas, with working OMSA checks (from the 9.0 OMSA hot fix branch).
Bjoern
(Replying by email to Christopher H. Laco's comment of Feb 16, 2015, 1:49 PM, quoted above: https://github.com/rcbops/rpc-maas/issues/110#issuecomment-74562946)
Hi there, this is to let you know that, based on the errors here, our agent engineer (Ryan) has confirmed that the memory issue you are experiencing is the same one he is looking into. He is actively working on it.
@wolfdancer Thank you!
@claco could I get access to a machine that is showing this issue?
@rphillips Sorry, I was out for a few days. I noticed that a new agent version has been released. Do you still need a stack to test against?
Please try out the new version. If you still see the issue, let me know as soon as possible.
@rphillips Just stood up a full stack with the new -53, and I still have the same issue:
Fri Mar 6 15:54:18 2015 ERR: Connection: 2001:4800:7902:1:0:a:4323:46:443 (2001:4800:7902:1:0:a:4323:46:443) -> 140404185462656:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:169:
Fri Mar 6 15:54:18 2015 INF: SRV:_monitoringagent._tcp.dfw1.prod.monitoring.api.rackspacecloud.com -> Retrying connection in 75480ms
If someone can reach out internally, I can get them login creds.
Later versions of the agent have better handling of memory errors and better logging of tracebacks.
Closing for now. Please re-open if we see hangs again.
RPCv9 checks are failing in repo version 9.0.1
Fri Oct 24 21:33:23 2014 INF: (plugin=horizon_check.py, id=chlmWtuPvL, iid=idHjiWAC9S) -> agent.plugin (details=args="172.29.237.162",file="horizon_check.py",id="chlmWtuPvL",period=60) scheduled for 60s
Fri Oct 24 21:34:27 2014 ERR: Connection: nil (50.57.61.13:443) -> 139741911230336:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:159:
Fri Oct 24 21:34:27 2014 ERR: Connection: nil (50.57.61.13:443) -> 139741911230336:error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure:../base/deps/luvit/deps/openssl/openssl/crypto/buffer/buffer.c:159: