Closed PLColuccio closed 5 years ago
It looks like the plugin is trying to send a huge query to the server. Limit max_query_objects and/or max_query_metrics. For example:

```toml
max_query_objects = 100
max_query_metrics = 100
```
The next release will also limit queries to 100,000 metrics at a time, regardless of settings. This should prevent this from happening again.
As a side note, @phreak2599, I'd be interested in knowing a bit more about your configuration. Assuming you have the default 256 objects per query, 500,000 metrics sounds incredibly high. How many VMs/hosts are in that vCenter, if you don't mind sharing?
It was actually set to 64, since we are currently running 5.5, although we are in the process of upgrading to 6.5. I have since set it to 32 to see if that helps.
This plugin is currently running in one of our data centers against 4 vCenter hosts. Per vCenter:

- host count 29, VM count 548
- host count 131, VM count 2343
- host count 60, VM count 1733
- host count 87, VM count 2168
Not sure which vCenter was causing the issue though.
Update: Seems to be working well with both the query settings set to 32.
Thanks for the help @prydin !
Spoke too soon. Looks like now I am getting:
```
2018-11-28T15:40:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:40:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:40:54Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:40:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:40:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:40:59Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:04Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:09Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:14Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:19Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:24Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:29Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:34Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:39Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:44Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:49Z D! [outputs.influxdb] buffer fullness: 0 / 1000000 metrics.
2018-11-28T15:41:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:41:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:41:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-11-28T15:41:54Z W! [agent] input "inputs.vsphere" did not complete within its interval
```
What's your collect_concurrency setting? Try to increase it to, say, 5.
Also, if you don't need instance-level (per CPU etc) metrics, you can turn that off per resource type, which should save you a lot of collection time.
Another thing you can try is to reduce the number of metrics collected to only those you need.
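Taken together, those three suggestions might look like this in the plugin config. This is just an illustrative sketch: the vCenter URL and credentials are placeholders, and `vm_instances`/`host_instances` are the per-resource-type switches for instance-level collection.

```toml
[[inputs.vsphere]]
  ## Hypothetical vCenter endpoint and credentials
  vcenters = ["https://vcenter.example.com/sdk"]
  username = "telegraf@vsphere.local"
  password = "changeme"

  ## 1. Run more collection goroutines in parallel
  collect_concurrency = 5

  ## 2. Skip instance-level (per-CPU, per-disk, ...) metrics
  vm_instances = false
  host_instances = false

  ## 3. Collect only the metrics you actually need
  vm_metric_include = [
    "cpu.usage.average",
    "mem.usage.average"
  ]
```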
We're just at the tail end of a huge scale testing and performance tuning effort and should be providing an update soon that has some performance tweaks. In our lab, we're collecting metrics for 7000 VMs, including instance data, in about 6 seconds.
Not sure if I understand exactly what is happening, but it seems the initial discovery runs, then the plugin runs fine until the next discovery. When the next discovery runs, it doesn't complete, then the plugin doesn't seem to be sending any metrics, most likely due to the discovery failing.
Does that sound plausible?
If I raise the concurrency settings, I think I will have to give more CPU to my vCenter DB servers. They pegged out when I was playing with those settings in the past.
Try increasing the discovery interval to 30 minutes. The discovery logic is greatly improved in the version we're about to release. Should run 50-100 times faster!
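For reference, a sketch of the relevant setting (30 minutes expressed in seconds; the surrounding plugin block and defaults are assumed):

```toml
[[inputs.vsphere]]
  ## Run inventory discovery every 30 minutes instead of the default 300s
  object_discovery_interval = "1800s"
```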
I can post a binary if you feel like testing it out.
BTW, the concurrency settings for metric collection shouldn't have a huge impact on database servers, at least not for VM and host metrics, since they are scraped from ESXi memory.
I can try the latest repo. Let me see how difficult it is to compile.
It's not in the main repo, but in my fork. I can point you in the right direction in a little while; I'm in transit this second.
> Subject: Re: [influxdata/telegraf] [Inputs.vsphere] Error in plugin: ServerFaultCode: XML document element count exceeds configured maximum 500000 (#5041)
I had the same issue with the new version. The discovery finishes, the initial metric collection seems to finish (but I don't think it does). I think the plugin is hanging, and not ever finishing the initial collection.
I am just getting:
```
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 2 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 8 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 8 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 10 metrics
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 0 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 10 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 10 metrics
2018-11-29T15:46:36Z D! [input.vsphere] Query for cluster returned metrics for 1 objects
2018-11-29T15:46:36Z D! [input.vsphere] CollectChunk for cluster returned 4 metrics
2018-11-29T15:46:40Z D! [outputs.influxdb] wrote batch of 12 metrics in 3.165227ms
2018-11-29T15:46:40Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:46:45Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:46:50Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:46:55Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:05Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:10Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:15Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:20Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-11-29T15:47:20Z W! [agent] input "inputs.vsphere" did not complete within its interval
```
and the last bit keeps repeating. Never starts collecting metrics again.
Are you collecting datastore metrics? Try disabling that.
```toml
datastore_metric_exclude = ["*"]
```
If that solves the problem, move the datastore collection to a separate instance of [inputs.vsphere] with an interval >= 300s.
Collection of datastore metrics can take a VERY long time due to the way vCenter manages that data. If it doesn't complete within the interval, you'll see these kinds of problems.
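A sketch of that split, assuming a hypothetical vCenter URL and omitting credentials: one fast instance for everything except datastores, and one slow instance for datastores only.

```toml
## Fast instance: VMs, hosts, etc. on the default interval
[[inputs.vsphere]]
  vcenters = ["https://vcenter.example.com/sdk"]
  datastore_metric_exclude = ["*"]

## Slow instance: datastores only, on a longer interval
[[inputs.vsphere]]
  interval = "300s"
  vcenters = ["https://vcenter.example.com/sdk"]
  vm_metric_exclude = ["*"]
  host_metric_exclude = ["*"]
  cluster_metric_exclude = ["*"]
  datacenter_metric_exclude = ["*"]
```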
Also, let me point you to the very latest version that has some pretty radical performance improvements. Stand by!
Cool, that seemed to get things going on this current release. Do you know the timeframe for the new release?
The actual release timing is up to the Influx team, but I can get you a snapshot from my branch today. Use at your own risk and all that, of course...
Here's a snapshot that's been tested in our lab for a few days without any issues. You're welcome to try it (at your own risk). I attached a compiled binary for Linux. Let me know if you need any other flavors.
https://github.com/prydin/telegraf/releases/tag/PR-SCALE-IMPROVEMENT-BETA1
I still have the same issue with the binary you provided. Even with an interval of 300s on the agent, I still get no information after an hour of collecting metrics.
```
2018-12-03T13:50:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-03T13:52:00Z D! [outputs.influxdb] wrote batch of 34 metrics in 6.219342ms
2018-12-03T13:52:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T13:54:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T13:55:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-03T13:56:00Z D! [outputs.influxdb] wrote batch of 34 metrics in 5.141758ms
2018-12-03T13:56:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T13:58:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T14:00:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-03T14:00:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
```
I get HTTP 204 statuses on the InfluxDB API side.
```
Dec 03 14:42:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:14:42:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" 35891433-f701-11e8-846a-005056bc0ddf 5270
Dec 03 14:46:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:14:46:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" c4963448-f701-11e8-846b-005056bc0ddf 7489
Dec 03 14:52:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:14:52:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" 9b29d8c6-f702-11e8-8488-005056bc0ddf 5070
Dec 03 14:56:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:14:56:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" 2a36e2c8-f703-11e8-849e-005056bc0ddf 4267
Dec 03 15:02:00 XXXXXXXXX influxd[32499]: [httpd] 10.12.168.11 - - [03/Dec/2018:15:02:00 +0100] "POST /write?db=iaaspriv HTTP/1.1" 204 0 "-" "Telegraf/unknown" 00ca9e50-f704-11e8-84a6-005056bc0ddf 4848
```
I'm only trying to collect a few metrics from a vCenter that contains 7669 VMs. Here is my config:
```toml
vm_metric_include = [
  "cpu.usage.average",
  "mem.usage.average",
  "net.received.average",
  "net.transmitted.average",
  "virtualDisk.read.average",
  "virtualDisk.write.average",
  "virtualDisk.writeOIO.latest"
]
host_metric_include = [
  "cpu.usage.average",
  "disk.read.average",
  "disk.write.average",
  "disk.totalReadLatency.average",
  "disk.totalWriteLatency.average",
  "mem.usage.average",
  "net.received.average",
  "net.transmitted.average"
]
cluster_metric_exclude = ["*"]
datastore_metric_exclude = ["*"]
datacenter_metric_exclude = [ "*" ]
max_query_objects = 256
max_query_metrics = 256
collect_concurrency = 24
discover_concurrency = 24
object_discovery_interval = "600s"
timeout = "120s"
insecure_skip_verify = true
```
The "exclude" statements should read:

```toml
datastore_metric_exclude = [ "*" ]
```
Also, do you get any debug statements starting with [input.vsphere]? You should at least see some statements saying that it's attempting to collect.
```toml
datastore_metric_exclude = [ "*" ]
```

Sorry, I forgot to format my config as markdown.

And yes, I do get debug entries with [input.vsphere]. Latest log:
```
2018-12-03T13:31:41Z D! [input.vsphere] Discovering resources for datastore
```
After that, no more of these entries appear in my telegraf.log.
I'd need to see all the [input.vsphere] log lines to troubleshoot this. It looks like discovering the datastores takes a really long time. How many datastores do you have?
Also, what is the output of `telegraf -version`?
Telegraf version: Telegraf unknown (git: prydin-scale-improvement aaa67547)
For security reasons I can't give you the complete log, but the last interval didn't show any errors with the key [input.vsphere]. The output only says:

```
2018-12-03T13:31:40Z D! [input.vsphere] Skipped powered off VM: xxxxx
2018-12-03T13:31:41Z D! [input.vsphere] Found 11 metrics for foo.bar.io
2018-12-03T13:31:41Z D! [input.vsphere] Discovering resources for datastore
```

The first two entries each show up more than a thousand times, with different hostnames.
After this last line the process keeps running and sending requests to InfluxDB, but without data (HTTP 204).
@bashrc666 If possible, could you run telegraf with

```
-pprof-addr 0.0.0.0:6060
```

added to the end? Then, once the agent becomes unresponsive, you can get a complete goroutine dump using this command:

```
curl http://localhost:6060/debug/pprof/goroutine?debug=1
```
Copy and paste the output to this thread. The output doesn't contain any application data, so it should be safe to share. This will tell me exactly where the code locks up.
Context

I'm trying to collect simple VM metrics on a vCenter that manages:

- 259 hosts
- 8129 VMs

BUG: after about 25 minutes, Telegraf is still running but no data is inserted into InfluxDB.

Go dump:
8 @ 0x42e14b 0x43e12d 0x8893ce 0x88c330 0x45c551
# 0x8893cd github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x1cd /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:262
# 0x88c32f github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229
5 @ 0x42e14b 0x43e12d 0x17ec6dc 0x17ee29d 0x45c551
# 0x17ec6db github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).pushOut+0xeb /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:56
# 0x17ee29c github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1+0xcc /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80
1 @ 0x40ae87 0x4431dc 0x737382 0x45c551
# 0x4431db os/signal.signal_recv+0x9b /usr/local/Cellar/go/1.11/libexec/src/runtime/sigqueue.go:139
# 0x737381 os/signal.loop+0x21 /usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:23
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a143f 0x5b5648 0x69da8a 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x498fe8 internal/poll.(*FD).Read+0x178 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
# 0x5a143e net.(*netFD).Read+0x4e /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
# 0x5b5647	net.(*conn).Read+0x67	/usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
# 0x69da89 net/http.(*connReader).backgroundRead+0x59 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:676
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a143f 0x5b5648 0x6ba7e5 0x559cd6 0x559e2f 0x6bb332 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x498fe8 internal/poll.(*FD).Read+0x178 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
# 0x5a143e net.(*netFD).Read+0x4e /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
# 0x5b5647	net.(*conn).Read+0x67	/usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
# 0x6ba7e4 net/http.(*persistConn).Read+0x74 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497
# 0x559cd5 bufio.(*Reader).fill+0x105 /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100
# 0x559e2e bufio.(*Reader).Peek+0x3e /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132
# 0x6bb331 net/http.(*persistConn).readLoop+0x1a1 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x49a590 0x5a1d82 0x5c026e 0x5be797 0x6a819f 0x6c8c7c 0x6a6fcf 0x6a6c86 0x6a7c74 0x1a5ac1f 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x49a58f internal/poll.(*FD).Accept+0x19f /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:384
# 0x5a1d81 net.(*netFD).accept+0x41 /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:238
# 0x5c026d net.(*TCPListener).accept+0x2d /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock_posix.go:139
# 0x5be796 net.(*TCPListener).AcceptTCP+0x46 /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock.go:247
# 0x6a819e net/http.tcpKeepAliveListener.Accept+0x2e /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3232
# 0x6a6fce net/http.(*Server).Serve+0x22e /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2826
# 0x6a6c85 net/http.(*Server).ListenAndServe+0xb5 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2764
# 0x6a7c73 net/http.ListenAndServe+0x73 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3004
# 0x1a5ac1e main.main.func2+0x17e /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:274
1 @ 0x42e14b 0x42e1f3 0x405a8e 0x4057bb 0x88a13c 0x88bde4 0x45c551
# 0x88a13b github.com/influxdata/telegraf/agent.(*Agent).runOutputs+0x2ab /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:451
# 0x88bde3 github.com/influxdata/telegraf/agent.(*Agent).Run.func4+0x83 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:123
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17e6a7c 0x17edd08 0x45c551
# 0x43ed68	sync.runtime_Semacquire+0x38	/usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17e6a7b github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect+0x2ab /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:611
# 0x17edd07 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather.func1+0x87 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:269
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ec378 0x77f66d 0x88c40f 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17ec377 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather+0x167 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:282
# 0x77f66c github.com/influxdata/telegraf/internal/models.(*RunningInput).Gather+0x6c /Users/prydin/go/src/github.com/influxdata/telegraf/internal/models/running_input.go:86
# 0x88c40e github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1+0x3e /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:283
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ec908 0x17e81de 0x17ed4fe 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17ec907 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Drain+0x87 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:117
# 0x17e81dd github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource+0x99d /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:758
# 0x17ed4fd github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect.func1+0x9d /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:604
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ee4dd 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17ee4dc github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1+0xec /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:88
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x8887c1 0x1a590a0 0x1a587a8 0x1a59f4a 0x42dd57 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x8887c0 github.com/influxdata/telegraf/agent.(*Agent).Run+0x470 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:129
# 0x1a5909f main.runAgent+0x85f /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:185
# 0x1a587a7 main.reloadLoop+0x247 /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:101
# 0x1a59f49 main.main+0x4b9 /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:381
# 0x42dd56 runtime.main+0x206 /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:201
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x8891d8 0x88b9d4 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x8891d7 github.com/influxdata/telegraf/agent.(*Agent).runInputs+0x287 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:232
# 0x88b9d3 github.com/influxdata/telegraf/agent.(*Agent).Run.func1+0xa3 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:69
1 @ 0x42e14b 0x43e12d 0x17ecb5a 0x45c551
# 0x17ecb59 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery.func1+0xd9 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:230
1 @ 0x42e14b 0x43e12d 0x19fa24d 0x45c551
# 0x19fa24c github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.(*worker).start+0xdc /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:150
1 @ 0x42e14b 0x43e12d 0x1a5a8cf 0x45c551
# 0x1a5a8ce main.reloadLoop.func1+0xae /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:88
1 @ 0x42e14b 0x43e12d 0x6bc8f3 0x45c551
# 0x6bc8f2 net/http.(*persistConn).writeLoop+0x112 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1885
1 @ 0x42e14b 0x43e12d 0x8896b3 0x889326 0x88c330 0x45c551
# 0x8896b2 github.com/influxdata/telegraf/agent.(*Agent).gatherOnce+0x232 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:287
# 0x889325 github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x125 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:257
# 0x88c32f github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229
1 @ 0x42e14b 0x43e12d 0x88a3d2 0x88c8c3 0x45c551
# 0x88a3d1 github.com/influxdata/telegraf/agent.(*Agent).flush+0x1a1 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:496
# 0x88c8c2 github.com/influxdata/telegraf/agent.(*Agent).runOutputs.func1+0xa2 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:447
1 @ 0x42e14b 0x43e12d 0x88b82b 0x45c551
# 0x88b82a github.com/influxdata/telegraf/agent.(*Ticker).relayTime+0x12a /Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:46
1 @ 0x72d048 0x72ce50 0x7298b4 0x735cf0 0x7365c3 0x6a3f24 0x6a5bb7 0x6a6b5b 0x6a2f86 0x45c551
# 0x72d047 runtime/pprof.writeRuntimeProfile+0x97 /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:707
# 0x72ce4f runtime/pprof.writeGoroutine+0x9f /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:669
# 0x7298b3 runtime/pprof.(*Profile).WriteTo+0x3e3 /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:328
# 0x735cef net/http/pprof.handler.ServeHTTP+0x20f /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:245
# 0x7365c2 net/http/pprof.Index+0x722 /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:268
# 0x6a3f23 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964
# 0x6a5bb6 net/http.(*ServeMux).ServeHTTP+0x126 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361
# 0x6a6b5a net/http.serverHandler.ServeHTTP+0xaa /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741
# 0x6a2f85 net/http.(*conn).serve+0x645 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847
Latest log:

```
2018-12-04T09:56:00Z D! [outputs.influxdb] wrote batch of 136 metrics in 7.138475ms
2018-12-04T09:56:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-04T09:56:30Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-04T09:57:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-04T09:57:30Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-04T09:58:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-04T09:58:00Z D! [outputs.influxdb] wrote batch of 136 metrics in 10.431907ms
2018-12-04T09:58:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-04T09:58:30Z W! [agent] input "inputs.vsphere" did not complete within its interval
```
THANK YOU!!!! This gives me a pretty good idea what's wrong!
@bashrc666 Thanks again for the detailed information. It was extremely helpful.
Here's a pre-release of what's on PR #5113
https://github.com/prydin/telegraf/releases/tag/PR-SCALE-IMPROVEMENT-RC1
Try it if you like. As always with a pre-release, you use it at your own risk.
Hello,

I still have the same issue with the same vCenter.

Version:

Telegraf unknown (git: prydin-scale-improvement 646c5960)

Go dump:
goroutine profile: total 30
8 @ 0x42e14b 0x43e12d 0x88941e 0x88c380 0x45c551
# 0x88941d github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x1cd /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:262
# 0x88c37f github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229
2 @ 0x42e14b 0x43e12d 0x6bc943 0x45c551
# 0x6bc942 net/http.(*persistConn).writeLoop+0x112 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1885
1 @ 0x40ae87 0x4431dc 0x7373d2 0x45c551
# 0x4431db os/signal.signal_recv+0x9b /usr/local/Cellar/go/1.11/libexec/src/runtime/sigqueue.go:139
# 0x7373d1 os/signal.loop+0x21 /usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:23
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a148f 0x5b5698 0x603f29 0x60442d 0x6079b1 0x6ba835 0x559d26 0x559e7f 0x6bb382 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x498fe8 internal/poll.(*FD).Read+0x178 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
# 0x5a148e net.(*netFD).Read+0x4e /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
# 0x5b5697 net.(*conn).Read+0x67 /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
# 0x603f28 crypto/tls.(*block).readFromUntil+0x88 /usr/local/Cellar/go/1.11/libexec/src/crypto/tls/conn.go:492
# 0x60442c crypto/tls.(*Conn).readRecord+0xdc /usr/local/Cellar/go/1.11/libexec/src/crypto/tls/conn.go:593
# 0x6079b0 crypto/tls.(*Conn).Read+0xf0 /usr/local/Cellar/go/1.11/libexec/src/crypto/tls/conn.go:1145
# 0x6ba834 net/http.(*persistConn).Read+0x74 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497
# 0x559d25 bufio.(*Reader).fill+0x105 /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100
# 0x559e7e bufio.(*Reader).Peek+0x3e /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132
# 0x6bb381 net/http.(*persistConn).readLoop+0x1a1 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a148f 0x5b5698 0x69dada 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x498fe8 internal/poll.(*FD).Read+0x178 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
# 0x5a148e net.(*netFD).Read+0x4e /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
# 0x5b5697 net.(*conn).Read+0x67 /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
# 0x69dad9 net/http.(*connReader).backgroundRead+0x59 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:676
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a148f 0x5b5698 0x6ba835 0x559d26 0x559e7f 0x6bb382 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x498fe8 internal/poll.(*FD).Read+0x178 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
# 0x5a148e net.(*netFD).Read+0x4e /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
# 0x5b5697 net.(*conn).Read+0x67 /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
# 0x6ba834 net/http.(*persistConn).Read+0x74 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497
# 0x559d25 bufio.(*Reader).fill+0x105 /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100
# 0x559e7e bufio.(*Reader).Peek+0x3e /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132
# 0x6bb381 net/http.(*persistConn).readLoop+0x1a1 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x49a590 0x5a1dd2 0x5c02be 0x5be7e7 0x6a81ef 0x6c8ccc 0x6a701f 0x6a6cd6 0x6a7cc4 0x1a5a9ff 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x49a58f internal/poll.(*FD).Accept+0x19f /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:384
# 0x5a1dd1 net.(*netFD).accept+0x41 /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:238
# 0x5c02bd net.(*TCPListener).accept+0x2d /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock_posix.go:139
# 0x5be7e6 net.(*TCPListener).AcceptTCP+0x46 /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock.go:247
# 0x6a81ee net/http.tcpKeepAliveListener.Accept+0x2e /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3232
# 0x6a701e net/http.(*Server).Serve+0x22e /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2826
# 0x6a6cd5 net/http.(*Server).ListenAndServe+0xb5 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2764
# 0x6a7cc3 net/http.ListenAndServe+0x73 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3004
# 0x1a5a9fe main.main.func2+0x17e /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:274
1 @ 0x42e14b 0x42e1f3 0x404ead 0x404c85 0x17ec185 0x17e7689 0x17e7e05 0x17e8b60 0x17eda7e 0x45c551
# 0x17ec184 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*ThrottledExecutor).Run+0x54 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/throttled_exec.go:25
# 0x17e7688 github.com/influxdata/telegraf/plugins/inputs/vsphere.submitChunkJob+0x88 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:667
# 0x17e7e04 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).chunkify+0x734 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:732
# 0x17e8b5f github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource+0x7cf /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:789
# 0x17eda7d github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect.func1+0x9d /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:651
1 @ 0x42e14b 0x42e1f3 0x405a8e 0x4057bb 0x88a18c 0x88be34 0x45c551
# 0x88a18b github.com/influxdata/telegraf/agent.(*Agent).runOutputs+0x2ab /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:451
# 0x88be33 github.com/influxdata/telegraf/agent.(*Agent).Run.func4+0x83 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:123
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17e74dc 0x17ee225 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17e74db github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect+0x2ab /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:658
# 0x17ee224 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather.func1+0x84 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:268
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ecd22 0x77f6bd 0x88c45f 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17ecd21 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather+0xe1 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:280
# 0x77f6bc github.com/influxdata/telegraf/internal/models.(*RunningInput).Gather+0x6c /Users/prydin/go/src/github.com/influxdata/telegraf/internal/models/running_input.go:86
# 0x88c45e github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1+0x3e /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:283
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x888811 0x1a58e80 0x1a58588 0x1a59d2a 0x42dd57 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x888810 github.com/influxdata/telegraf/agent.(*Agent).Run+0x470 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:129
# 0x1a58e7f main.runAgent+0x85f /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:185
# 0x1a58587 main.reloadLoop+0x247 /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:101
# 0x1a59d29 main.main+0x4b9 /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:381
# 0x42dd56 runtime.main+0x206 /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:201
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x889228 0x88ba24 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x889227 github.com/influxdata/telegraf/agent.(*Agent).runInputs+0x287 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:232
# 0x88ba23 github.com/influxdata/telegraf/agent.(*Agent).Run.func1+0xa3 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:69
1 @ 0x42e14b 0x43e12d 0x17ecffa 0x45c551
# 0x17ecff9 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery.func1+0xd9 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:237
1 @ 0x42e14b 0x43e12d 0x19fa02d 0x45c551
# 0x19fa02c github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.(*worker).start+0xdc /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:150
1 @ 0x42e14b 0x43e12d 0x1a5a6af 0x45c551
# 0x1a5a6ae main.reloadLoop.func1+0xae /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:88
1 @ 0x42e14b 0x43e12d 0x6bd36a 0x6b3a01 0x69c165 0x6617db 0x6614fa 0x662b88 0x6628a5 0x172ca14 0x172d4a3 0x173f650 0x1738e18 0x17d419a 0x17e1785 0x17e93b7 0x17edc27 0x17edb23 0x17ee0ad 0x45c551
# 0x6bd369 net/http.(*persistConn).roundTrip+0x569 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:2101
# 0x6b3a00 net/http.(*Transport).roundTrip+0x9b0 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:465
# 0x69c164 net/http.(*Transport).RoundTrip+0x34 /usr/local/Cellar/go/1.11/libexec/src/net/http/roundtrip.go:17
# 0x6617da net/http.send+0x14a /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:250
# 0x6614f9 net/http.(*Client).send+0xf9 /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:174
# 0x662b87 net/http.(*Client).do+0x2a7 /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:641
# 0x6628a4 net/http.(*Client).Do+0x34 /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:509
# 0x172ca13 github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/soap.(*Client).do+0x113 /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/soap/client.go:442
# 0x172d4a2 github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/soap.(*Client).RoundTrip+0x882 /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/soap/client.go:524
# 0x173f64f github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25.(*Client).RoundTrip+0x7f /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/client.go:89
# 0x1738e17 github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/methods.QueryPerf+0xb7 /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/vim25/methods/methods.go:9899
# 0x17d4199 github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/performance.(*Manager).Query+0x1a9 /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/github.com/vmware/govmomi/performance/manager.go:276
# 0x17e1784 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Client).QueryMetrics+0x104 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/client.go:268
# 0x17e93b6 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectChunk+0x2c6 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:830
# 0x17edc26 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource.func1+0xe6 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:791
# 0x17edb22 github.com/influxdata/telegraf/plugins/inputs/vsphere.submitChunkJob.func1+0x42 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:668
# 0x17ee0ac github.com/influxdata/telegraf/plugins/inputs/vsphere.(*ThrottledExecutor).Run.func1+0x7c /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/throttled_exec.go:31
1 @ 0x42e14b 0x43e12d 0x6bf3df 0x45c551
# 0x6bf3de net/http.setRequestCancel.func3+0xce /usr/local/Cellar/go/1.11/libexec/src/net/http/client.go:321
1 @ 0x42e14b 0x43e12d 0x889703 0x889376 0x88c380 0x45c551
# 0x889702 github.com/influxdata/telegraf/agent.(*Agent).gatherOnce+0x232 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:287
# 0x889375 github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x125 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:257
# 0x88c37f github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229
1 @ 0x42e14b 0x43e12d 0x88a422 0x88c913 0x45c551
# 0x88a421 github.com/influxdata/telegraf/agent.(*Agent).flush+0x1a1 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:496
# 0x88c912 github.com/influxdata/telegraf/agent.(*Agent).runOutputs.func1+0xa2 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:447
1 @ 0x42e14b 0x43e12d 0x88b87b 0x45c551
# 0x88b87a github.com/influxdata/telegraf/agent.(*Ticker).relayTime+0x12a /Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:46
1 @ 0x72d098 0x72cea0 0x729904 0x735d40 0x736613 0x6a3f74 0x6a5c07 0x6a6bab 0x6a2fd6 0x45c551
# 0x72d097 runtime/pprof.writeRuntimeProfile+0x97 /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:707
# 0x72ce9f runtime/pprof.writeGoroutine+0x9f /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:669
# 0x729903 runtime/pprof.(*Profile).WriteTo+0x3e3 /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:328
# 0x735d3f net/http/pprof.handler.ServeHTTP+0x20f /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:245
# 0x736612 net/http/pprof.Index+0x722 /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:268
# 0x6a3f73 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964
# 0x6a5c06 net/http.(*ServeMux).ServeHTTP+0x126 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361
# 0x6a6baa net/http.serverHandler.ServeHTTP+0xaa /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741
# 0x6a2fd5 net/http.(*conn).serve+0x645 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847
STDOUT
panic: runtime error: index out of range
goroutine 2061 [running]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectChunk(0xc000206900, 0x25051a0, 0xc00003e098, 0xc0016a4000, 0x13, 0x100, 0x21d7320, 0x9, 0xc000c3a090, 0x2512c40, ...)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:831 +0x17c2
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource.func1(0x25051a0, 0xc00003e098, 0x1c50cc0, 0xc000af96e0, 0x0, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:734 +0xff
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1(0xc001376b80, 0xc000c675a0, 0x25051a0, 0xc00003e098, 0xc000155310)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80 +0x8e
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:72 +0xcd
@bashrc666 that output doesn't match the thread dump. The WorkerPool class doesn't exist anymore. Are you sure that's the right output?
As for the dump, it looks like it's stuck on a slow call to vCenter. What's your concurrency setting? Is the vCenter slow in general?
My conf
vcenters = [ 'http://foo.bar/sdk' ]
username = 'ADUSER'
password = "supersecurepassword"
vm_metric_include = [
"cpu.usage.average",
"mem.usage.average",
"net.received.average",
"net.transmitted.average",
"virtualDisk.read.average",
"virtualDisk.write.average",
"virtualDisk.writeOIO.latest"
]
host_metric_include = [
"cpu.usage.average",
"disk.read.average",
"disk.write.average",
"disk.totalReadLatency.average",
"disk.totalWriteLatency.average",
"mem.usage.average",
"net.received.average",
"net.transmitted.average"
]
cluster_metric_exclude = []
datastore_metric_exclude = []
datacenter_metric_exclude = [ "*" ]
collect_concurrency = 10
discover_concurrency = 4
object_discovery_interval = "3000s"
insecure_skip_verify = true
It only happened on this particular very large vCenter, which contains 29 clusters, 259 hosts, 8129 VMs, and a large number of datastores.
Maybe there's something I could improve in this config?
@prydin Thanks so much for the help!
@bashrc666 It's probably the datastore collection that takes a long time. Break it out into a separate declaration of [[inputs.vsphere]] and set the interval for that instance to 300s. Also, you're collecting every metric on the datastores. You can save some collection time by specifying a smaller set.
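To illustrate that suggestion, splitting the slow datastore collection into its own `[[inputs.vsphere]]` instance might look roughly like this (a sketch only; the URL, credentials, and the specific datastore metric names are placeholders, not taken from this thread):

```toml
# Fast-moving VM/host metrics, collected on the normal interval
[[inputs.vsphere]]
  interval = "60s"
  vcenters = [ "https://foo.bar/sdk" ]
  username = "user"
  password = "secret"
  datastore_metric_exclude = [ "*" ]   # datastores handled by the instance below
  cluster_metric_exclude = [ "*" ]
  datacenter_metric_exclude = [ "*" ]

# Slow-moving datastore metrics, collected on a longer interval
[[inputs.vsphere]]
  interval = "300s"
  vcenters = [ "https://foo.bar/sdk" ]
  username = "user"
  password = "secret"
  vm_metric_exclude = [ "*" ]
  host_metric_exclude = [ "*" ]
  datastore_metric_include = [ "disk.used.latest", "disk.capacity.latest" ]
```

Each instance then runs on its own schedule, so the slow datastore queries no longer block the per-minute VM/host collection.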
@prydin I've decided to get rid of the datastore metrics for the moment and come back to them once I'm sure that VM and host collection works on that vCenter. But after 10 to 20 minutes, Telegraf stops working.
CONFIG
[[inputs.vsphere]]
vcenters = [ 'https://foor.bar/sdk' ]
username = 'ADUSER'
password = "SUPERSTRONGPASSWORD"
vm_metric_include = []
host_metric_include = []
cluster_metric_exclude = ["*"]
datastore_metric_exclude = ["*"]
datacenter_metric_exclude = [ "*" ]
collect_concurrency = 10
discover_concurrency = 4
object_discovery_interval = "300s"
insecure_skip_verify = true
GO DUMP
goroutine profile: total 37
10 @ 0x42e14b 0x43e12d 0x17ec6dc 0x17ee29d 0x45c551
# 0x17ec6db github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).pushOut+0xeb /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:56
# 0x17ee29c github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1+0xcc /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80
8 @ 0x42e14b 0x43e12d 0x8893ce 0x88c330 0x45c551
# 0x8893cd github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x1cd /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:262
# 0x88c32f github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229
1 @ 0x40ae87 0x4431dc 0x737382 0x45c551
# 0x4431db os/signal.signal_recv+0x9b /usr/local/Cellar/go/1.11/libexec/src/runtime/sigqueue.go:139
# 0x737381 os/signal.loop+0x21 /usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:23
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a143f 0x5b5648 0x69da8a 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x498fe8 internal/poll.(*FD).Read+0x178 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
# 0x5a143e net.(*netFD).Read+0x4e /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
# 0x5b5647 net.(*conn).Read+0x67 /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
# 0x69da89 net/http.(*connReader).backgroundRead+0x59 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:676
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x498fe9 0x5a143f 0x5b5648 0x6ba7e5 0x559cd6 0x559e2f 0x6bb332 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x498fe8 internal/poll.(*FD).Read+0x178 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169
# 0x5a143e net.(*netFD).Read+0x4e /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202
# 0x5b5647 net.(*conn).Read+0x67 /usr/local/Cellar/go/1.11/libexec/src/net/net.go:177
# 0x6ba7e4 net/http.(*persistConn).Read+0x74 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497
# 0x559cd5 bufio.(*Reader).fill+0x105 /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100
# 0x559e2e bufio.(*Reader).Peek+0x3e /usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132
# 0x6bb331 net/http.(*persistConn).readLoop+0x1a1 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645
1 @ 0x42e14b 0x429489 0x428b36 0x49818a 0x49829d 0x49a590 0x5a1d82 0x5c026e 0x5be797 0x6a819f 0x6c8c7c 0x6a6fcf 0x6a6c86 0x6a7c74 0x1a5ac1f 0x45c551
# 0x428b35 internal/poll.runtime_pollWait+0x65 /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173
# 0x498189 internal/poll.(*pollDesc).wait+0x99 /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85
# 0x49829c internal/poll.(*pollDesc).waitRead+0x3c /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90
# 0x49a58f internal/poll.(*FD).Accept+0x19f /usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:384
# 0x5a1d81 net.(*netFD).accept+0x41 /usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:238
# 0x5c026d net.(*TCPListener).accept+0x2d /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock_posix.go:139
# 0x5be796 net.(*TCPListener).AcceptTCP+0x46 /usr/local/Cellar/go/1.11/libexec/src/net/tcpsock.go:247
# 0x6a819e net/http.tcpKeepAliveListener.Accept+0x2e /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3232
# 0x6a6fce net/http.(*Server).Serve+0x22e /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2826
# 0x6a6c85 net/http.(*Server).ListenAndServe+0xb5 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2764
# 0x6a7c73 net/http.ListenAndServe+0x73 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3004
# 0x1a5ac1e main.main.func2+0x17e /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:274
1 @ 0x42e14b 0x42e1f3 0x405a8e 0x4057bb 0x88a13c 0x88bde4 0x45c551
# 0x88a13b github.com/influxdata/telegraf/agent.(*Agent).runOutputs+0x2ab /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:451
# 0x88bde3 github.com/influxdata/telegraf/agent.(*Agent).Run.func4+0x83 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:123
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17e6a7c 0x17edd08 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17e6a7b github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect+0x2ab /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:611
# 0x17edd07 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather.func1+0x87 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:269
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ec378 0x77f66d 0x88c40f 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17ec377 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather+0x167 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:282
# 0x77f66c github.com/influxdata/telegraf/internal/models.(*RunningInput).Gather+0x6c /Users/prydin/go/src/github.com/influxdata/telegraf/internal/models/running_input.go:86
# 0x88c40e github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1+0x3e /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:283
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ec908 0x17e81de 0x17ed4fe 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17ec907 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Drain+0x87 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:117
# 0x17e81dd github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource+0x99d /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:758
# 0x17ed4fd github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect.func1+0x9d /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:604
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x17ee4dd 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x17ee4dc github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1+0xec /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:88
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x8887c1 0x1a590a0 0x1a587a8 0x1a59f4a 0x42dd57 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x8887c0 github.com/influxdata/telegraf/agent.(*Agent).Run+0x470 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:129
# 0x1a5909f main.runAgent+0x85f /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:185
# 0x1a587a7 main.reloadLoop+0x247 /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:101
# 0x1a59f49 main.main+0x4b9 /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:381
# 0x42dd56 runtime.main+0x206 /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:201
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ed69 0x474d54 0x8891d8 0x88b9d4 0x45c551
# 0x43ed68 sync.runtime_Semacquire+0x38 /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56
# 0x474d53 sync.(*WaitGroup).Wait+0x63 /usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130
# 0x8891d7 github.com/influxdata/telegraf/agent.(*Agent).runInputs+0x287 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:232
# 0x88b9d3 github.com/influxdata/telegraf/agent.(*Agent).Run.func1+0xa3 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:69
1 @ 0x42e14b 0x42e1f3 0x43f12c 0x43ee5d 0x4749e4 0x17e4960 0x17ecb93 0x45c551
# 0x43ee5c sync.runtime_SemacquireMutex+0x3c /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:71
# 0x4749e3 sync.(*RWMutex).Lock+0x73 /usr/local/Cellar/go/1.11/libexec/src/sync/rwmutex.go:98
# 0x17e495f github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).discover+0xd8f /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:452
# 0x17ecb92 github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery.func1+0x112 /Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:232
1 @ 0x42e14b 0x43e12d 0x19fa24d 0x45c551
# 0x19fa24c github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.(*worker).start+0xdc /Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:150
1 @ 0x42e14b 0x43e12d 0x1a5a8cf 0x45c551
# 0x1a5a8ce main.reloadLoop.func1+0xae /Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:88
1 @ 0x42e14b 0x43e12d 0x6bc8f3 0x45c551
# 0x6bc8f2 net/http.(*persistConn).writeLoop+0x112 /usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1885
1 @ 0x42e14b 0x43e12d 0x8896b3 0x889326 0x88c330 0x45c551
# 0x8896b2 github.com/influxdata/telegraf/agent.(*Agent).gatherOnce+0x232 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:287
# 0x889325 github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval+0x125 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:257
# 0x88c32f github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1+0xbf /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229
1 @ 0x42e14b 0x43e12d 0x88a3d2 0x88c8c3 0x45c551
# 0x88a3d1 github.com/influxdata/telegraf/agent.(*Agent).flush+0x1a1 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:496
# 0x88c8c2 github.com/influxdata/telegraf/agent.(*Agent).runOutputs.func1+0xa2 /Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:447
1 @ 0x42e14b 0x43e12d 0x88b82b 0x45c551
# 0x88b82a github.com/influxdata/telegraf/agent.(*Ticker).relayTime+0x12a /Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:46
1 @ 0x72d048 0x72ce50 0x7298b4 0x735cf0 0x7365c3 0x6a3f24 0x6a5bb7 0x6a6b5b 0x6a2f86 0x45c551
# 0x72d047 runtime/pprof.writeRuntimeProfile+0x97 /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:707
# 0x72ce4f runtime/pprof.writeGoroutine+0x9f /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:669
# 0x7298b3 runtime/pprof.(*Profile).WriteTo+0x3e3 /usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:328
# 0x735cef net/http/pprof.handler.ServeHTTP+0x20f /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:245
# 0x7365c2 net/http/pprof.Index+0x722 /usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:268
# 0x6a3f23 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964
# 0x6a5bb6 net/http.(*ServeMux).ServeHTTP+0x126 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361
# 0x6a6b5a net/http.serverHandler.ServeHTTP+0xaa /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741
# 0x6a2f85 net/http.(*conn).serve+0x645 /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847
Can you grab the full goroutine stack dump from here: http://localhost:6060/debug/pprof/goroutine?debug=2
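For reference, the full dump can be captured to a file with something like the following (this assumes Telegraf's internal pprof HTTP server is listening on port 6060, e.g. when started with the `--pprof-addr` flag):

```shell
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2' -o goroutine-dump.txt
```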
TELEGRAF VERSION
~# /usr/bin/telegraf --version
Telegraf unknown (git: prydin-scale-improvement aaa67547)
CONTEXT
I'm trying to collect simple VM metrics on a vCenter that manages:
259 hosts, 8129 VMs
Telegraf stops working 20 to 30 minutes after it starts.
2018-12-14T12:50:30Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:50:40Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:50:50Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2018-12-14T12:51:10Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:20Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:30Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:40Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:51:50Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:52:00Z D! [outputs.influxdb] buffer fullness: 0 / 100000 metrics.
2018-12-14T12:52:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
CONFIG
[global_tags]
[agent]
interval = "60s"
round_interval = true
metric_batch_size = 10000
metric_buffer_limit = 100000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = true
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
urls = ["http://10.x.x.x:8086"]
database = "vcenter"
[[inputs.vsphere]]
vcenters = [ 'https://foo.bar/sdk' ]
username = 'ADUSER'
password = "SUPERSTRONGPASSWORD"
vm_metric_include = []
host_metric_include = []
cluster_metric_exclude = ["*"]
datastore_metric_exclude = ["*"]
datacenter_metric_exclude = [ "*" ]
collect_concurrency = 2
discover_concurrency = 2
object_discovery_interval = "600s"
insecure_skip_verify = true
GO DUMP LEVEL 2
goroutine 12632 [running]:
runtime/pprof.writeGoroutineStacks(0x24e9000, 0xc01931c0e0, 0x40be5f, 0xc022c4e240)
/usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:678 +0xa7
runtime/pprof.writeGoroutine(0x24e9000, 0xc01931c0e0, 0x2, 0xc0004e4700, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:667 +0x44
runtime/pprof.(*Profile).WriteTo(0x3ca45e0, 0x24e9000, 0xc01931c0e0, 0x2, 0xc01931c0e0, 0x21dec75)
/usr/local/Cellar/go/1.11/libexec/src/runtime/pprof/pprof.go:328 +0x3e4
net/http/pprof.handler.ServeHTTP(0xc0102a4011, 0x9, 0x2502020, 0xc01931c0e0, 0xc000128100)
/usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:245 +0x210
net/http/pprof.Index(0x2502020, 0xc01931c0e0, 0xc000128100)
/usr/local/Cellar/go/1.11/libexec/src/net/http/pprof/pprof.go:268 +0x723
net/http.HandlerFunc.ServeHTTP(0x22cf9c0, 0x2502020, 0xc01931c0e0, 0xc000128100)
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0x3cd89a0, 0x2502020, 0xc01931c0e0, 0xc000128100)
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361 +0x127
net/http.serverHandler.ServeHTTP(0xc0000a6c30, 0x2502020, 0xc01931c0e0, 0xc000128100)
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc00787c500, 0x2505160, 0xc02047a000)
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2851 +0x2f5
goroutine 1 [semacquire, 101 minutes]:
sync.runtime_Semacquire(0xc00053aea8)
/usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00053aea0)
/usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/agent.(*Agent).Run(0xc0002025f0, 0x2505160, 0xc000042d00, 0x1, 0x1)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:129 +0x471
main.runAgent(0x2505160, 0xc000042d00, 0x3cfde20, 0x0, 0x0, 0x3cfde20, 0x0, 0x0, 0x0, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:185 +0x860
main.reloadLoop(0xc0002e0120, 0x3cfde20, 0x0, 0x0, 0x3cfde20, 0x0, 0x0, 0xc0007add58, 0x0, 0x0, ...)
/Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:101 +0x248
main.main()
/Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:381 +0x4ba
goroutine 17 [syscall, 101 minutes]:
os/signal.signal_recv(0x0)
/usr/local/Cellar/go/1.11/libexec/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
/usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
/usr/local/Cellar/go/1.11/libexec/src/os/signal/signal_unix.go:29 +0x41
goroutine 13 [select]:
github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.(*worker).start(0xc000133b80)
/Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:150 +0xdd
created by github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view.init.0
/Users/prydin/go/src/github.com/influxdata/telegraf/vendor/go.opencensus.io/stats/view/worker.go:29 +0x57
goroutine 14 [IO wait]:
internal/poll.runtime_pollWait(0x7f074bd93f00, 0x72, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00020c018, 0x72, 0xc0002a4200, 0x0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00020c018, 0xffffffffffffff00, 0x0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc00020c000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc00020c000, 0x50, 0x1fa58e0, 0xc0004bfd01)
/usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc00013e018, 0xc0004bfd88, 0xc009ec73b0, 0xe25aac92949344a6)
/usr/local/Cellar/go/1.11/libexec/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).AcceptTCP(0xc00013e018, 0xc0004bfdb0, 0x48f726, 0x5c13a732)
/usr/local/Cellar/go/1.11/libexec/src/net/tcpsock.go:247 +0x47
net/http.tcpKeepAliveListener.Accept(0xc00013e018, 0xc0004bfe00, 0x18, 0xc0001ee600, 0x6a7095)
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3232 +0x2f
net/http.(*Server).Serve(0xc0000a6c30, 0x2503060, 0xc00013e018, 0x0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2826 +0x22f
net/http.(*Server).ListenAndServe(0xc0000a6c30, 0xc0000a6c30, 0x41)
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2764 +0xb6
net/http.ListenAndServe(0x7ffcb30abf4e, 0xc, 0x0, 0x0, 0x1, 0x21dc8c0)
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:3004 +0x74
main.main.func2()
/Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:274 +0x17f
created by main.main
/Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:264 +0xa1b
goroutine 37 [select, 101 minutes]:
main.reloadLoop.func1(0xc0002e02a0, 0xc0003042a0, 0xc00007b650, 0xc0002e0120)
/Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:88 +0xaf
created by main.reloadLoop
/Users/prydin/go/src/github.com/influxdata/telegraf/cmd/telegraf/telegraf.go:87 +0x1e2
goroutine 82 [IO wait, 82 minutes]:
internal/poll.runtime_pollWait(0x7f074bd93e30, 0x72, 0xc000414a88)
/usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00020c398, 0x72, 0xffffffffffffff00, 0x24eb300, 0x3bc37f0)
/usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00020c398, 0xc00042b000, 0x1000, 0x1000)
/usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc00020c380, 0xc00042b000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc00020c380, 0xc00042b000, 0x1000, 0x1000, 0x1, 0x0, 0xc0002b2ce0)
/usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc00013e058, 0xc00042b000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/net/net.go:177 +0x68
net/http.(*persistConn).Read(0xc0000ba6c0, 0xc00042b000, 0x1000, 0x1000, 0xc0000ba480, 0xc0000ba6c0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1497 +0x75
bufio.(*Reader).fill(0xc000134420)
/usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:100 +0x106
bufio.(*Reader).Peek(0xc000134420, 0x1, 0x2, 0x0, 0x0, 0xc0002e1ec0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/bufio/bufio.go:132 +0x3f
net/http.(*persistConn).readLoop(0xc0000ba6c0)
/usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1645 +0x1a2
created by net/http.(*Transport).dialConn
/usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1338 +0x941
goroutine 83 [select, 82 minutes]:
net/http.(*persistConn).writeLoop(0xc0000ba6c0)
/usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1885 +0x113
created by net/http.(*Transport).dialConn
/usr/local/Cellar/go/1.11/libexec/src/net/http/transport.go:1339 +0x966
goroutine 20 [semacquire, 101 minutes]:
sync.runtime_Semacquire(0xc00053b938)
/usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00053b930)
/usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/agent.(*Agent).runInputs(0xc0002025f0, 0x2505160, 0xc000042d00, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xc00003a5a0, 0x0, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:232 +0x288
github.com/influxdata/telegraf/agent.(*Agent).Run.func1(0xc00053aea0, 0xc0002025f0, 0x2505160, 0xc000042d00, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xc00003a5a0)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:69 +0xa4
created by github.com/influxdata/telegraf/agent.(*Agent).Run
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:66 +0x3bb
goroutine 21 [chan receive, 82 minutes]:
github.com/influxdata/telegraf/agent.(*Agent).runOutputs(0xc0002025f0, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xc00003a5a0, 0x4500000000, 0x201)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:451 +0x2ac
github.com/influxdata/telegraf/agent.(*Agent).Run.func4(0xc00053aea0, 0xc0002025f0, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xc00003a5a0)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:123 +0x84
created by github.com/influxdata/telegraf/agent.(*Agent).Run
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:120 +0x460
goroutine 22 [select]:
github.com/influxdata/telegraf/agent.(*Agent).flush(0xc0002025f0, 0x2505160, 0xc0002a4900, 0xc000483290, 0x2540be400, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:496 +0x1a2
github.com/influxdata/telegraf/agent.(*Agent).runOutputs.func1(0xc00053af30, 0xc0002025f0, 0x2505160, 0xc0002a4900, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0x2540be400, 0x0, 0xc000483290)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:447 +0xa3
created by github.com/influxdata/telegraf/agent.(*Agent).runOutputs
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:436 +0x1b9
goroutine 27 [select]:
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce(0xc0002025f0, 0x2512c40, 0xc0002ebd80, 0xc000043240, 0xdf8475800, 0x0, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:287 +0x233
github.com/influxdata/telegraf/agent.(*Agent).gatherOnInterval(0xc0002025f0, 0x2505160, 0xc000042d00, 0x2512c40, 0xc0002ebd80, 0xc000043240, 0xdf8475800, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:257 +0x126
github.com/influxdata/telegraf/agent.(*Agent).runInputs.func1(0xc00053b930, 0xc0002025f0, 0x2505160, 0xc000042d00, 0xbefd01b4aa3b9c9f, 0x22803fa, 0x3cd91c0, 0xdf8475800, 0x2512c40, 0xc0002ebd80, ...)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:229 +0xc0
created by github.com/influxdata/telegraf/agent.(*Agent).runInputs
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:218 +0x171
goroutine 39 [select]:
github.com/influxdata/telegraf/agent.(*Ticker).relayTime(0xc000bec000, 0x2505160, 0xc000be8000)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:46 +0x12b
created by github.com/influxdata/telegraf/agent.NewTicker
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/tick.go:33 +0x135
goroutine 6098 [select, 82 minutes]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).pushOut(0xc02253d640, 0x25051a0, 0xc00003c048, 0x0, 0x0, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:56 +0xec
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1(0xc002a51520, 0xc02253d640, 0x25051a0, 0xc00003c048, 0xc001312f00)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80 +0xcd
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:72 +0xcd
goroutine 6081 [semacquire, 83 minutes]:
sync.runtime_Semacquire(0xc02253d648)
/usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc02253d640)
/usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Drain(0xc02253d640, 0x25051a0, 0xc00003c048, 0xc02253dcc0, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:117 +0x88
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource(0xc000146380, 0x25051a0, 0xc00003c048, 0x21ccf4e, 0x2, 0x2512c40, 0xc0002ebd80, 0x36cc0d000, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:758 +0x99e
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect.func1(0xc01b1f5170, 0xc000146380, 0x25051a0, 0xc00003c048, 0x2512c40, 0xc0002ebd80, 0x21ccf4e, 0x2)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:604 +0x9e
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:602 +0x299
goroutine 6079 [select, 82 minutes]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).push(0xc02253d640, 0x25051a0, 0xc00003c048, 0x1c50cc0, 0xc00ef1ca20, 0xc00ef1ca20)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:47 +0xec
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).push-fm(0x25051a0, 0xc00003c048, 0x1c50cc0, 0xc00ef1ca20, 0x7)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:100 +0x52
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).chunker(0xc000146380, 0x25051a0, 0xc00003c048, 0xc0159054d0, 0xc00f5a79e0, 0x81d260, 0xed3a58ac0, 0x0, 0x0, 0xed3a58a84, ...)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:677 +0x708
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).collectResource.func2(0x25051a0, 0xc00003c048, 0xc0159054d0)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:752 +0x8b
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Fill.func1(0xc02253d640, 0xc001312f50, 0x25051a0, 0xc00003c048)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:100 +0xa0
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Fill
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:98 +0x76
goroutine 1177 [semacquire, 81 minutes]:
sync.runtime_SemacquireMutex(0xc0001463d8, 0xc00ca49100)
/usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).Lock(0xc0001463d0)
/usr/local/Cellar/go/1.11/libexec/src/sync/rwmutex.go:98 +0x74
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).discover(0xc000146380, 0x2505160, 0xc0002a4740, 0x0, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:452 +0xd90
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery.func1(0xc000146380, 0x2505160, 0xc0002a4740)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:232 +0x113
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).startDiscovery
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:228 +0x81
goroutine 12633 [IO wait]:
internal/poll.runtime_pollWait(0x7f074bd93bc0, 0x72, 0xc000c2de58)
/usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc0020de098, 0x72, 0xffffffffffffff00, 0x24eb300, 0x3bc37f0)
/usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc0020de098, 0xc022c4e000, 0x1, 0x1)
/usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc0020de080, 0xc022c4e0d1, 0x1, 0x1, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc0020de080, 0xc022c4e0d1, 0x1, 0x1, 0x0, 0x24ad6, 0x259a9)
/usr/local/Cellar/go/1.11/libexec/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc000202aa0, 0xc022c4e0d1, 0x1, 0x1, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11/libexec/src/net/net.go:177 +0x68
net/http.(*connReader).backgroundRead(0xc022c4e0c0)
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:676 +0x5a
created by net/http.(*connReader).startBackgroundRead
/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:672 +0xd2
goroutine 6076 [semacquire, 83 minutes]:
sync.runtime_Semacquire(0xc01b1f5178)
/usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc01b1f5170)
/usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect(0xc000146380, 0x25051a0, 0xc00003c048, 0x2512c40, 0xc0002ebd80, 0x0, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:611 +0x2ac
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather.func1(0xc00dc1bf90, 0x2512c40, 0xc0002ebd80, 0xc009476cc0, 0xc000146380)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:269 +0x88
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:267 +0x13e
goroutine 6078 [semacquire, 83 minutes]:
sync.runtime_Semacquire(0xc002a51528)
/usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc002a51520)
/usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1(0xc02253d640, 0x2, 0x25051a0, 0xc00003c048, 0xc001312f00)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:88 +0xed
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:67 +0x84
goroutine 6075 [semacquire, 83 minutes]:
sync.runtime_Semacquire(0xc00dc1bf98)
/usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00dc1bf90)
/usr/local/Cellar/go/1.11/libexec/src/sync/waitgroup.go:130 +0x64
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather(0xc0002aefc0, 0x2512c40, 0xc0002ebd80, 0x2710, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:282 +0x168
github.com/influxdata/telegraf/internal/models.(*RunningInput).Gather(0xc000043240, 0x2512c40, 0xc0002ebd80, 0xc001637fc0, 0x88ca67)
/Users/prydin/go/src/github.com/influxdata/telegraf/internal/models/running_input.go:86 +0x6d
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1(0xc0001786c0, 0xc000043240, 0x2512c40, 0xc0002ebd80)
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:283 +0x3f
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce
/Users/prydin/go/src/github.com/influxdata/telegraf/agent/agent.go:282 +0xdc
goroutine 6097 [select, 82 minutes]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).pushOut(0xc02253d640, 0x25051a0, 0xc00003c048, 0x0, 0x0, 0x0)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:56 +0xec
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1.1(0xc002a51520, 0xc02253d640, 0x25051a0, 0xc00003c048, 0xc001312f00)
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:80 +0xcd
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*WorkerPool).Run.func1
/Users/prydin/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/workerpool.go:72 +0xcd
NOTE
I just noticed that when I run Telegraf as a systemd unit, it fails like this, but when I run it as a regular Linux job with the same parameters as the systemd unit, it works properly for more than 2 hours. I really don't get it. Right now I'm setting up a proper InfluxDB Enterprise cluster to check whether this collection failure is caused by running a standalone InfluxDB.
Update
My bad, the plugin is working fine in release Telegraf unknown (git: prydin-scale-improvement 646c5960). I just forgot to tell Grafana to connect the metric points over an interval longer than 1 min. I apologize for my huge mistake.
I have increased my interval to 120s and it's working like a charm with all my vCenters.
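For anyone hitting the same "did not complete within its interval" warnings, here is a minimal sketch of the relevant settings (the vCenter URL and credentials are placeholders; 120s is the value that worked in my environment, adjust to yours):

```toml
[agent]
  ## Collection interval for all inputs. Raising this gives slow inputs
  ## like inputs.vsphere enough time to finish each collection cycle.
  interval = "120s"

[[inputs.vsphere]]
  ## Placeholder values -- substitute your own vCenter and credentials.
  vcenters = ["https://vcenter.example.com/sdk"]
  username = "telegraf@vsphere.local"
  password = "secret"
  ## The interval can also be overridden per input instead of globally:
  # interval = "120s"
```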
I believe this is working, and now available, in 1.10.0
I am receiving this when the plugin queries metrics from the vCenter servers.
Any idea what is wrong / how to fix it?
2018-11-26T22:24:47Z E! [inputs.vsphere]: Error in plugin: ServerFaultCode: XML document element count exceeds configured maximum 500000
while parsing serialized DataObject of type vim.PerformanceManager.MetricId at line 2, column 19637665
while parsing property "metricId" of static type ArrayOfPerfMetricId
while parsing serialized DataObject of type vim.PerformanceManager.QuerySpec at line 2, column 19598059
while parsing call information for method QueryPerf at line 2, column 66
while parsing SOAP body at line 2, column 60
while parsing SOAP envelope at line 2, column 0
while parsing HTTP request for method queryStats on object of type vim.PerformanceManager at line 1, column 0
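The error means a single QueryPerf call packed more metric IDs than vCenter's configured 500,000-element XML parsing limit. A sketch of the plugin settings that cap query size, as suggested earlier in this thread (the values are illustrative; the connection details are placeholders):

```toml
[[inputs.vsphere]]
  ## Placeholder values -- substitute your own vCenter and credentials.
  vcenters = ["https://vcenter.example.com/sdk"]
  username = "telegraf@vsphere.local"
  password = "secret"
  ## Cap the number of objects and metrics bundled into one QueryPerf
  ## request, keeping each SOAP payload well under vCenter's XML
  ## element limit.
  max_query_objects = 64
  max_query_metrics = 64
```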