poxet / Influx-Capacitor

Influx-Capacitor collects metrics from Windows machines using Performance Counters. Data is sent to InfluxDB to be viewable in Grafana.
http://influx-capacitor.com
MIT License

Memory leak in Tharga.Influx-Capacitor.Service Processor #26

Closed: nathanwebb closed this issue 8 years ago

nathanwebb commented 8 years ago

Hi,

Firstly, this is a great tool. It didn't take very long to hook up, and it is pretty much exactly what I was looking for. I installed it a couple of days ago, with assistance with the database.xml file, and started to see stats immediately. Which is great, because I can see a memory leak in the Tharga.Influx-Capacitor.Service. Here are a couple of charts to show what I mean:

Firstly, the working set: 1.4 GB in 24 hours.

[Chart: working set]

As a result, page faults have gone through the roof in the last few hours:

[Chart: page faults]

I'm going to restart the service, but if you need any logs, or whatever else, please let me know.

Cheers, Nathan

poxet commented 8 years ago

Cool! Monitor the tool with itself. :)

I will start to see if I can find something obvious. I have a couple of ideas.

poxet commented 8 years ago

When I first looked at this issue, I could not find any problem. Are you sure that it is not the queue adding up data? This happens if InfluxDB cannot be reached. (I have added a configurable MaxQueueSize now as well.)

The minor version 1.0.8.55 also has metadata counters, so that the internal queue of metrics to be sent to the server can be monitored.
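To illustrate the idea of a queue size cap: the sketch below is not the project's code (Influx-Capacitor is a .NET service and its implementation is not shown in this thread). It assumes a simple "drop the oldest point when full" policy, and the names `BoundedMetricQueue`, `enqueue`, `drain` and `dropped` are hypothetical. How the real MaxQueueSize setting behaves when the limit is hit may well differ.

```python
from collections import deque


class BoundedMetricQueue:
    """Hypothetical in-memory queue that never holds more than max_queue_size points."""

    def __init__(self, max_queue_size):
        # A deque with maxlen silently discards the oldest item when a new one
        # is appended to a full queue (assumed policy, not the project's).
        self._points = deque(maxlen=max_queue_size)
        self.dropped = 0  # how many points were discarded because the queue was full

    def enqueue(self, point):
        if len(self._points) == self._points.maxlen:
            self.dropped += 1
        self._points.append(point)

    def drain(self):
        """Return everything queued so far and empty the queue."""
        batch = list(self._points)
        self._points.clear()
        return batch

    def __len__(self):
        return len(self._points)


if __name__ == "__main__":
    queue = BoundedMetricQueue(max_queue_size=3)
    for value in range(5):  # simulate points piling up while the server is unreachable
        queue.enqueue({"value": value})
    print(len(queue), queue.dropped)
```

With a cap like this, memory use stays bounded during an outage, at the cost of losing the oldest measurements.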

I just set up monitoring of memory usage on two different servers to see what is going on there. I will get back when I know more.

nathanwebb commented 8 years ago

Yes, I'd say you're right that it is the queue data, but not because it can't send - the charts in the first comment are from the data that has been sent to Influx. I've got a ton of data in InfluxDB now, so it has been sent but perhaps not dereferenced.

I'm not a .NET developer, so I don't have much of a clue about the code, but I've installed ANTS Memory Profiler and traced the leak to the InfluxDB.Net.Models.Point object. The Instance Retention graph is shown below. The class at the bottom (System.Collections.Generic.Dictionary) is easily the biggest memory consumer over time (compared between multiple snapshots). The graph indicates to me that it is in the SendBusiness class. As that class uses a timer, it looks like .NET is holding onto the timer and the class is never getting disposed of. I'll see if I can find any more...
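The general pattern being described (points that have been sent but are still referenced, held by an object that a timer keeps alive) can be sketched roughly as follows. This is only an illustration in Python, not the project's SendBusiness class, and it is not a diagnosis of the actual bug; `PeriodicSender` and all of its methods are hypothetical names. The two points it tries to show are that the sender swaps out its pending list before sending, so sent points are no longer referenced, and that the timer is explicitly cancelled on shutdown.

```python
import threading


class PeriodicSender:
    """Hypothetical timer-driven sender that releases points once they are sent."""

    def __init__(self, send, interval_seconds):
        self._send = send              # callable taking a list of points
        self._interval = interval_seconds
        self._pending = []
        self._lock = threading.Lock()
        self._timer = None
        self._stopped = False

    def add(self, point):
        with self._lock:
            self._pending.append(point)

    def flush(self):
        # Swap in a fresh list so this object no longer references the batch;
        # once send() returns, the sent points can be garbage collected.
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._send(batch)
        return len(batch)

    def pending_count(self):
        with self._lock:
            return len(self._pending)

    def start(self):
        with self._lock:
            if self._stopped:
                return
            self._timer = threading.Timer(self._interval, self._tick)
            self._timer.daemon = True
            self._timer.start()

    def _tick(self):
        self.flush()
        self.start()  # schedule the next run unless stop() was called

    def stop(self):
        # Cancel the timer so it stops holding a reference to this sender.
        with self._lock:
            self._stopped = True
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Error handling is left out here: if `send` raises, the batch is simply lost, which a real sender would probably want to handle by re-queueing.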

[Image: ANTS Instance Retention graph]

poxet commented 8 years ago

Great input. Will have a look at this shortly.

nathanwebb commented 8 years ago

The only other input I have at this stage is this thread on Stack Overflow:

http://stackoverflow.com/questions/475763/is-it-necessary-to-dispose-system-timers-timer-if-you-use-one-in-your-applicatio

Again, I'm not a .NET developer, so excuse my ignorance if this is unrelated.

poxet commented 8 years ago

I found something that I think could solve the problem. It is out in version 1.0.15. If I am wrong I will re-open this case.

nathanwebb commented 8 years ago

Fantastic. Yes, it looks like you have resolved this. I've had it running for an hour, and memory usage has been stable.