Open hanynowsky opened 9 years ago
Memory leaks are expected as I did use Snimpy only in one-time scripts. I will look at that and fix them. Thanks for reporting!
I see that you noticed another bug, don't hesitate to open an issue for it.
Just before sys.exit()
, could you add:
import snimpy.mib
snimpy.mib._smi.smiExit()
This should remove any potential leaks from libsmi (but won't solve your problem).
What are your Celery tasks lifetime? Is it like enclosing your code in a for loop? Or is it doing something like reloading modules?
Thank you fore your reactiveness.
Well, as you said, the _smi.smiExit()
does not help. The nature of Python garbage collection makes this leak happen I guess And celery being an asynchronous task processor, does not help neither :)
The celery Task is crontabbed each minute or each five minutes. Here is the source:
from __future__ import absolute_import
from celery import shared_task
from celery.task import task
from django.core.management.base import BaseCommand, CommandError
from celery.decorators import periodic_task
from celery.schedules import crontab
import platform
import socket
import time
import sys
import os
import requests
import json
from django.conf import settings
from snimpy.manager import Manager as M
from snimpy.manager import load
from snimpy.snmp import SNMPNoSuchInstance
from snimpy.snmp import SNMPException
from snimpy.mib import SMIException
from socket import gaierror
from pyasn1.type.error import ValueConstraintError
import os
import traceback
from masac.feed.models import CiscoEquipment
from masac.feed.models import Routers
from masac.feed.models import Switches
import pickle
from copy import deepcopy
from copy import copy
import sqlite3 as lite
import inspect
from os.path import expanduser
LOG_DEBUG = settings.LOG_DEBUG
CARBON_SERVER = settings.CARBON_SERVER # Server where Cyanite is running
CARBON_PORT = settings.CARBON_PORT
SDSL_TIMEOUT = settings.SDSL_TIMEOUT
SWITCH_TIMEOUT = settings.SWITCH_TIMEOUT
ROUTER_TIMEOUT = settings.ROUTER_TIMEOUT
@periodic_task(run_every=(crontab(hour="*", minute="*/3", day_of_week="*")), queue='masac.switch', options={'queue': 'masac.switch', 'routing_key' : 'masac.switch'})
def send_switch_metrics():
try:
node = platform.node().replace('.', '-')
timestamp = int(time.time())
switch_list = Switches.objects.filter(marque='switch', banned=0)
DELAY = 1
#sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock = socket.socket()
sock.connect((CARBON_SERVER, CARBON_PORT))
message = ''
mib_path = os.path.join(settings.BASE_DIR, 'scripts', 'mib')
load(os.path.join(mib_path, 'SNMPv2-SMI.txt'))
load(os.path.join(mib_path, 'SNMPv2-TC.txt'))
load(os.path.join(mib_path, 'SNMPv2-CONF.txt'))
load(os.path.join(mib_path, 'SNMPv2-MIB.txt'))
load(os.path.join(mib_path, 'IANAifType-MIB.txt'))
load(os.path.join(mib_path, 'IF-MIB.txt'))
list_count = len(switch_list)
if list_count < 1:
print("Warning: List of switches is empty. Re-feed....")
# get_switches()
for device in switch_list:
hostname = str(device.login)
ip = str(device.ip)
comm = str(device.snmp_community)
if LOG_DEBUG == True:
print("Testing Switch: " + hostname + " " + ip + " " + comm)
try:
m = M(host=hostname, retries=0, community=comm, version=2, timeout=SWITCH_TIMEOUT, cache=False)
except gaierror as ge:
# print(traceback.format_exc())
if LOG_DEBUG == True:
print("Snimpy Manager Exception: gaierror")
metrics = {}
try:
for index in m.ifDescr:
metrics[m.ifDescr[index]] = {}
if "thernet" in str(m.ifDescr[index]) and "unrouted" not in str(m.ifDescr[index]) and "Vlan" not in str(m.ifDescr[index]) and "Null" not in str(m.ifDescr[index]):
metrics[m.ifDescr[index]]['ifHCOutOctets'] = m.ifHCOutOctets[index]
metrics[m.ifDescr[index]]['ifHCInOctets'] = m.ifHCInOctets[index]
# print(index, repr(m.ifDescr[index]), repr(m.ifHCOutOctets[index]), repr(m.ifHCInOctets[index]))
except SNMPNoSuchInstance as nie:
# print(traceback.format_exc())
print("SNMPNoSuchInstance for " + hostname + " " + ip + " " + comm)
except SNMPException as snmpe:
Switches.objects.filter(login=hostname, marque='switch').update(banned=1)
print("SNMPException for " + hostname + " " + ip + " " + comm + " | List Count: " + str(list_count))
# print(traceback.format_exc())
pass
except ValueConstraintError as e:
print("ValueConstraintError for " + hostname + " " + ip + " " + comm)
# print(traceback.format_exc())
lines= []
for if_descr in list(metrics.keys()):
for key in list(metrics[if_descr].keys()):
line = 'masac.switch.%s.%s.%s %d %d' % (hostname.replace(".","_"), if_descr.replace("/","_").replace(" ","_"), key, metrics[if_descr][key], timestamp)
lines.append(line)
message = '\n'.join(lines) + '\n'
if LOG_DEBUG == True:
print("MESSAGE for " + hostname + " : " + message)
sock.sendall(message.encode('UTF-8'))
sock.close()
time.sleep(DELAY)
except Exception as e:
print(traceback.format_exc())
Hello @vincentbernat Any news about this issue?
I think the valgrind output is misleading and the memory leak doesn't happen in the CFFI module. I have fixed a small memory leak in the CFFI module but I think this is not related to what you are saying. I still need to find some time to check the issue. I suppose that the issue could be tested by running the first snippet you provided into a loop and use some tool to find the memory leak.
I have pushed a new commit that should fix the problem. For the record, here is the code I have used to "trace" memory leaks.
#!/usr/bin/python3
from snimpy.manager import Manager as M
from snimpy.manager import load
import tracemalloc
import gc
def sdsl_test(hostname, community):
load('IF-MIB')
load('SNMPv2-MIB')
m = M(host=hostname, retries=0, community=community, version=2,
timeout=1, cache=0)
uptime = m.sysUpTime
print(uptime)
for ix in m.ifDescr:
desc = m.ifDescr[ix]
metric = m.ifHCInOctets[ix]
print(desc + " " + str(metric))
tracemalloc.start()
sdsl_test('127.0.0.1', 'public')
gc.collect()
snapshot1 = tracemalloc.take_snapshot()
for i in range(1000):
sdsl_test('127.0.0.1', 'public')
gc.collect()
snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("[ Top 10 differences ]")
for stat in top_stats[:10]:
print(stat)
Ok great. I am going to test again. I guess it's 0.8.6 now? Did it land under pip ?
I didn't do a release, in case there is something else to fix. You can also just apply the patch manually. It is pretty simple: https://github.com/vincentbernat/snimpy/commit/cce05015ac2511b14592a276639445230f0c91ae
Sounds like the magic patch reduced the memory leak from 100% to 1%. That's great. Thanks for your responsiveness and great work.
Due to #35, I have reverted the fix. I have asked for help in the PySNMP mailing list and hopefully will come with a proper fix. Otherwise, I'll try to hack around.
I forgot about this bug. The current situation with 0.8.8 is that only one command generator per thread will be used for SNMPv2. This is not the case with SNMPv3 as the command generator cannot be reused between sessions in this case. So, you shouldn't get a memory leak with 0.8.8 either in your case.
That is excellent news. I am gonna put that to test in my next iteration. Thanks for the great job! Snimpy by the way is still in production in our company and it's rock solid!
Using Snimpy in some Celery Tasks, the server RAM consumption grows non-stop to the point kernel starts killing Celery workers process (oom). After running Valgrind against a simple Snimpy Script, Memory leak is detected:
If needed, I can provide code of the Celery Task using Snimpy. basically the task loop over a list of network routers, fetches metrics and send them to a server.