voiceip / oreka

OpenSource G711, G722, G729, Opus & Other Format VoIP SIP Recorder
GNU General Public License v3.0

docker version orkaudio maybe memory leak #106

Open wangduanduan opened 2 years ago

wangduanduan commented 2 years ago

docker voiceip/orkaudio:master

At only 10 cps, with each call lasting 10 seconds, memory usage keeps growing and never decreases.

wangduanduan commented 2 years ago

At only 10 cps, memory usage keeps growing and never decreases.

image

wangduanduan commented 2 years ago

image

kingster commented 2 years ago

Hi @wangduanduan

Your second graph shows that usage flattens around 2GB. Have you kept it running for a longer duration and observed the usage?

wangduanduan commented 2 years ago

@kingster thanks for the reply.

The second graph is from a 50 cps stress test. It flattens around 2GB because I stopped the stress test.

The first graph is from a 10 cps stress test.

wangduanduan commented 2 years ago

This is my config.xml:

<config>
        <!-- This is an example configuration file for the Oreka orkaudio capture service on Linux -->
        <!-- Copy this to config.xml and modify according to taste -->

        <AudioOutputPath>/var/log/orkaudio/audio</AudioOutputPath>
        <!-- <TapeFileNaming>[trackingid],[localparty],[remoteparty],[nativecallid]</TapeFileNaming> -->

        <!-- Uncomment the plugin you want to use: -->
        <!-- Use libvoip.so for SIP, Cisco Skinny and pure RTP -->
        <!-- Use libh323voip.so for Avaya, Nortel Unistim, H.323 and MGCP -->
        <!-- See in <VoIpPlugin> below for more precise protocol tuning -->
        <CapturePlugin>libvoip.so</CapturePlugin>
        <!--<CapturePlugin>libh323voip.so</CapturePlugin>-->
        <!--<CapturePlugin>liborksipua.so</CapturePlugin>-->

        <CapturePluginPath>/usr/lib</CapturePluginPath>
        <!--<PluginsDirectory>/oreka-src/orkaudio/plugins</PluginsDirectory>-->

        <!-- Audio file storage format: choose from: native, gsm, ulaw, alaw, pcmwav -->
        <StorageAudioFormat>pcmwav</StorageAudioFormat>
        <StereoRecording>true</StereoRecording>
        <TapeNumChannels>2</TapeNumChannels>
        <AudioFileBitRate>8000</AudioFileBitRate>

        <!-- If you want to keep native audio files as well as compressed, change this to "no" -->
        <DeleteNativeFile>yes</DeleteNativeFile>

        <TrackerHostname>192.168.40.186</TrackerHostname>
        <TrackerTcpPort>8080</TrackerTcpPort>

        <CapturePortFilters>LiveMonitoring</CapturePortFilters>
        <TapeProcessors>BatchProcessing, Reporting</TapeProcessors>

        <BatchProcessingEnhancePriority>true</BatchProcessingEnhancePriority>
        <NumBatchThreads>4</NumBatchThreads>

        <AudioFileOwner>tomcat</AudioFileOwner>
        <AudioFileGroup>tomcat</AudioFileGroup>
        <AudioFilePermissions>644</AudioFilePermissions>
        <!--<TapeDurationMinimumSec>3</TapeDurationMinimumSec>-->

        <!-- Uncomment the parameter below and fill in a comma-separated -->
        <!-- list of TCP addresses which you wish to open a connection to. -->
        <!-- For example 192.168.1.250:1721, 192.168.1.1:8091. A TCP -->
        <!-- connection shall be opened and a read-loop shall be entered -->
        <!-- into whereby any data read shall be discarded, and a record -->
        <!-- maintained of the amount of data which has been read. -->
        <!-- <SocketStreamerTargets></SocketStreamerTargets> -->

        <VoIpPlugin>
                <PcapSocketBufferSize>8388608</PcapSocketBufferSize>

                <!--queuemetrics integration, uncomment the following line-->
                <SipExtractFields>W_Call_ID</SipExtractFields>

                <!-- Use this for Nortel proprietary VoIP protocol -->
                <!--<UnistimDetect>yes</UnistimDetect>-->

                <!-- Turn both these on this for Avaya H.323 extensions -->
                <!--<AvayaDetect>yes</AvayaDetect>-->
                <!--<RtcpDetect>yes</RtcpDetect>-->

                <!-- Set the option below to "true" to enable IAX2 support -->
                <!-- the default is that IAX2 support is disabled -->
                <!--<Iax2Support>false</Iax2Support> -->

                <!-- Use this if you want to force capture from a given list of devices. -->
                <!-- All available devices are listed in orkaudio.log when the service is starting -->
                <Devices>enp89s0</Devices>

                <PcapFilter>host 192.168.2.221</PcapFilter>

                <!--<SipOverTcpSupport>yes</SipOverTcpSupport>-->
                <!--<SipReportFullAddress>yes</SipReportFullAddress>-->
                <!-- <SipRequestUriAsLocalParty>yes</SipRequestUriAsLocalParty> -->
                <!--<SipUse200OkMediaAddress>yes</SipUse200OkMediaAddress>-->

                <!-- Those two parameters are only needed for call direction detection (one or the other) -->
                <!--<SipDomains>company.com, 65.34.25.87</SipDomains>-->
                <!--<SipDirectionRefenceIpAddresses>65.34.98.56, 65.34.98.57</SipDirectionRefenceIpAddresses>-->

                <!-- Sangoma wanpipe RTP tap for TDM boards -->
                <!--<SangomaRxTcpPortStart>9000</SangomaRxTcpPortStart>-->
                <!--<SangomaTxTcpPortStart>11000</SangomaTxTcpPortStart>-->

                <!-- Mitel Communications Platform -->
                <!-- Turn on the parameter below to enable support for Mitel -->
                <!-- <MitelDetect>yes</MitelDetect> -->

                <!-- The parameter below sets the Mitel signalling port. The -->
                <!-- default is 3999 -->
                <!-- <MitelSignallingPort>3999</MitelSignallingPort> -->

                <!-- The parameter below sets the amount of time in seconds -->
                <!-- after which the cached Mitel metadata shall be discarded. -->
                <!-- The default is 60 seconds. -->
                <!-- <MitelMetadataTimeoutSec>60</MitelMetadataTimeoutSec> -->

                <!-- Turn on the parameter below to enable extension Mitel -->
                <!-- extension detection using ARP. Turning on this parameter -->
                <!-- automatically turns on MitelDetect -->
                <!-- <MitelArpExtensionDetect>yes</MitelArpExtensionDetect> -->

                <!-- Set MitelSmdrPort to the port where Mitel SMDR records -->
                <!-- may be accessed. The default is 1752. Note that you -->
                <!-- shall need to configure SocketStreamerTargets with the -->
                <!-- host and this port, in order for Oreka to access the -->
                <!-- SMDR records. See SocketStreamerTargets above for more -->
                <!-- information on how to configure it. -->
                <!-- <MitelSmdrPort>1752</MitelSmdrPort> -->
                <!-- End of Available Configurations for Mitel Communications Platform -->

        </VoIpPlugin>
</config>

wangduanduan commented 2 years ago

orkaudio gets killed when it hits the Docker service's memory limit (out of memory), so it cannot run for long.

wangduanduan commented 2 years ago

I think the rate of memory growth is not normal.

wangduanduan commented 2 years ago

I use sipp for the stress test:

ename=2022/06/24/05/20220624_051741_SYLT.wav nativeCallId=64699-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYLV localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYLV.wav nativeCallId=64700-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYLX localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYLX.wav nativeCallId=64701-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYLZ localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYLZ.wav nativeCallId=64702-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMB localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYMB.wav nativeCallId=64703-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMD localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYMD.wav nativeCallId=64704-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMF localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYMF.wav nativeCallId=64705-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-51 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMH localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051741_SYMH.wav nativeCallId=64706-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-52 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMJ localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051742_SYMJ.wav nativeCallId=64707-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-52 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYML localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051742_SYML.wav nativeCallId=64708-3265469@192.168.2.221 ondemand=false
date=2022-06-24_05-17-52 duration=10 direction=out localside=both audiokeepdirection=both capturePort=SYMN localParty=sipp remoteParty=service localEntryPoint= localIp=192.168.2.221 remoteIp=192.168.40.186 filename=2022/06/24/05/20220624_051742_SYMN.wav nativeCallId=64709-3265469@192.168.2.221 ondemand=false

wangduanduan commented 2 years ago

This is the startup log:


OrkAudio version : service starting

2022-06-24 05:21:23,425  WARN config:278 - It is not recommended to have more batch threads than CPUs
2022-06-24 05:21:23,426  INFO root:109 - Loaded plugin: /usr/lib/libvoip.so
2022-06-24 05:21:23,428  INFO packet:1847 - Initializing VoIP plugin
2022-06-24 05:21:23,428  INFO packet:1554 - Available pcap devices:
2022-06-24 05:21:23,428  INFO packet:1561 - * veth010dbad -
2022-06-24 05:21:23,428  INFO packet:1561 - * enp89s0 -
2022-06-24 05:21:23,428  INFO packet:1353 - Setting pcap socket buffer size:8388608 bytes successful
2022-06-24 05:21:23,480  INFO packet:1377 - Activating pcaphandle:fc065140 successfully
2022-06-24 05:21:23,480  INFO packet:1392 - Setting setsockopt with bufsize:8388608 successfully
2022-06-24 05:21:23,480  INFO packet:1484 - Successfully opened device. pcap handle:fc065140 message:
2022-06-24 05:21:23,480  INFO packet:1561 - * docker0 -
2022-06-24 05:21:23,480  INFO packet:1561 - * vethce1df71 -
2022-06-24 05:21:23,480  INFO packet:1561 - * vethdb66095 -
2022-06-24 05:21:23,480  INFO packet:1561 - * vethe581717 -
2022-06-24 05:21:23,480  INFO packet:1561 - * lo -
2022-06-24 05:21:23,480  INFO packet:1561 - * any - Pseudo-device that captures on all interfaces
2022-06-24 05:21:23,480  INFO packet:1561 - * wlo1 -
2022-06-24 05:21:23,480  INFO packet:1561 - * bluetooth-monitor - Bluetooth Linux Monitor
2022-06-24 05:21:23,480  INFO packet:1561 - * nflog - Linux netfilter log (NFLOG) interface
2022-06-24 05:21:23,480  INFO packet:1561 - * nfqueue - Linux netfilter queue (NFQUEUE) interface
2022-06-24 05:21:23,480  INFO packet:1561 - * bluetooth0 - Bluetooth adapter number 0
2022-06-24 05:21:23,481  INFO packet:1744 - No localpartymap.csv supplied, either locally or at /etc/orkaudio/localpartymap.csv
2022-06-24 05:21:23,481  INFO packet:1805 - LoadSkinnyGlobalNumbersList: Could not open file:skinnyglobalnumbers.csv -- trying:/etc/orkaudio/skinnyglobalnumbers.csv now
2022-06-24 05:21:23,481  INFO packet:1811 - LoadPartyMaps: Could not open file:/etc/orkaudio/skinnyglobalnumbers.csv either -- giving up
2022-06-24 05:21:23,482  INFO root:170 - Loaded plugin: /usr/lib/orkaudio/plugins/librtpmixer.so
2022-06-24 05:21:23,482  INFO root:170 - Loaded plugin: /usr/lib/orkaudio/plugins/libsilkcodec.so
2022-06-24 05:21:23,482  INFO silk:243 - SILK codec filter initialized.
2022-06-24 05:21:23,483  INFO root:170 - Loaded plugin: /usr/lib/orkaudio/plugins/libg729codec.so
2022-06-24 05:21:23,483  INFO g729:149 - G729 codec filter starting.
2022-06-24 05:21:23,483  INFO g729:152 - G729 codec filter initialized.
2022-06-24 05:21:23,483  INFO taperegistry:62 - Registered processor: BatchProcessing
2022-06-24 05:21:23,483  INFO taperegistry:62 - Registered processor: CommandProcessing
2022-06-24 05:21:23,483  INFO taperegistry:62 - Registered processor: Reporting
2022-06-24 05:21:23,483  INFO taperegistry:62 - Registered processor: TapeFileNaming
2022-06-24 05:21:23,483  INFO taperegistry:62 - Registered processor: DirectionSelector
2022-06-24 05:21:23,483  INFO reporting:283 - [192.168.40.186:8080/orktrack] reporting thread started.
2022-06-24 05:21:23,483  INFO immediateProcessing:90 - thread starting - queue size:10000
2022-06-24 05:21:23,483  INFO batchProcessing:233 - thread Th0 starting - queue size:20000
2022-06-24 05:21:23,483  INFO batchProcessing:233 - thread Th1 starting - queue size:20000
2022-06-24 05:21:23,483  INFO batchProcessing:233 - thread Th2 starting - queue size:20000
2022-06-24 05:21:23,484  INFO batchProcessing:233 - thread Th3 starting - queue size:20000
2022-06-24 05:21:23,484  INFO tapeFileNamingLog:86 - Started
2022-06-24 05:21:23,484  INFO batchProcessing:106 - Command Processing thread Th0 starting - queue size:10000
2022-06-24 05:21:23,484  INFO httpserver:247 - Started HttpServer on port:59140
2022-06-24 05:21:23,484  INFO directionSelector:184 - thread Th0 starting - queue size:20000
2022-06-24 05:21:23,484  INFO tlsserver:318 - HTTPS server disabled
2022-06-24 05:21:23,484  INFO directionSelector:129 - LoadAreaCodesMaps: Could not open file:area-codes-recorded-side.csv -- trying:/etc/orkaudio/area-codes-recorded-side.csv now
2022-06-24 05:21:23,484  INFO eventstreamingserver:736 - Started EventstreamingServer on port:59150
2022-06-24 05:21:23,484  INFO directionSelector:135 - LoadAreaCodesMaps: Could not open file:/etc/orkaudio/area-codes-recorded-side.csv either -- giving up
2022-06-24 05:21:23,484  INFO packet:980 - Start Capturing: pcap handle:fc065140
2022-06-24 05:21:23,487  INFO reporting:329 - [192.168.40.186:8080/orktrack] init success:true comment:
2022-06-24 05:21:29,046  INFO packet:1744 - No localpartymap.csv supplied, either locally or at /etc/orkaudio/localpartymap.csv
2022-06-24 05:21:33,100  INFO pcapstats:1906 - enp89s0: handle:fc065140 received:181 received10s:181 dropped:0 dropped10s:0 ifdropped:0 ifdropped10s:0
2022-06-24 05:21:33,100  INFO pcapstats:831 - numPackets:155 maxPPS:42 minPPS:3
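
The first line of the startup log warns about having more batch threads than CPUs, and the posted config.xml sets NumBatchThreads to 4. A minimal sketch of how one might check this before starting the service is shown below. The function name is hypothetical, and the comparison (threads greater than CPU count) is an assumption about what the warning means, not OrkAudio's actual logic. Note that `os.cpu_count()` reports host CPUs and does not account for a container CPU quota.

```python
import os
import xml.etree.ElementTree as ET
from typing import Optional


def batch_threads_exceed_cpus(config_xml: str, cpu_count: Optional[int] = None) -> bool:
    """Return True if <NumBatchThreads> in the config is larger than the CPU count.

    Hypothetical helper: the "threads > CPUs" rule is an assumption based on
    the wording of the startup warning.
    """
    root = ET.fromstring(config_xml)
    node = root.find("NumBatchThreads")
    if node is None or not (node.text or "").strip():
        return False  # setting absent: nothing to compare
    threads = int(node.text.strip())
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return threads > cpus


# Trimmed-down stand-in for the posted config.xml
sample = "<config><!-- comment --><NumBatchThreads>4</NumBatchThreads></config>"
print(batch_threads_exceed_cpus(sample, cpu_count=2))
```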

kingster commented 2 years ago

@wangduanduan

Can you share your sipp stress scripts, so that I can reproduce this issue?

I am running a slightly older version on production (v0.2.5) and haven't observed any memory leak, so it's possible that the memory leak got introduced in the recent merges from upstream.

wangduanduan commented 2 years ago

This is the media.pcap file:

media.zip

sipp UAC agent:

sipp -sf uac.xml 192.168.40.186:9944 -r 20 -mp 20000

This is uac.xml:

uac.zip

wangduanduan commented 2 years ago

sipp UAS:

sipp -sf uas.xml -i 192.168.40.186 -p 9944 -rtp_echo -mi 192.168.40.186 -mp 18690

uas.zip

kingster commented 2 years ago

Thanks, I will try to reproduce the issue in our setup.

wangduanduan commented 2 years ago

docker stats shows orkaudio using 724MiB:

CONTAINER ID   NAME            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS
4f88849a9af3   capture         47.14%    772.4MiB / 1000MiB    77.24%    0B / 0B           1.95GB / 198GB   17

But inside the container, top shows orkaudio using only 233MB:

  32 root      20   0 1040824 233364  21040 S  41.7  0.7  74:21.79 orkaudio  

Outside the container, htop shows orkaudio using only 227MB:

image

Inside the container, top shows orkaudio's memory usage stabilizing after a while, but docker stats shows its MEM USAGE growing continuously. When it hits the memory limit, Docker restarts orkaudio.
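
To put the two readings on the same scale: top prints RES in KiB when no unit suffix is shown, so the 233364 figure can be converted to MiB and compared with the docker stats value. This is only a rough sketch using the numbers quoted above, and it assumes both readings were taken at about the same time.

```python
def kib_to_mib(kib: int) -> float:
    """Convert KiB (top's default unit for RES) to MiB."""
    return kib / 1024


top_res_kib = 233364       # RES column from the top output above
docker_stats_mib = 772.4   # MEM USAGE from the docker stats output above

process_mib = kib_to_mib(top_res_kib)
# Memory charged to the container that is not resident in the orkaudio process
outside_process_mib = docker_stats_mib - process_mib

print(f"orkaudio resident memory: {process_mib:.1f} MiB")
print(f"charged to container but not in process RSS: {outside_process_mib:.1f} MiB")
```

The converted RES value (about 228 MiB) lines up with the htop reading, which leaves roughly 545 MiB of the container's reported usage unaccounted for by the process itself.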

wangduanduan commented 2 years ago

The container_memory_working_set_bytes metric from cAdvisor keeps growing:

image

kingster commented 2 years ago

I think this could be explainable behaviour, since you confirmed from the top output that orkaudio's memory usage is stable. Given that the recorder is continuously writing recorded files, the file cache is also reported as memory used.

Have a look at this issue https://github.com/moby/moby/issues/40415, which discusses exactly this behaviour.
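
One way to see how much of a container's reported memory is file cache is to read the cgroup's `memory.stat` file. The sketch below parses that file's "key value" format and computes a working set as usage minus inactive file cache, which is, to my understanding, how cAdvisor derives its working-set metric. The helper names and the sample numbers are hypothetical. On cgroup v1 the inputs are typically `/sys/fs/cgroup/memory/memory.stat` and `memory.usage_in_bytes`; on cgroup v2 they are `memory.stat` and `memory.current`.

```python
def parse_memory_stat(text: str) -> dict:
    """Parse the 'key value' lines of a cgroup memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes: int, stats: dict) -> int:
    """Usage minus inactive file cache (cgroup v1 key first, then v2 key)."""
    inactive = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return max(usage_bytes - inactive, 0)


# Hypothetical values for illustration only, not taken from this issue
SAMPLE_V1_STAT = """\
total_cache 576716800
total_rss 241172480
total_inactive_file 104857600
total_active_file 471859200
"""

stats = parse_memory_stat(SAMPLE_V1_STAT)
usage = 838860800  # hypothetical memory.usage_in_bytes (800 MiB)
ws = working_set_bytes(usage, stats)
print(f"rss: {stats['total_rss'] / 2**20:.0f} MiB")
print(f"file cache: {stats['total_cache'] / 2**20:.0f} MiB")
print(f"working set: {ws / 2**20:.0f} MiB")
```

Active file cache is not subtracted in this calculation, so a process that keeps writing files can show a growing working set even when its own resident memory is flat.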

wangduanduan commented 2 years ago

I set up a cron job in the orkaudio container:

* * * * * echo 3 > /proc/sys/vm/drop_caches 

Memory is now released every minute. But as long as I keep the stress test running, memory usage still grows very slowly, about 1 MB per minute, which I think is OK.

image
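
A slow, steady growth like this can be quantified by fitting a straight line to periodic memory samples and projecting when a limit would be reached. The sketch below does an ordinary least-squares fit. The sample points are hypothetical and merely mimic the roughly 1 MB per minute described above, and the 1000 MB limit is taken loosely from the docker stats output earlier in the thread.

```python
def growth_rate_mb_per_min(samples):
    """Least-squares slope for a list of (minute, memory_mb) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def minutes_until_limit(current_mb, limit_mb, rate_mb_per_min):
    """Minutes until the limit is hit at a constant growth rate."""
    if rate_mb_per_min <= 0:
        return float("inf")
    return (limit_mb - current_mb) / rate_mb_per_min


# Hypothetical samples: (minutes since start, memory in MB)
samples = [(0, 300), (10, 310), (20, 320), (30, 330)]
rate = growth_rate_mb_per_min(samples)
eta = minutes_until_limit(samples[-1][1], 1000, rate)
print(f"growth: {rate:.2f} MB/min, limit reached in about {eta / 60:.1f} hours")
```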

But what I really don't understand is why, after the stress test, memory usage stays at a stable level (300MB) and never goes down.

I expected memory usage to return to a lower level after the stress test ends, not stay at 300MB.

image

wangduanduan commented 2 years ago

@kingster is the v0.2.5 you run in production a Docker container, or a native install?

I also tested docker orkaudio:0.2.5, and its memory keeps growing very slowly too.

wangduanduan commented 2 years ago

My stress test is 500 concurrent calls: 25 new calls per second, each call lasting 20 seconds, media type g711. How much memory should it use?

kingster commented 2 years ago

The original docs mention about 4 CPU cores and 4GB RAM per OrkAudio engine, with up to 400 calls per engine.

In the real world, Oreka is much more CPU-intensive, unless you have unlimited disk and keep recording in pcmwav format. Transcoding to any compressed format is very resource-intensive.

In our production environment we run the native version on bare metal because of its CPU utilisation (we transcode to compressed ogg format). As for memory utilisation, honestly we never checked, since we have enough memory available. I will see if I can find any memory growth or leaks.

"my stress test is 500 concurrent call, every second make 25 calls, every call duration is 20 seconds, media type is g711, what memory it should use?"

I would suggest you start with the original recommendation of 4GB, and then tweak based on your utilisation.
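
As a rough cross-check of the load in question, the concurrency follows from call rate times call duration, and the recording write rate follows from the audio format. The sketch below assumes 8 kHz sampling (the G.711 rate), two channels as in the posted config, and 16-bit linear PCM samples, which is an assumption about the pcmwav output. It ignores WAV headers and any intermediate native files, and it estimates disk write throughput only, not process memory.

```python
def concurrent_calls(calls_per_second: float, call_duration_s: float) -> float:
    """Steady-state concurrency: arrival rate multiplied by call duration."""
    return calls_per_second * call_duration_s


def pcm_bytes_per_second(sample_rate_hz: int, channels: int, bytes_per_sample: int = 2) -> int:
    """Raw PCM data rate for one recording (16-bit samples assumed by default)."""
    return sample_rate_hz * channels * bytes_per_sample


calls = concurrent_calls(25, 20)            # 25 calls/s, 20 s per call
per_call = pcm_bytes_per_second(8000, 2)    # 8 kHz stereo
total = calls * per_call

print(f"concurrent calls: {calls:.0f}")
print(f"per-call write rate: {per_call} bytes/s")
print(f"aggregate write rate: {total / 1e6:.1f} MB/s")
```

Under these assumptions the test writes on the order of 16 MB of audio per second, all of which passes through the page cache on its way to disk.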

jmrbcu commented 2 years ago

In my experience (we run a heavily customized version of OrkAudio in production on more than 500 servers), these specs are far too high. Of all the threads in OrkAudio, only the "batch" threads use much CPU/IO, when they transcode from MCF to whatever format you need (stereo WAV in our case). That said, we will also investigate the memory leak in our version. We usually increase the number of batch threads at sites where traffic is high (more than 400 calls); the magic number is 4 threads on a server with 4 cores and 16 GB of RAM. I just checked this server: RAM usage is below 1.3 GB, and the peaks in CPU usage are due to the batch threads.

kingster commented 2 years ago

Some metrics from one of our production recorders (post Opus memory fixes): our memory utilisation looks roughly constant; it doesn't increase and shows a very slight decrease.

Call rate: ~400 calls/min, avg call duration ~1 min, i.e. ~400 concurrent calls, being transcoded to the Opus codec. System info: 12 cores (24 vCPUs), 32GB memory.

top - 17:05:09 up 771 days, 21:43.....
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30463 root      20   0  6.294g 4.493g 4.009g S 911.6 14.3   4186:12 orkaudio

After a few hours:

top - 21:36:41 up 772 days,  2:15...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30463 root      20   0  6.294g 4.483g 4.008g S 598.0 14.3   6606:22 orkaudio
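
In both samples most of the resident memory is also counted in the SHR column. Subtracting SHR from RES gives a rough idea of the private resident memory, which is the part a heap leak would inflate. The sketch below converts top's suffixed values to GiB and does that subtraction for the two samples above. The helper name is hypothetical, and treating RES minus SHR as "private" memory is an approximation.

```python
# Multipliers to convert top's suffixed memory values to GiB
UNIT_TO_GIB = {"k": 1 / (1024 * 1024), "m": 1 / 1024, "g": 1.0, "t": 1024.0}


def top_field_to_gib(field: str) -> float:
    """Convert a top memory field such as '4.493g' or '233364' to GiB."""
    suffix = field[-1].lower()
    if suffix in UNIT_TO_GIB:
        return float(field[:-1]) * UNIT_TO_GIB[suffix]
    return int(field) / (1024 * 1024)  # bare numbers are KiB


# (RES, SHR) pairs copied from the two top samples above
samples = [("4.493g", "4.009g"), ("4.483g", "4.008g")]
private = [top_field_to_gib(res) - top_field_to_gib(shr) for res, shr in samples]
for value in private:
    print(f"approx. private resident memory: {value:.3f} GiB")
```

By this estimate the private portion is a little under 0.5 GiB in both samples, with no growth between them.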

The following are some metrics from the bare-metal host.

image

Memory Utilisation over 24hrs image

mohammadmahdi255 commented 5 months ago

I see this leak as well. It seems to come from the AudioTape class, which holds all the audio chunks. I tested the project using SSIP and valgrind, and it seems AudioTape leaks when audio chunks are stored but never freed.