netdata / netdata

Architected for speed. Automated for easy. Monitoring and troubleshooting, transformed!
https://www.netdata.cloud
GNU General Public License v3.0

Exporting module: differing behavior vs 'backend' #9607

Closed mahlonsmith closed 3 years ago

mahlonsmith commented 4 years ago

I'm unsure if this is related to #9512, but throwing it here as a more generic problem report.

Bug report summary

The exporting module has different connection behavior than the now-deprecated 'backend' subsystem.

The 'backend' connects to the destination when it has data queued to send and, upon a successful connection, sends it immediately. In contrast, the exporting module connects to the destination, then sends data at the next "update every" interval. The end result is that destinations which close the TCP connection between samples can lose a sample between updates.

This feels like surprising behavior, if the exporting module is intended as a drop-in replacement for backend.

OS / Environment
> uname -a; uname -K
FreeBSD dev 12.1-RELEASE FreeBSD 12.1-RELEASE r354233 GENERIC  amd64
1201000
Netdata version
> netdata -V
netdata v1.23.1
Component Name

exporting.

Steps To Reproduce

Minimal backend configuration:

[backend]
    hostname = dev
    enabled = yes
    data source = average
    type = json
    destination = localhost:2222
    prefix = netdata
    update every = 5
    buffer on failures = 10
    send charts matching = system.*

Minimal exporting configuration:

[exporting:global]
        enabled = yes

[json:netcat]
        hostname = dev
        enabled = yes
        data source = average
        destination = localhost:2223
        prefix = netdata
        update every = 5
        buffer on failures = 10
        send charts matching = system.*

Start a netcat listener on 2222, and one on 2223. Observe the difference.

> nc -vkl 2222
Connection from localhost 62669 received!
{"prefix":"netdata","hostname":"dev",.....    <--- this is instantaneous after connection
> nc -vkl 2223
Connection from localhost 62878 received!
[ 5 second pause ]
{"prefix":"netdata","hostname":"dev",.....    <--- this is after the 'update every' pause
Expected behavior

I have many, many netdata clients all funneling their samples to a single destination, which is a traditional forking server. To conserve resources (1000s of forked processes and database connections) on this destination, the TCP socket is closed between sends, letting netdata merrily reconnect on its next interval. This is clearly a tradeoff at the expense of establishing a TCP connection for each sample, but in this environment that's an easy trade to make.
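For illustration, the destination pattern described above can be sketched like this (a minimal Python sketch of such a server, not the actual forking implementation; the function name is invented):

```python
import socket

def serve_one_batch_per_connection(srv, batches):
    """Read one batch per TCP connection from a bound, listening socket,
    closing the connection after each batch so the client has to
    reconnect. A sketch of the destination described above, not real
    server code."""
    received = []
    for _ in range(batches):
        conn, _addr = srv.accept()
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:          # client closed its side: batch complete
                break
            chunks.append(chunk)
        conn.close()               # drop the TCP connection between sends
        received.append(b"".join(chunks))
    return received
```

Each netdata send then costs one TCP handshake, which is the resource tradeoff mentioned above.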

My expectation was that behavior would be identical between backend and exporting. This appears not to be the case with both the JSON and OpenTSDB connectors; I didn't try the others.

Thanks all!

mahlonsmith commented 4 years ago

I neglected to mention the core of this: if you exit 'nc' manually after receiving a batch of samples from the exporting module (manually closing the connection), then start it back up, the sample that would have arrived during the pause isn't pushed at the next update; it's lost. If you keep dropping the connection between sends, this essentially turns 'update every = 5' into 'update every = 10'.
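A toy model of the two send schedules (a sketch of the behavior described here, not netdata's actual scheduler; parameter names are made up) shows the effect:

```python
def samples_delivered(n_ticks, send_on_connect, peer_closes_after_send=True):
    """Count delivered samples over n_ticks intervals, where one sample is
    produced per tick. send_on_connect=True models the old 'backend'
    (flush immediately after reconnecting); False models the exporting
    module as described in this issue (send only at the next tick)."""
    delivered = 0
    connected = False
    for _ in range(n_ticks):          # one new sample each tick
        if not connected:
            connected = True          # reconnect to the destination
            if not send_on_connect:
                continue              # exporting: wait for the next tick;
                                      # this tick's sample is lost
        delivered += 1                # send the current sample
        if peer_closes_after_send:
            connected = False         # destination drops the connection
    return delivered
```

With `send_on_connect=False` and a peer that closes after every send, only every other sample survives, which matches the observed 'update every = 5' behaving like 'update every = 10'.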

amoss commented 4 years ago

@vlvkobal Could you take a look at this?

vlvkobal commented 4 years ago

The simple connector worker uses the same simple main-loop flow for a connector instance that was used in the backends subsystem. The new exporting engine is multithreaded, though, so we need to protect the data with mutexes. We can't make blocking calls while the data is locked, so they happen before the connector starts waiting for data.
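The flow described above can be sketched roughly like this (an illustration in Python, not netdata's actual code; the callback names are invented):

```python
import threading

def simple_connector_worker(connect, wait_for_data, drain_buffer, send,
                            cycles, lock=None):
    """Sketch of the simple connector's main loop as described above.
    The blocking connect happens before the worker waits for data,
    because the mutex guarding the shared buffer must not be held across
    blocking calls; this is why a freshly accepted connection receives
    nothing until the next 'update every' tick."""
    if lock is None:
        lock = threading.Lock()
    for _ in range(cycles):
        connect()          # blocking: (re)establish the connection, unlocked
        wait_for_data()    # blocking: sleep until the next interval
        with lock:         # short critical section: copy out buffered data
            data = drain_buffer()
        send(data)         # the destination sees data only after the wait
```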

It's not a problem for persistent connections, but to handle short-lived connections correctly we need to dispatch a thread for every new connection, as was done for the MongoDB exporting connector using a ring buffer. The existence of persistent connections should also be taken into account.
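A rough sketch of what such thread dispatching could look like (hypothetical, loosely modeled on the ring-buffer idea mentioned above; a Python `queue.Queue` stands in for the ring buffer and `connect_and_send` is an invented callback):

```python
import queue
import threading

def start_dispatch_thread(ring, connect_and_send):
    """Hypothetical sketch: a dedicated thread drains a bounded queue
    (standing in for the ring buffer) and owns all blocking network work,
    so the main loop never blocks while the data is locked, and a batch is
    pushed to the destination as soon as it is ready, even over
    short-lived connections."""
    def worker():
        while True:
            batch = ring.get()       # block until a batch is queued
            if batch is None:        # sentinel: shut down the worker
                break
            connect_and_send(batch)  # blocking connect + send, per batch
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

The main loop would only serialize a batch under the lock and enqueue it; the dispatch thread does the connecting and sending immediately, instead of on the interval tick.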