oleotiger opened 3 years ago
Hmm, good question. There shouldn't be a practical limit; it just reads until the stream stops. I currently can't replicate this locally, so more information would be appreciated.
My local tests, with no netdata matching filter, put a metrics row at about 14k. A filtered one is about 3k.
(Got this from: select pg_size_pretty( pg_column_size(metrics)::bigint ) from netdata;)
Yours is 2 MB or so -- share your conf with me (mahlon@martini.nu if you'd like to do it privately). I'm interested in how you have a 2 MB sample as a routine send, wow. Some public questions:
In the meantime I'll work up a test scenario that just injects N bytes of data straight to the relay and see how it does.
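Roughly this sort of thing, as a sketch (it assumes the relay just accepts newline-delimited JSON over plain TCP on the port from your conf; the field names below are illustrative, not the exporter's exact keys):

```python
import json
import socket
import time

RELAY = ("localhost", 14866)   # port taken from the exporting conf below
N = 3_000_000                  # roughly the "willing to write" sizes in the error log

# One newline-terminated JSON document per metric; these keys are only
# illustrative, not necessarily what netdata's JSON exporter actually emits.
line = json.dumps({
    "prefix": "netdata",
    "chart_id": "system.cpu",
    "name": "user",
    "value": 1.0,
    "timestamp": int(time.time()),
}).encode() + b"\n"

payload = (line * (N // len(line) + 1))[:N]   # truncating mid-line is fine for a raw byte test

with socket.create_connection(RELAY) as sock:
    sock.sendall(payload)   # blocking send: returns only once all N bytes are handed to the kernel
print(f"sent {len(payload)} bytes")
```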
Thanks.
Exporting conf of parent node:
[json:timescaledb_instance]
enabled = yes
destination = localhost:14866
remote write URL path = /write
data source = as collected
prefix = netdata
update every = 1
send hosts matching = *
buffer on failures = 100
# send charts matching = !cpu.cpu* !ipv6* !users.* nfs.rpc net.* net_drops.* net_packets.* !system.interrupts* system.* disk.* disk_space.* disk_ops.* mem.*
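For reference, a sketch of how I understand that (commented-out) send charts matching line to be evaluated -- space-separated globs checked left to right, a leading '!' excludes, and the first match wins. This is my reading of netdata's simple patterns, not its actual code:

```python
from fnmatch import fnmatchcase

def charts_match(pattern_list: str, chart: str) -> bool:
    """First matching token wins; a leading '!' turns a match into an exclusion."""
    for token in pattern_list.split():
        negate = token.startswith("!")
        glob = token[1:] if negate else token
        if fnmatchcase(chart, glob):
            return not negate
    return False  # nothing matched: chart is not sent

filters = ("!cpu.cpu* !ipv6* !users.* nfs.rpc net.* net_drops.* net_packets.* "
           "!system.interrupts* system.* disk.* disk_space.* disk_ops.* mem.*")

print(charts_match(filters, "cpu.cpu68_interrupts"))  # False: excluded by !cpu.cpu*
print(charts_match(filters, "disk.sda"))              # True: kept by disk.*
print(charts_match(filters, "system.cpu"))            # True: kept by system.*
```

Under that reading, cpu.cpu68_interrupts is dropped by !cpu.cpu* before system.* is even considered, which is consistent with the filtered export being much smaller.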
Stream conf of child node:
[stream]
enabled = yes
destination=xxx
timeout seconds = 60
default port = 19999
send charts matching = *
buffer size bytes = 1048576
reconnect delay seconds = 5
initial clock resync iterations = 60
All child nodes and the parent node are collecting all metrics at 1 Hz.
I reproduced it. Error log of netdata:
2021-01-27 10:30:15: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 3289916 bytes, wrote 2154360 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
2021-01-27 10:30:15: netdata ERROR : MAIN : Failed to connect to '::1', port '14866' (errno 111, Connection refused)
2021-01-27 10:30:23: netdata INFO : WEB_SERVER[static3] : POLLFD: LISTENER: client slot 2 (fd 75) from 150.1.68.34 port 59858 has not sent a complete request in 60 seconds - closing it.
2021-01-27 10:30:29: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 3289931 bytes, wrote 2088868 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
2021-01-27 10:30:29: netdata ERROR : MAIN : Failed to connect to '::1', port '14866' (errno 111, Connection refused)
2021-01-27 10:30:42: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 3289939 bytes, wrote 2154336 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
2021-01-27 10:30:42: netdata ERROR : MAIN : Failed to connect to '::1', port '14866' (errno 111, Connection refused)
2021-01-27 10:30:53: netdata INFO : STREAM_RECEIVER[150.1.68.37,[150.1.68.37]:57942] : RRDSET: chart name 'cpu.cpu68_interrupts' on host '150.1.68.37' already exists.
2021-01-27 10:30:56: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 3289941 bytes, wrote 2154318 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
2021-01-27 10:30:56: netdata ERROR : MAIN : Failed to connect to '::1', port '14866' (errno 111, Connection refused)
And here is the log of netdata-timescale-relay:
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
Client 150.1.68.32 closed socket.
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
Client 150.1.68.32 closed socket.
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
No error messages from PostgreSQL were found.
Here is my problem. When I export metrics every 1s (as collected) from the parent node, with one child node attached, I get an error from netdata like:
2021-01-18 17:20:31: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 2092079 bytes, wrote 1766662 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
When I add the filter
send charts matching = !cpu.cpu* !ipv6* !users.* nfs.rpc net.* net_drops.* net_packets.* !system.interrupts* system.* disk.* disk_space.* disk_ops.* mem.*
it works well. So I guess the error is caused by the data size: there is too much data being exported to TimescaleDB through netdata-timescale-relay.
Am I right?
If so, what is the maximum data rate that netdata-timescale-relay can handle?
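For context on the errno 11 lines above: "Resource temporarily unavailable" (EAGAIN) on a non-blocking send just means the kernel's socket buffer toward the relay filled up before the relay drained it, so larger per-flush payloads make it more likely. A minimal sketch of that failure mode (my own illustration of a plain non-blocking TCP sender, not netdata's actual code):

```python
import errno
import socket

# A listener that accepts but never reads, standing in for a stalled relay.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))        # throwaway port; 14866 in the real setup
listener.listen(1)

sender = socket.create_connection(listener.getsockname())
peer, _ = listener.accept()            # connection accepted, but nothing is ever read
sender.setblocking(False)

chunk = b"x" * 65536
total = 0
try:
    while True:
        total += sender.send(chunk)    # partial writes come back, like "wrote Y of X bytes"
except BlockingIOError as e:
    assert e.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
    print(f"EAGAIN after buffering {total} bytes")
```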