oleotiger opened 3 years ago
Hmm, good question. There shouldn't be a practical limit; it just reads until the stream stops. I currently can't replicate this locally, so more information would be appreciated.
My local tests, with no netdata matching filter, put a metrics row at about 14k. A filtered one is about 3k.
(Got this from: select pg_size_pretty( pg_column_size(metrics)::bigint ) from netdata;)
Yours is 2 MB or so -- share your conf with me (mahlon@martini.nu if you'd like to do it privately). I'm interested in how you have a 2 MB sample as a routine send, wow. Some public questions:
In the meantime I'll work up a test scenario that just injects N bytes of data straight to the relay and see how it does.
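Roughly this sort of thing, as a sketch (it assumes the relay just accepts newline-delimited JSON over plain TCP on the port from your conf; the field names below are illustrative, not the exporter's exact keys):

```python
import json
import socket
import time

RELAY = ("localhost", 14866)   # port taken from the exporting conf below
N = 3_000_000                  # roughly the "willing to write" sizes in the error log

# One newline-terminated JSON document per metric; these keys are only
# illustrative, not necessarily what netdata's JSON exporter actually emits.
line = json.dumps({
    "prefix": "netdata",
    "chart_id": "system.cpu",
    "name": "user",
    "value": 1.0,
    "timestamp": int(time.time()),
}).encode() + b"\n"

payload = (line * (N // len(line) + 1))[:N]   # truncating mid-line is fine for a raw byte test

with socket.create_connection(RELAY) as sock:
    sock.sendall(payload)   # blocking send: returns only once all N bytes are handed to the kernel
print(f"sent {len(payload)} bytes")
```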
Thanks.
Exporting conf of parent node:
[json:timescaledb_instance]
enabled = yes
destination = localhost:14866
remote write URL path = /write
data source = as collected
prefix = netdata
update every = 1
send hosts matching = *
buffer on failures = 100
# send charts matching = !cpu.cpu* !ipv6* !users.* nfs.rpc net.* net_drops.* net_packets.* !system.interrupts* system.* disk.* disk_space.* disk_ops.* mem.*
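For reference, a sketch of how I understand that (commented-out) send charts matching line to be evaluated -- space-separated globs checked left to right, a leading '!' excludes, and the first match wins. This is my reading of netdata's simple patterns, not its actual code:

```python
from fnmatch import fnmatchcase

def charts_match(pattern_list: str, chart: str) -> bool:
    """First matching token wins; a leading '!' turns a match into an exclusion."""
    for token in pattern_list.split():
        negate = token.startswith("!")
        glob = token[1:] if negate else token
        if fnmatchcase(chart, glob):
            return not negate
    return False  # nothing matched: chart is not sent

filters = ("!cpu.cpu* !ipv6* !users.* nfs.rpc net.* net_drops.* net_packets.* "
           "!system.interrupts* system.* disk.* disk_space.* disk_ops.* mem.*")

print(charts_match(filters, "cpu.cpu68_interrupts"))  # False: excluded by !cpu.cpu*
print(charts_match(filters, "disk.sda"))              # True: kept by disk.*
print(charts_match(filters, "system.cpu"))            # True: kept by system.*
```

Under that reading, cpu.cpu68_interrupts is dropped by !cpu.cpu* before system.* is even considered, which is consistent with the filtered export being much smaller.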
Stream conf of child node:
[stream]
enabled = yes
destination=xxx
timeout seconds = 60
default port = 19999
send charts matching = *
buffer size bytes = 1048576
reconnect delay seconds = 5
initial clock resync iterations = 60
All child nodes and the parent node are collecting all metrics at 1 Hz.
I reproduced it. Error log of netdata:
2021-01-27 10:30:15: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 3289916 bytes, wrote 2154360 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
2021-01-27 10:30:15: netdata ERROR : MAIN : Failed to connect to '::1', port '14866' (errno 111, Connection refused)
2021-01-27 10:30:23: netdata INFO : WEB_SERVER[static3] : POLLFD: LISTENER: client slot 2 (fd 75) from 150.1.68.34 port 59858 has not sent a complete request in 60 seconds - closing it.
2021-01-27 10:30:29: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 3289931 bytes, wrote 2088868 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
2021-01-27 10:30:29: netdata ERROR : MAIN : Failed to connect to '::1', port '14866' (errno 111, Connection refused)
2021-01-27 10:30:42: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 3289939 bytes, wrote 2154336 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
2021-01-27 10:30:42: netdata ERROR : MAIN : Failed to connect to '::1', port '14866' (errno 111, Connection refused)
2021-01-27 10:30:53: netdata INFO : STREAM_RECEIVER[150.1.68.37,[150.1.68.37]:57942] : RRDSET: chart name 'cpu.cpu68_interrupts' on host '150.1.68.37' already exists.
2021-01-27 10:30:56: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 3289941 bytes, wrote 2154318 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
2021-01-27 10:30:56: netdata ERROR : MAIN : Failed to connect to '::1', port '14866' (errno 111, Connection refused)
And here is the log of netdata-timescale-relay:
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
Client 150.1.68.32 closed socket.
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
Client 150.1.68.32 closed socket.
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
Client 127.0.0.1 closed socket.
No error messages from PostgreSQL were found.
Here is my problem. When I export metrics every 1s (as collected) from the parent node, with one child node attached, I get an error from netdata like:
2021-01-18 17:20:31: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 2092079 bytes, wrote 1766662 bytes. Will re-connect. (errno 11, Resource temporarily unavailable)
When I add the filter
send charts matching = !cpu.cpu* !ipv6* !users.* nfs.rpc net.* net_drops.* net_packets.* !system.interrupts* system.* disk.* disk_space.* disk_ops.* mem.*
it works well. So I guess the error is caused by the data size: there is too much data being exported to TimescaleDB through netdata-timescale-relay.
Am I right?
If so, what is the maximum data rate that netdata-timescale-relay can handle?
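For context on the errno 11 lines above: "Resource temporarily unavailable" (EAGAIN) on a non-blocking send just means the kernel's socket buffer toward the relay filled up before the relay drained it, so larger per-flush payloads make it more likely. A minimal sketch of that failure mode (my own illustration of a plain non-blocking TCP sender, not netdata's actual code):

```python
import errno
import socket

# A listener that accepts but never reads, standing in for a stalled relay.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))        # throwaway port; 14866 in the real setup
listener.listen(1)

sender = socket.create_connection(listener.getsockname())
peer, _ = listener.accept()            # connection accepted, but nothing is ever read
sender.setblocking(False)

chunk = b"x" * 65536
total = 0
try:
    while True:
        total += sender.send(chunk)    # partial writes come back, like "wrote Y of X bytes"
except BlockingIOError as e:
    assert e.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
    print(f"EAGAIN after buffering {total} bytes")
```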