Closed: kostasb closed this issue 5 years ago.
@nathanielc Any thoughts?
A few thoughts...
Related to issue #5559
When implementing the backoff policy, should InfluxDB give up after a certain number of attempts? The Wikipedia article on exponential backoff mentions giving up after 16 attempts.
@jsternberg Are you planning on adding this? We should probably discuss it more first. It's been a while since this was requested, and subscriptions have changed a bit since then.
In general, the plan for subscriptions was to not implement retries so that you don't overwhelm the database itself. If you want smart retry logic, use influxdb-relay, since it already has it built in.
I was going through issues labeled support and was planning to try implementing them. If this is no longer needed, we should close it. At the moment I have no opinion on whether it is important.
@kostasb Is this still an issue? Now that we have influxdb-relay, which can handle retries, it seems we recommend the relay over subscriptions when retries are needed.
Also, to be clear, subscriptions currently have no retry logic, so retries cannot cause stampeding or other side effects.
@nathanielc HArelay can serve as a queuing mechanism in any scenario that needs buffering/retry logic for Line Protocol writes. From support's perspective, however, we are reluctant to recommend a tool the core team does not support for anything other than its original purpose of custom high-availability setups. For example, we could not suggest InfluxRelay to any paying customers.
Implementing this logic is more of a design decision. I am not sure whether buffering/queuing is what we want in this case, rather than just a way to back off from forwarding points to Kapacitor while the latter fails to ingest them.
I ran into this issue as well when moving Kapacitor from HTTP to HTTPS: InfluxDB held on to the old HTTP subscription.
Running show subscriptions returned:
> show subscriptions
name: _internal
retention_policy name                                           mode destinations
---------------- ----                                           ---- ------------
monitor          kapacitor-d561ce59-c5d2-4b03-a5a4-3b21a6f6e073 ANY  [http://localhost:9092]
monitor          kapacitor-f8b76969-9431-4929-99c8-70cf9314812a ANY  [https://localhost:9092]
I was seeing this in the InfluxDB logs:
2018-03-28T16:52:40.009234Z info Post http://localhost:9092/write?consistency=&db=_internal&precision=ns&rp=monitor: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02\x16" {"log_id": "077O85al000", "service": "subscriber"}
and this in the Kapacitor logs:
ts=2018-03-28T10:06:30.007-07:00 lvl=info msg="http request" service=http host=::1 username=- start=2018-03-28T10:06:30.005690549-07:00 method=POST uri=/write?consistency=&db=_internal&precision=ns&rp=monitor protocol=HTTP/1.1 status=204 referer=- user-agent=InfluxDBClient request-id=5bc1cd54-32aa-11e8-8021-000000000000 duration=1.820401ms
ts=2018-03-28T10:06:30.007-07:00 lvl=error msg="2018/03/28 10:06:30 http: TLS handshake error from [::1]:55794: tls: oversized record received with length 21536\n" service=http service=httpd_server_errors
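Both error lines are consistent with a plaintext HTTP client talking to a TLS server. The Go sketch below decodes the bytes as TLS record headers (1-byte content type, 2-byte version, 2-byte big-endian length) to show where the numbers come from. recordHeader is a hypothetical helper written only for this illustration.

```go
package main

import "fmt"

// recordHeader decodes a 5-byte TLS record header: content type (1 byte),
// protocol version (2 bytes), payload length (2 bytes, big-endian).
func recordHeader(b []byte) (typ byte, version uint16, length int) {
	return b[0], uint16(b[1])<<8 | uint16(b[2]), int(b[3])<<8 | int(b[4])
}

func main() {
	// Kapacitor (TLS server) received InfluxDB's plaintext HTTP request;
	// its first five bytes, "POST ", were read as a TLS record header.
	typ, ver, n := recordHeader([]byte("POST /write"))
	fmt.Printf("type=0x%02x version=0x%04x length=%d\n", typ, ver, n)
	// Output: type=0x50 version=0x4f53 length=21536
	// 21536 exceeds the TLS ciphertext limit of 2^14 + 2048 = 18432 bytes,
	// hence "tls: oversized record received with length 21536".

	// InfluxDB (HTTP client) then received a TLS alert record instead of
	// an HTTP response: "\x15\x03\x01\x00\x02\x02\x16".
	resp := []byte("\x15\x03\x01\x00\x02\x02\x16")
	typ, ver, n = recordHeader(resp)
	fmt.Printf("type=%d version=0x%04x length=%d level=%d description=%d\n",
		typ, ver, n, resp[5], resp[6])
	// Output: type=21 version=0x0301 length=2 level=2 description=22
	// type 21 = alert, level 2 = fatal, description 22 = record_overflow
}
```

Read this way, the two logs describe the same failed exchange from each side: Kapacitor rejects the plaintext request, and InfluxDB cannot parse the TLS alert it gets back.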
I had to remove the stale subscription manually using:
drop subscription "kapacitor-d561ce59-c5d2-4b03-a5a4-3b21a6f6e073" on _internal.monitor
Somehow, between InfluxDB and Kapacitor, the old HTTP subscription should have been removed automatically.
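Here is a Go sketch of the kind of cleanup that could be automated. Given the URL Kapacitor currently uses, it emits DROP SUBSCRIPTION statements for subscriptions that target the same host and port under a different URL, such as the leftover http:// entry above. The Subscription type and the staleDrops function are hypothetical. Neither InfluxDB nor Kapacitor is known to expose them.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Subscription mirrors one row of SHOW SUBSCRIPTIONS output (hypothetical).
type Subscription struct {
	Database, RetentionPolicy, Name string
	Destinations                    []string
}

// quoteIdent double-quotes an InfluxQL identifier, escaping embedded quotes.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `\"`) + `"`
}

// staleDrops returns DROP SUBSCRIPTION statements for subscriptions with a
// destination on the same host:port as current but a different URL.
func staleDrops(current string, subs []Subscription) ([]string, error) {
	cur, err := url.Parse(current)
	if err != nil {
		return nil, err
	}
	var stmts []string
	for _, s := range subs {
		for _, d := range s.Destinations {
			u, err := url.Parse(d)
			if err != nil {
				return nil, err
			}
			if u.Host == cur.Host && u.String() != cur.String() {
				stmts = append(stmts, fmt.Sprintf("DROP SUBSCRIPTION %s ON %s.%s",
					quoteIdent(s.Name), quoteIdent(s.Database), quoteIdent(s.RetentionPolicy)))
				break // one statement per subscription
			}
		}
	}
	return stmts, nil
}

func main() {
	subs := []Subscription{
		{"_internal", "monitor", "kapacitor-d561ce59-c5d2-4b03-a5a4-3b21a6f6e073", []string{"http://localhost:9092"}},
		{"_internal", "monitor", "kapacitor-f8b76969-9431-4929-99c8-70cf9314812a", []string{"https://localhost:9092"}},
	}
	stmts, err := staleDrops("https://localhost:9092", subs)
	if err != nil {
		panic(err)
	}
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```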
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.
Consider implementing a backoff policy for when data points cannot be delivered to InfluxDB subscriptions. A high number of connection attempts may overwhelm the system if it keeps trying to send data when the subscriber is not listening (incidents have been reported).
Also, it might be worthwhile to reformat the logging output.
[subscriber] 2016/01/27 08:00:00 write udp 127.0.0.1:59505: connection refused
Two considerations:
- UDP is connectionless, so it might be better to report failures as "timeout" or "udp write failed".
- Logging the subscription's name in addition to [subscriber] would help with troubleshooting.