influxdata / influxdb-client-python

InfluxDB 2.0 python client
https://influxdb-client.readthedocs.io/en/stable/
MIT License
722 stars 187 forks

Write to InfluxDB cloud fails with ApiException: (503) Reason: Service Unavailable; upstream connect error or disconnect/reset before headers. reset reason: connection failure #326

Closed gefaila closed 3 years ago

gefaila commented 3 years ago

Steps to reproduce: Using the Python client, make repeated calls to the write API. Do this simultaneously from 2 or 3 different clients (not sure if this is relevant, but this is the fail condition). Write rates are a call to the API every second or so. I'm using the SYNCHRONOUS calls.

influx_client = InfluxDBClient(url=os.environ['influx_url'], token=os.environ['token'], retries=retries)
write_api = influx_client.write_api(write_options=SYNCHRONOUS)
# (repeatedly)
influx_returns = write_api.write(InfluxBucket, my_org, Influx_lines, 'ms')

Expected behavior: Until 17th Aug we were seeing the expected behaviour. All the writes succeeded.
Data was written reliably

Actual behavior: From 17th August we started getting

ApiException: (503) Reason: Service Unavailable; upstream connect error or disconnect/reset before headers. reset reason: connection failure

The errors definitely started on 17th Aug and I know we didn't change anything because everyone was on holiday! I have AWS logs which show when this started.

You can see the discussion here that it's affecting multiple users now

Specifications:

dabeeeenster commented 3 years ago

We are seeing this issue too, also with the Python SDK, although I'm not sure if that's relevant.


dabeeeenster commented 3 years ago

I am writing to the EU AWS datacenter if that is of any help

gefaila commented 3 years ago

For the last 24 hours I have also been simultaneously writing the same data to a 'junk' bucket using http using the Python requests library

import requests

influx_returns = requests.post(url, data=Influx_lines, headers=headers)

While this helps illuminate where the bug may be, I believe that under the hood the Python API client library also uses http.

Note I'm also writing to https://eu-central-1-1.aws.cloud2.influxdata.com

gefaila commented 3 years ago

If there was some stub code to use urllib3 to write line data to InfluxDB 2.0 cloud then I could test whether the failures also happen with the urllib3 library which I think matches what Python API client uses. Can anyone help with that?

bednar commented 3 years ago

@gefaila, here is code for urllib3:

import urllib3

base_url = 'http://localhost:8086'
org = 'my-org'
token = 'my-token'
bucket = 'my-bucket'

headers = {'Content-Type': 'application/vnd.flux', 'Authorization': ('Token %s' % token)}
url = '%s/api/v2/write?org=%s&bucket=%s&precision=ms' % (base_url, org, bucket)
payload = 'cpu_load_short,host=server01,region=us-west value1=99.64\ncpu_load_short,host=server01,region=us-west value2=5.64\n'

with urllib3.PoolManager() as http:
    r = http.request(
        'POST',
        url,
        body=payload,
        headers=headers
    )

    print(f"Response status: '{r.status}', success: {r.status == 204}\n")

based on https://community.influxdata.com/t/write-to-influxdb-cloud-fails-with-apiexception-503-reason-service-unavailable-upstream-connect-error-or-disconnect-reset-before-headers-reset-reason-connection-failure/21501/28

gefaila commented 3 years ago

Hi @bednar, sorry: asmith == @gefaila == myself. That's the code I myself posted here, and I posted it because it doesn't work. :-)

As I explain in the post there, it fails with

HTTPSConnectionPool(host='eu-central-1-1.aws.cloud2.influxdata.com', port=443): Max retries exceeded with url: /api/v2/write?orgID=xxxxxxxxxxxx&bucket=TEST_bucket&precision=ms (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

I think what's needed is some code that has actually been verified as working with InfluxDB 2.0 Cloud

bednar commented 3 years ago

The code already works with cloud, but you have to install the SSL certificates to your keychain or install https://pypi.org/project/certifi/:

pip install certifi

or disable certification validation by:

with urllib3.PoolManager(cert_reqs='CERT_NONE') as http:
    r = http.request(
        'POST',
        url,
        body=payload,
        headers=headers
    )

    print(f"Response status: '{r.status}', success: {r.status == 204}\n")
gefaila commented 3 years ago

Thanks @bednar. So are you saying that if I install certifi then exactly the same code will succeed? Do I not need to edit the code to in some way perform the certificate validation?

import certifi
bednar commented 3 years ago

Do I not need to edit the code to in some way perform the certificate validation

import certifi
...

with urllib3.PoolManager(ca_certs=certifi.where()) as http:

...
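Putting the certifi correction together with the earlier urllib3 snippet, here is a self-contained sketch. The helper names are mine, not part of any library, and I've used `text/plain; charset=utf-8` as the Content-Type, which is what the v2 write endpoint documents for line protocol:

```python
import certifi
import urllib3

def build_write_url(base_url, org, bucket, precision='ms'):
    """Build the /api/v2/write URL used throughout this thread."""
    return '%s/api/v2/write?org=%s&bucket=%s&precision=%s' % (base_url, org, bucket, precision)

def write_lines(base_url, token, org, bucket, payload):
    """POST line-protocol data, validating TLS against certifi's CA bundle."""
    headers = {'Content-Type': 'text/plain; charset=utf-8',
               'Authorization': 'Token %s' % token}
    with urllib3.PoolManager(ca_certs=certifi.where()) as http:
        r = http.request('POST', build_write_url(base_url, org, bucket),
                         body=payload, headers=headers)
        return r.status  # 204 on success
```

Against a real endpoint, `write_lines('https://eu-central-1-1.aws.cloud2.influxdata.com', token, org, bucket, lines)` should return 204 on a successful write.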
gefaila commented 3 years ago

Thanks @bednar. That small correction makes the code work. So for others to find: I posted the working sample code, complete with certifi, here

gefaila commented 3 years ago

However, I woke up this morning to some more insights about the actual subject of this bug, which is periodic (503) errors.

To help debug this issue I am sending all data three times using separate AWS Lambda functions

AWS Lambda functions operate in containers so this is three independent containers using different libraries that are sending the same data to different buckets.

Here's the thing:

This now points to the conclusion that the endpoint at https://eu-central-1-1.aws.cloud2.influxdata.com is not available for short periods causing writes to fail (from different clients) in those short periods.

@timestamp @source Code
2021-09-08 09:00:17.30 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-08 01:59:26.06 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-08 01:48:35.31 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-08 01:01:12.69 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-08 01:01:12.69 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-08 00:11:08.60 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 23:45:46.30 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 23:38:40.06 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 22:34:59.79 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-07 22:34:59.51 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 22:26:12.45 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 22:20:21.45 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-07 16:06:48.75 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-07 16:05:03.74 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 16:02:33.74 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 16:02:33.74 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 16:02:32.23 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-07 16:02:32.22 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 15:16:44.25 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 15:02:41.17 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 15:02:41.16 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 15:02:38.96 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-07 15:02:38.67 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 15:02:36.67 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 15:02:35.01 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-07 15:02:34.68 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 15:02:34.37 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 15:02:33.76 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-07 15:02:33.72 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 15:02:32.57 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 15:02:32.57 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 15:02:32.56 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-07 15:02:32.20 lambda/MQTT_InfluxDB_v3 InfluxDB API client
2021-09-07 15:02:32.19 lambda/MQTT_InfluxDB_v2 http/requests
2021-09-07 15:02:32.18 lambda/MQTT_InfluxDB InfluxDB API client
2021-09-07 14:35:03.53 lambda/MQTT_InfluxDB_v3 InfluxDB API client
gefaila commented 3 years ago

The retries code (v3, which is also failing) is something like this:

import os
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS
from urllib3 import Retry

status_forcelist = (500, 502, 503, 504)

retries = Retry(status_forcelist=status_forcelist, total=10, redirect=0, backoff_factor=0.1, raise_on_redirect=True, raise_on_status=True, connect=5, read=5, status=5)
influx_client = InfluxDBClient(url=os.environ['influx_url'], token=os.environ['token'], retries=retries)
write_api = influx_client.write_api(write_options=SYNCHRONOUS)

and then writes using

write_api.write(InfluxBucket, "xxxxxxxxxxxxxx", Influx_lines,'ms')
gefaila commented 3 years ago

@dabeeeenster, we are both writing to https://eu-central-1-1.aws.cloud2.influxdata.com, are we not? If you can give me UTC timestamps for failures we may see some kind of pattern.
Are your writes getting (503) errors when my writes are?

dabeeeenster commented 3 years ago

Yes - we are hitting that endpoint.

Does this help?

"Sep 8, 2021 12:07:40 PM UTC ApiException: (503)" "Sep 8, 2021 12:06:53 PM UTC ApiException: (503)" "Sep 8, 2021 12:04:49 PM UTC ApiException: (503)" "Sep 8, 2021 11:57:38 AM UTC ApiException: (503)" "Sep 8, 2021 11:57:38 AM UTC ApiException: (503)" "Sep 8, 2021 11:57:37 AM UTC ApiException: (503)" "Sep 8, 2021 11:55:44 AM UTC ApiException: (503)" "Sep 8, 2021 11:55:44 AM UTC ApiException: (503)" "Sep 8, 2021 11:55:08 AM UTC ApiException: (503)" "Sep 8, 2021 11:55:08 AM UTC ApiException: (503)" "Sep 8, 2021 11:55:08 AM UTC ApiException: (503)" "Sep 8, 2021 11:55:08 AM UTC ApiException: (503)" "Sep 8, 2021 11:55:08 AM UTC ApiException: (503)" "Sep 8, 2021 11:51:51 AM UTC ApiException: (503)" "Sep 8, 2021 11:50:22 AM UTC ApiException: (503)" "Sep 8, 2021 11:45:40 AM UTC ApiException: (503)" "Sep 8, 2021 11:45:40 AM UTC ApiException: (503)" "Sep 8, 2021 11:44:29 AM UTC ApiException: (503)" "Sep 8, 2021 11:42:30 AM UTC ApiException: (503)" "Sep 8, 2021 11:42:30 AM UTC ApiException: (503)" "Sep 8, 2021 11:41:41 AM UTC ApiException: (503)" "Sep 8, 2021 11:38:00 AM UTC ApiException: (503)" "Sep 8, 2021 11:38:00 AM UTC ApiException: (503)" "Sep 8, 2021 11:38:00 AM UTC ApiException: (503)" "Sep 8, 2021 11:29:32 AM UTC ApiException: (503)" "Sep 8, 2021 11:29:31 AM UTC ApiException: (503)" "Sep 8, 2021 11:27:47 AM UTC ApiException: (503)" "Sep 8, 2021 11:27:47 AM UTC ApiException: (503)" "Sep 8, 2021 11:26:53 AM UTC ApiException: (503)" "Sep 8, 2021 11:26:53 AM UTC ApiException: (503)" "Sep 8, 2021 11:25:07 AM UTC ApiException: (503)" "Sep 8, 2021 11:25:07 AM UTC ApiException: (503)" "Sep 8, 2021 11:21:27 AM UTC ApiException: (503)" "Sep 8, 2021 11:17:22 AM UTC ApiException: (503)" "Sep 8, 2021 11:17:22 AM UTC ApiException: (503)" "Sep 8, 2021 11:15:25 AM UTC ApiException: (503)" "Sep 8, 2021 11:15:24 AM UTC ApiException: (503)" "Sep 8, 2021 11:15:24 AM UTC ApiException: (503)" "Sep 8, 2021 11:14:53 AM UTC ApiException: (503)" "Sep 8, 2021 11:12:52 AM UTC ApiException: (503)" 
"Sep 8, 2021 11:11:22 AM UTC ApiException: (503)" "Sep 8, 2021 11:11:22 AM UTC ApiException: (503)" "Sep 8, 2021 11:11:22 AM UTC ApiException: (503)" "Sep 8, 2021 11:11:21 AM UTC ApiException: (503)" "Sep 8, 2021 11:11:20 AM UTC ApiException: (503)" "Sep 8, 2021 11:11:20 AM UTC ApiException: (503)" "Sep 8, 2021 11:11:14 AM UTC ApiException: (503)" "Sep 8, 2021 11:07:10 AM UTC ApiException: (503)" "Sep 8, 2021 11:00:08 AM UTC ApiException: (503)" "Sep 8, 2021 11:00:08 AM UTC ApiException: (503)" "Sep 8, 2021 10:54:24 AM UTC ApiException: (503)" "Sep 8, 2021 10:54:24 AM UTC ApiException: (503)" "Sep 8, 2021 10:54:23 AM UTC ApiException: (503)" "Sep 8, 2021 10:16:57 AM UTC ApiException: (503)" "Sep 8, 2021 10:16:22 AM UTC ApiException: (503)" "Sep 8, 2021 10:16:06 AM UTC ApiException: (503)" "Sep 8, 2021 10:16:05 AM UTC ApiException: (503)" "Sep 8, 2021 10:16:05 AM UTC ApiException: (503)" "Sep 8, 2021 10:16:04 AM UTC ApiException: (503)" "Sep 8, 2021 10:16:04 AM UTC ApiException: (503)" "Sep 8, 2021 10:13:57 AM UTC ApiException: (503)" "Sep 8, 2021 10:13:57 AM UTC ApiException: (503)" "Sep 8, 2021 10:12:11 AM UTC ApiException: (503)" "Sep 8, 2021 10:12:10 AM UTC ApiException: (503)" "Sep 8, 2021 10:09:48 AM UTC ApiException: (503)" "Sep 8, 2021 10:09:48 AM UTC ApiException: (503)" "Sep 8, 2021 10:09:47 AM UTC ApiException: (503)" "Sep 8, 2021 10:04:20 AM UTC ApiException: (503)" "Sep 8, 2021 10:04:19 AM UTC ApiException: (503)" "Sep 8, 2021 10:04:18 AM UTC ApiException: (503)" "Sep 8, 2021 10:04:18 AM UTC ApiException: (503)" "Sep 8, 2021 10:01:42 AM UTC ApiException: (503)" "Sep 8, 2021 10:01:42 AM UTC ApiException: (503)" "Sep 8, 2021 10:01:41 AM UTC ApiException: (503)" "Sep 8, 2021 9:59:35 AM UTC ApiException: (503)" "Sep 8, 2021 9:59:35 AM UTC ApiException: (503)" "Sep 8, 2021 9:59:34 AM UTC ApiException: (503)" "Sep 8, 2021 9:59:34 AM UTC ApiException: (503)" "Sep 8, 2021 9:59:33 AM UTC ApiException: (503)" "Sep 8, 2021 9:59:33 AM UTC ApiException: (503)" "Sep 
8, 2021 9:59:33 AM UTC ApiException: (503)" "Sep 8, 2021 9:59:33 AM UTC ApiException: (503)" "Sep 8, 2021 9:59:32 AM UTC ApiException: (503)" "Sep 8, 2021 9:58:25 AM UTC ApiException: (503)" "Sep 8, 2021 9:58:25 AM UTC ApiException: (503)" "Sep 8, 2021 9:58:25 AM UTC ApiException: (503)" "Sep 8, 2021 9:58:24 AM UTC ApiException: (503)" "Sep 8, 2021 9:58:23 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:59 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:58 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:57 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:57 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:57 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:57 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:48 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:47 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:46 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:46 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:46 AM UTC ApiException: (503)" "Sep 8, 2021 9:57:45 AM UTC ApiException: (503)"

gefaila commented 3 years ago

So you are writing much faster than we are. But the facts are:


|   | Timestamp | Source | Error |
| --- | --- | --- | --- |
| 1 | 2021-09-08T12:06:54.362Z | lambda/MQTT_InfluxDB_v3 | (503) |
| 2 | 2021-09-08T10:16:05.387Z | lambda/MQTT_InfluxDB_v3 | (503) |
| 3 | 2021-09-08T09:57:57.824Z | lambda/MQTT_InfluxDB_v3 | (503) |

gefaila commented 3 years ago

@Anaisdg does this help the storage team to get to the bottom of it?

gefaila commented 3 years ago

Morning @Anaisdg , is there any progress from the storage team with reproducing or fixing this? I don't know what priority this has been given. But, as it stands, multiple paying customers are losing data because of periodic unavailability of InfluxDB to receive data.

gefaila commented 3 years ago

Hi @bednar , I think the problem may be related to how retries are handled by the Python Influx client (API). I am normally only writing 1-3 lines of data. Am I right that if it's one line, the python client will not use batching and therefore doesn't use retries?

How do I set up retries for all writes whether they are batch writes or not?

bednar commented 3 years ago

Am I right that if it's one line, the python client will not use batching and therefore doesn't use retries?

The Python client applies the retry strategy to the batching write strategy.

How do I set up retries for all writes whether they are batch writes or not?

You can configure retries by: https://github.com/influxdata/influxdb-client-python#http-retry-strategy
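To make that concrete, here is a minimal sketch of such a configuration. The values are illustrative, `make_client` is a hypothetical helper name, and `allowed_methods` requires urllib3 >= 1.26 (older releases call it `method_whitelist`):

```python
from urllib3 import Retry

# Retry POSTs that fail with 503, with exponential backoff (0.5 s, 1 s, 2 s, ...).
retries = Retry(total=5, backoff_factor=0.5,
                status_forcelist=[503],
                allowed_methods=["POST"])

def make_client():
    """Wire the retry policy into the client (requires influxdb-client >= 1.10.0)."""
    import os
    from influxdb_client import InfluxDBClient
    return InfluxDBClient(url=os.environ['influx_url'],
                          token=os.environ['token'],
                          retries=retries)
```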

gefaila commented 3 years ago

Hi @bednar , See above, the code which is failing is already using retries

The retries code (v3, which is also failing) is something like this:

import os
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS
from urllib3 import Retry

status_forcelist = (500, 502, 503, 504)

retries = Retry(status_forcelist=status_forcelist, total=10, redirect=0, backoff_factor=0.1, raise_on_redirect=True, raise_on_status=True, connect=5, read=5, status=5)
influx_client = InfluxDBClient(url=os.environ['influx_url'], token=os.environ['token'], retries=retries)
write_api = influx_client.write_api(write_options=SYNCHRONOUS)

and then writes using

write_api.write(InfluxBucket, "xxxxxxxxxxxxxx", Influx_lines,'ms')
gefaila commented 3 years ago

To clarify: retries don't work properly because the Python client fails to attempt retries when writing a single line fails with (503).

I know that your suggested retries code doesn't work for two reasons:

  1. If it was working, the exception would be "Max retries exceeded", not the (503) error.
  2. If I hand-code retries using the code below, retries succeed. Therefore using retries=retries doesn't work.
        for i in range(1, 10):
            try:
                influx_returns = write_api.write(InfluxBucket, "xxxxxxxxxxx", Influx_lines, 'ms')
                if i > 1:
                    logmessage = JSON_log(f"write_api.write SUCCESS on try {i}", logging.ERROR, 2005, False, event, context)
                break
            except:
                logmessage = JSON_log(f"write_api.write FAIL on try {i}", logging.ERROR, 2006, True, event, context)
                if i > 2:
                    time.sleep(0.1)  # try twice immediately and thereafter at 100 ms intervals
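(An aside: a hand-rolled loop like the one above can be factored into a reusable wrapper. `write_fn` stands in for `write_api.write`, and the helper name is mine, not part of the client library.)

```python
import time

def write_with_retries(write_fn, *args, attempts=10, delay=0.1):
    """Call write_fn(*args), retrying on any exception: the first two
    retries are immediate, later ones wait `delay` seconds between tries."""
    for i in range(1, attempts + 1):
        try:
            return write_fn(*args)
        except Exception:
            if i == attempts:
                raise  # give up after the last attempt
            if i > 2:
                time.sleep(delay)
```

Usage would then be `write_with_retries(write_api.write, InfluxBucket, "xxxxxxxxxxx", Influx_lines, 'ms')`.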
gefaila commented 3 years ago

Also, I get a failure when I use retries=retries as per the sample code you sent:

from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS
from urllib3 import Retry 

status_forcelist = (500, 502, 504, 503)
retries = Retry(status_forcelist=status_forcelist, total=10, redirect=0, backoff_factor=0.5, raise_on_redirect=True, raise_on_status=True, connect=5, read=5, status=5)  # https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#module-urllib3.util.retry

influx_client = InfluxDBClient(url="my-url", token="my-token", org="my-org")  # THIS WORKS
influx_client = InfluxDBClient(url="my-url", token="my-token", org="my-org", retries=retries)  # THIS FAILS

write_api = influx_client.write_api(write_options=SYNCHRONOUS)

The code fails with

line 9, in influx_client = InfluxDBClient(url="my-url", token="my-token", org="my-org", retries=retries) # THIS FAILS

TypeError: __init__() got an unexpected keyword argument 'retries'

I think it's clear that retries isn't working properly in the Python API.

bednar commented 3 years ago

The code fails with

line 9, in influx_client = InfluxDBClient(url="my-url", token="my-token", org="my-org", retries=retries) # THIS FAILS

TypeError: __init__() got an unexpected keyword argument 'retries'

Which version of the client do you use? The retries option was introduced in v1.10.0 - https://github.com/influxdata/influxdb-client-python/releases/tag/v1.10.0.

The retries code (v3 which is also failing is something like this:

from influxdb_client import InfluxDBClient
from urllib3 import Retry 
from influxdb_client.client.write_api import SYNCHRONOUS
status_forcelist = (500, 502, 504, 503)

retries = Retry(status_forcelist=status_forcelist, total=10 , redirect=0, backoff_factor=0.1, raise_on_redirect=True, raise_on_status=True, connect=5, read=5, status=5 ) 
influx_client = InfluxDBClient(url=os.environ['influx_url'], token=os.environ['token'],retries=retries)
write_api = influx_client.write_api(write_options=SYNCHRONOUS)

You also have to configure the allowed_methods option. Try something like:

retries = Retry(connect=5, read=2, redirect=5, allowed_methods=["POST"], status_forcelist=[503], backoff_factor=0.5)
influxdb_client = InfluxDBClient(url="http://localhost:8086", token="my-token", retries=retries, debug=True)

Regards

gefaila commented 3 years ago

@bednar , we are maybe forgetting the original problem which is that InfluxDB is frequently unavailable with (503) errors. Retries may hide the problem. But I'm finding that the code below (which retries 10 times) is also sometimes getting (503) errors for ALL 10 TIMES.

for i in range(0, 10):
    try:
        influx_returns = write_api.write(InfluxBucket, "xxxxxxxxxxx", Influx_lines, 'ms')
        if i > 1:
            logmessage = JSON_log(f"write_api.write SUCCESS on try {i}", logging.ERROR, 2005, False, event, context)
        break
    except:
        logmessage = JSON_log(f"write_api.write FAIL on try {i}", logging.ERROR, 2006, True, event, context)
        if i > 2:
            time.sleep(0.1)  # try twice immediately and thereafter at 100 ms intervals

Can we also solve the problem with InfluxDB being frequently unavailable? High availability and write rate are why we use InfluxDB Cloud as a commercial solution.

gefaila commented 3 years ago

I note from the error logs that when I get the (503) error I'm now getting an additional detail mentioning nginx in the error

nginx influxdb_client.rest.ApiException: (503) Reason: Service Temporarily Unavailable

gefaila commented 3 years ago

And there is now a flood of these errors mentioning "nginx", and that's new.

gefaila commented 3 years ago

Furthermore, these (503) errors hit a massive spike today, with roughly 150 errors every 5 minutes for most of the day.

This is stable code; nothing has changed for months and we have been used to close to 100% of records succeeding. But the availability of InfluxDB Cloud is really concerning to us.

gefaila commented 3 years ago

Which version of the client do you use? The retries option was introduced in v1.10.0 - https://github.com/influxdata/influxdb-client-python/releases/tag/v1.10.0.

@bednar, you may have a point here. I am using the InfluxDB Python client as a layer in AWS. When I run pip list

I get

influxdb         5.3.0
influxdb-client  1.6.0

I understand that InfluxDB will be updating libraries etc. But it's very concerning if infrastructure that was working can be broken by an assumption that all clients will be using the latest python libraries. Does this mean that we can only keep things running well by constantly updating the AWS Layer every time the python libraries are updated?

And into this discussion don't forget that the http version (which doesn't use the Python Influx libraries) also fails with (503) and only started doing that a month ago. So updating my python influxdb libraries will not cause the http version to start working.

dabeeeenster commented 3 years ago

I wonder if turning the EU datacenter off and on again will fix it 😬

bednar commented 3 years ago

Does this mean that we can only keep things running well by constantly updating the AWS Layer every time the python libraries are updated?

No. It is a workaround for the current problem with the 503s.

As @Anaisdg said: As far as 503’s go, that the issue is known and Engineering is working hard to fix that as a high priority. They have already applied a mitigation.

gefaila commented 3 years ago

@dabeeeenster , have you noticed any improvement in the rate of 503 errors from Influx? I've noticed a reduction in errors since 17th Sept. You?

dabeeeenster commented 3 years ago

We only re-enabled writes to Influx 4 hours ago, but we haven't seen any errors since then interestingly.

gefaila commented 3 years ago

I had timeout errors on 21/9/2021 when http calls were taking more than 3 sec to return. But since then I've had no more 503 "unavailable" errors from the Python API or http.

I guess it's fixed? But we're in the dark about what it was.

gefaila commented 3 years ago

@dabeeeenster any observations from your side? Either a reduction, no change or a complete end to the 503 errors?

dabeeeenster commented 3 years ago

I'm not seeing these errors any more

dabeeeenster commented 3 years ago

Starting to see these trickle in again: SSLError EOF occurred in violation of protocol (_ssl.c:852)

gefaila commented 3 years ago

Yes I am too!

2021-09-28 14:51:50 {"message":"2021-09-28T13:51:50.439Z bcde10af-3968-4807-a353-d9a299bbb53c Task timed out after 3.00 seconds ","function":"MQTT_InfluxDB_v3"}
2021-09-28 14:51:48 {"message":"2021-09-28T13:51:48.602Z e78bac80-11a5-448f-9b49-aec34236d06f Task timed out after 3.00 seconds ","function":"MQTT_InfluxDB_v3"}

dabeeeenster commented 3 years ago

7 hour outage and this issue recurs - really not good enough I'm afraid.

gefaila commented 3 years ago

These timeout messages were another symptom of the 503 error. The effect was the same: inability to write to InfluxDB.

gefaila commented 3 years ago

A 7 hour outage? When??

dabeeeenster commented 3 years ago

https://status.influxdata.com/incidents/l7943t0xlm01

bednar commented 3 years ago

This issue has been closed because it has not had recent activity. Please reopen if this issue is still important to you and you have additional information.

For more information track the community topic - https://community.influxdata.com/t/write-to-influxdb-cloud-fails-with-apiexception-503-reason-service-unavailable-upstream-connect-error-or-disconnect-reset-before-headers-reset-reason-connection-failure/21501

gefaila commented 2 years ago

@bednar, we are not getting 503 errors, but we are (instead) still getting timeouts every few hours. @dabeeeenster, do you consider this issue "closed" (i.e. you are not getting 503 errors any more)? I think the issue is now manifesting itself as InfluxDB not answering when we post data, whereas before we at least got a 503. I think not answering is worse because it consumes time waiting. I'll await your comments @dabeeeenster and then maybe open another issue to address the occasional timeouts if necessary.

dabeeeenster commented 2 years ago

Yep looks good from here thanks