Closed gefaila closed 3 years ago
We are seeing this issue too, also with the Python SDK, although I'm not sure whether that is relevant.
I am writing to the EU AWS datacenter, if that is of any help.
For the last 24 hours I have also been simultaneously writing the same data to a 'junk' bucket over HTTP, using the Python requests library:
import requests
influx_returns=requests.post(url, data=Influx_lines, headers=headers)
While this helps illuminate where the bug may be, I believe that under the hood the Python API client library also uses http.
Note I'm also writing to https://eu-central-1-1.aws.cloud2.influxdata.com
If there was some stub code to use urllib3 to write line data to InfluxDB 2.0 cloud then I could test whether the failures also happen with the urllib3 library which I think matches what Python API client uses. Can anyone help with that?
@gefaila, here is some code for urllib3:
import urllib3
base_url = 'http://localhost:8086'
org = 'my-org'
token = 'my-token'
bucket = 'my-bucket'
headers = {'Content-Type': 'application/vnd.flux', 'Authorization': ('Token %s' % token)}
url = '%s/api/v2/write?org=%s&bucket=%s&precision=ms' % (base_url, org, bucket)
payload = 'cpu_load_short,host=server01,region=us-west value1=99.64\ncpu_load_short,host=server01,region=us-west value2=5.64\n'
with urllib3.PoolManager() as http:
    r = http.request(
        'POST',
        url,
        body=payload,
        headers=headers
    )
    print(f"Response status: '{r.status}', success: {r.status == 204}\n")
Hi @bednar, sorry: asmith == @gefaila == myself. That's the code I posted here myself, and I posted it because it doesn't work. :-)
As I explain in the post there, it fails with
HTTPSConnectionPool(host='eu-central-1-1.aws.cloud2.influxdata.com', port=443): Max retries exceeded with url: /api/v2/write?orgID=xxxxxxxxxxxx&bucket=TEST_bucket&precision=ms (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
I think what's needed is some code that has actually been verified as working with InfluxDB 2.0 Cloud
The code already works with cloud, but you have to install the SSL certificates to your keychain or install https://pypi.org/project/certifi/:
pip install certifi
or disable certificate validation with:
with urllib3.PoolManager(cert_reqs='CERT_NONE') as http:
    r = http.request(
        'POST',
        url,
        body=payload,
        headers=headers
    )
    print(f"Response status: '{r.status}', success: {r.status == 204}\n")
Thanks @bednar. So are you saying that if I install certifi then exactly the same code will succeed? Do I not need to edit the code in some way to perform the certificate validation?
import certifi
...
with urllib3.PoolManager(ca_certs=certifi.where()) as http:
    ...
Thanks @bednar. That small correction makes the code work. So, for others to find: I posted the working sample code, complete with certifi, here.
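For anyone who would rather test without third-party packages, here is a minimal sketch of the same write request using only the Python standard library. It assumes the system CA store can verify the InfluxDB Cloud certificate (if it cannot, the certifi approach above is the one confirmed to work in this thread). The function names build_write_request and send are made up for illustration, and the Content-Type value is my assumption for line protocol.

```python
import ssl
import urllib.parse
import urllib.request


def build_write_request(base_url, org, bucket, token, lines, precision="ms"):
    """Build (but do not send) a POST request for the /api/v2/write endpoint."""
    query = urllib.parse.urlencode(
        {"org": org, "bucket": bucket, "precision": precision}
    )
    return urllib.request.Request(
        url=f"{base_url}/api/v2/write?{query}",
        data=lines.encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            # Assumption: plain text is the usual content type for line protocol.
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send the request with certificate verification and return the HTTP status.

    urlopen raises urllib.error.HTTPError for 4xx/5xx answers, so a (503)
    shows up as an exception here rather than as a return value.
    """
    context = ssl.create_default_context()  # verifies the server certificate
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.status  # 204 means the write succeeded
```

Usage would be something like send(build_write_request(base_url, org, bucket, token, payload)), with the same values as in the urllib3 sample.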
However, I woke up this morning to some more insights about the actual subject of this bug, which is periodic (503) errors.
To help debug this issue I am sending all data three times, using separate AWS Lambda functions.
AWS Lambda functions run in containers, so these are three independent containers, using different libraries, that send the same data to different buckets.
Here's the thing:
This now points to the conclusion that the endpoint at https://eu-central-1-1.aws.cloud2.influxdata.com is unavailable for short periods, causing writes from different clients to fail during those periods.
@timestamp | @source | Code |
2021-09-08 09:00:17.30 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-08 01:59:26.06 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-08 01:48:35.31 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-08 01:01:12.69 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-08 01:01:12.69 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-08 00:11:08.60 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 23:45:46.30 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 23:38:40.06 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 22:34:59.79 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-07 22:34:59.51 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 22:26:12.45 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 22:20:21.45 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-07 16:06:48.75 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-07 16:05:03.74 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 16:02:33.74 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 16:02:33.74 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 16:02:32.23 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-07 16:02:32.22 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 15:16:44.25 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 15:02:41.17 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 15:02:41.16 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 15:02:38.96 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-07 15:02:38.67 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 15:02:36.67 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 15:02:35.01 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-07 15:02:34.68 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 15:02:34.37 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 15:02:33.76 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-07 15:02:33.72 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 15:02:32.57 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 15:02:32.57 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 15:02:32.56 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-07 15:02:32.20 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
2021-09-07 15:02:32.19 | lambda/MQTT_InfluxDB_v2 | http/requests |
2021-09-07 15:02:32.18 | lambda/MQTT_InfluxDB | InfluxDB API client |
2021-09-07 14:35:03.53 | lambda/MQTT_InfluxDB_v3 | InfluxDB API client |
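To make the overlap easier to see, the failures can be grouped into time windows and the sources in each window listed. This is only a sketch: the rows are a small sample copied from the table above, and the 5-second gap and the name group_by_window are arbitrary choices of mine.

```python
from datetime import datetime, timedelta

# A few rows copied from the table above: (timestamp, source).
FAILURES = [
    ("2021-09-08 01:01:12.69", "lambda/MQTT_InfluxDB_v3"),
    ("2021-09-08 01:01:12.69", "lambda/MQTT_InfluxDB"),
    ("2021-09-08 00:11:08.60", "lambda/MQTT_InfluxDB_v2"),
    ("2021-09-07 16:02:33.74", "lambda/MQTT_InfluxDB"),
    ("2021-09-07 16:02:33.74", "lambda/MQTT_InfluxDB_v2"),
    ("2021-09-07 16:02:32.23", "lambda/MQTT_InfluxDB_v3"),
    ("2021-09-07 16:02:32.22", "lambda/MQTT_InfluxDB_v2"),
]


def group_by_window(failures, gap=timedelta(seconds=5)):
    """Group failures that follow the previous failure by at most `gap`."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"), source)
        for ts, source in failures
    )
    groups = []
    for when, source in parsed:
        if groups and when - groups[-1][-1][0] <= gap:
            groups[-1].append((when, source))  # same outage window
        else:
            groups.append([(when, source)])  # start a new window
    return groups


for group in group_by_window(FAILURES):
    sources = sorted({source for _, source in group})
    print(group[0][0], len(group), "failure(s) from", ", ".join(sources))
```

A window that contains all three sources means that the requests client and both API client variants failed within the same few seconds.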
The retries code (v3), which is also failing, is something like this:
import os
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS
from urllib3 import Retry
# Status codes that should trigger a retry.
status_forcelist = (500, 502, 503, 504)
# Note: by default urllib3's Retry only retries idempotent methods on these status codes, and writes are POST requests.
retries = Retry(status_forcelist=status_forcelist, total=10, redirect=0, backoff_factor=0.1, raise_on_redirect=True, raise_on_status=True, connect=5, read=5, status=5)
influx_client = InfluxDBClient(url=os.environ['influx_url'], token=os.environ['token'], retries=retries)
write_api = influx_client.write_api(write_options=SYNCHRONOUS)
and then writes using
write_api.write(InfluxBucket, "xxxxxxxxxxxxxx", Influx_lines,'ms')
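As far as I can tell, with backoff_factor=0.1 the urllib3 retries above wait only a few seconds in total, so an outage that lasts longer would still surface as a (503). One possible workaround is to retry at the application level with longer delays. The sketch below is not part of the client library: write_with_backoff and write_fn are hypothetical names, and it assumes the raised exception carries a status attribute, as the client's ApiException appears to.

```python
import random
import time


def write_with_backoff(write_fn, attempts=6, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call write_fn() until it succeeds, backing off exponentially between tries.

    write_fn is a hypothetical zero-argument callable, for example
    lambda: write_api.write(InfluxBucket, my_org, Influx_lines, 'ms').
    """
    for attempt in range(attempts):
        try:
            return write_fn()
        except Exception as error:  # narrow this to the client's exception type
            status = getattr(error, "status", None)
            # Give up on non-retryable errors and after the last attempt.
            if status not in (500, 502, 503, 504) or attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps several clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

In a Lambda function the total sleep time counts against the function timeout, so attempts and max_delay would need to be chosen with that in mind.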
@dabeeeenster
We are both writing to https://eu-central-1-1.aws.cloud2.influxdata.com, are we not?
If you can give me UTC timestamps for your failures, we may see some kind of pattern.
Are your writes getting (503) errors when mine are?
Yes - we are hitting that endpoint.
Does this help?
"Sep 8, 2021 12:07:40 PM UTC ApiException: (503)"
"Sep 8, 2021 12:06:53 PM UTC ApiException: (503)"
"Sep 8, 2021 12:04:49 PM UTC ApiException: (503)"
"Sep 8, 2021 11:57:38 AM UTC ApiException: (503)"
"Sep 8, 2021 11:57:38 AM UTC ApiException: (503)"
"Sep 8, 2021 11:57:37 AM UTC ApiException: (503)"
"Sep 8, 2021 11:55:44 AM UTC ApiException: (503)"
"Sep 8, 2021 11:55:44 AM UTC ApiException: (503)"
"Sep 8, 2021 11:55:08 AM UTC ApiException: (503)"
"Sep 8, 2021 11:55:08 AM UTC ApiException: (503)"
"Sep 8, 2021 11:55:08 AM UTC ApiException: (503)"
"Sep 8, 2021 11:55:08 AM UTC ApiException: (503)"
"Sep 8, 2021 11:55:08 AM UTC ApiException: (503)"
"Sep 8, 2021 11:51:51 AM UTC ApiException: (503)"
"Sep 8, 2021 11:50:22 AM UTC ApiException: (503)"
"Sep 8, 2021 11:45:40 AM UTC ApiException: (503)"
"Sep 8, 2021 11:45:40 AM UTC ApiException: (503)"
"Sep 8, 2021 11:44:29 AM UTC ApiException: (503)"
"Sep 8, 2021 11:42:30 AM UTC ApiException: (503)"
"Sep 8, 2021 11:42:30 AM UTC ApiException: (503)"
"Sep 8, 2021 11:41:41 AM UTC ApiException: (503)"
"Sep 8, 2021 11:38:00 AM UTC ApiException: (503)"
"Sep 8, 2021 11:38:00 AM UTC ApiException: (503)"
"Sep 8, 2021 11:38:00 AM UTC ApiException: (503)"
"Sep 8, 2021 11:29:32 AM UTC ApiException: (503)"
"Sep 8, 2021 11:29:31 AM UTC ApiException: (503)"
"Sep 8, 2021 11:27:47 AM UTC ApiException: (503)"
"Sep 8, 2021 11:27:47 AM UTC ApiException: (503)"
"Sep 8, 2021 11:26:53 AM UTC ApiException: (503)"
"Sep 8, 2021 11:26:53 AM UTC ApiException: (503)"
"Sep 8, 2021 11:25:07 AM UTC ApiException: (503)"
"Sep 8, 2021 11:25:07 AM UTC ApiException: (503)"
"Sep 8, 2021 11:21:27 AM UTC ApiException: (503)"
"Sep 8, 2021 11:17:22 AM UTC ApiException: (503)"
"Sep 8, 2021 11:17:22 AM UTC ApiException: (503)"
"Sep 8, 2021 11:15:25 AM UTC ApiException: (503)"
"Sep 8, 2021 11:15:24 AM UTC ApiException: (503)"
"Sep 8, 2021 11:15:24 AM UTC ApiException: (503)"
"Sep 8, 2021 11:14:53 AM UTC ApiException: (503)"
"Sep 8, 2021 11:12:52 AM UTC ApiException: (503)"
"Sep 8, 2021 11:11:22 AM UTC ApiException: (503)"
"Sep 8, 2021 11:11:22 AM UTC ApiException: (503)"
"Sep 8, 2021 11:11:22 AM UTC ApiException: (503)"
"Sep 8, 2021 11:11:21 AM UTC ApiException: (503)"
"Sep 8, 2021 11:11:20 AM UTC ApiException: (503)"
"Sep 8, 2021 11:11:20 AM UTC ApiException: (503)"
"Sep 8, 2021 11:11:14 AM UTC ApiException: (503)"
"Sep 8, 2021 11:07:10 AM UTC ApiException: (503)"
"Sep 8, 2021 11:00:08 AM UTC ApiException: (503)"
"Sep 8, 2021 11:00:08 AM UTC ApiException: (503)"
"Sep 8, 2021 10:54:24 AM UTC ApiException: (503)"
"Sep 8, 2021 10:54:24 AM UTC ApiException: (503)"
"Sep 8, 2021 10:54:23 AM UTC ApiException: (503)"
"Sep 8, 2021 10:16:57 AM UTC ApiException: (503)"
"Sep 8, 2021 10:16:22 AM UTC ApiException: (503)"
"Sep 8, 2021 10:16:06 AM UTC ApiException: (503)"
"Sep 8, 2021 10:16:05 AM UTC ApiException: (503)"
"Sep 8, 2021 10:16:05 AM UTC ApiException: (503)"
"Sep 8, 2021 10:16:04 AM UTC ApiException: (503)"
"Sep 8, 2021 10:16:04 AM UTC ApiException: (503)"
"Sep 8, 2021 10:13:57 AM UTC ApiException: (503)"
"Sep 8, 2021 10:13:57 AM UTC ApiException: (503)"
"Sep 8, 2021 10:12:11 AM UTC ApiException: (503)"
"Sep 8, 2021 10:12:10 AM UTC ApiException: (503)"
"Sep 8, 2021 10:09:48 AM UTC ApiException: (503)"
"Sep 8, 2021 10:09:48 AM UTC ApiException: (503)"
"Sep 8, 2021 10:09:47 AM UTC ApiException: (503)"
"Sep 8, 2021 10:04:20 AM UTC ApiException: (503)"
"Sep 8, 2021 10:04:19 AM UTC ApiException: (503)"
"Sep 8, 2021 10:04:18 AM UTC ApiException: (503)"
"Sep 8, 2021 10:04:18 AM UTC ApiException: (503)"
"Sep 8, 2021 10:01:42 AM UTC ApiException: (503)"
"Sep 8, 2021 10:01:42 AM UTC ApiException: (503)"
"Sep 8, 2021 10:01:41 AM UTC ApiException: (503)"
"Sep 8, 2021 9:59:35 AM UTC ApiException: (503)"
"Sep 8, 2021 9:59:35 AM UTC ApiException: (503)"
"Sep 8, 2021 9:59:34 AM UTC ApiException: (503)"
"Sep 8, 2021 9:59:34 AM UTC ApiException: (503)"
"Sep 8, 2021 9:59:33 AM UTC ApiException: (503)"
"Sep 8, 2021 9:59:33 AM UTC ApiException: (503)"
"Sep 8, 2021 9:59:33 AM UTC ApiException: (503)"
"Sep 8, 2021 9:59:33 AM UTC ApiException: (503)"
"Sep 8, 2021 9:59:32 AM UTC ApiException: (503)"
"Sep 8, 2021 9:58:25 AM UTC ApiException: (503)"
"Sep 8, 2021 9:58:25 AM UTC ApiException: (503)"
"Sep 8, 2021 9:58:25 AM UTC ApiException: (503)"
"Sep 8, 2021 9:58:24 AM UTC ApiException: (503)"
"Sep 8, 2021 9:58:23 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:59 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:58 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:57 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:57 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:57 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:57 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:48 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:47 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:46 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:46 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:46 AM UTC ApiException: (503)"
"Sep 8, 2021 9:57:45 AM UTC ApiException: (503)"
So you are writing much faster than we are. But the facts are:
# | Timestamp | Source | Error
1 | 2021-09-08T12:06:54.362Z | lambda/MQTT_InfluxDB_v3 | (503)
2 | 2021-09-08T10:16:05.387Z | lambda/MQTT_InfluxDB_v3 | (503)
3 | 2021-09-08T09:57:57.824Z | lambda/MQTT_InfluxDB_v3 | (503)
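To check the overlap rather than eyeball it, each of my three failures can be compared with the nearest failure in the other list. This is a rough sketch: THEIRS holds only a handful of entries copied from the list above, nearest_gap is a made-up helper name, and the parsing assumes an English locale for the month and AM/PM markers.

```python
from datetime import datetime

# My three failures from the table above (ISO 8601, UTC).
MINE = [
    "2021-09-08T12:06:54.362Z",
    "2021-09-08T10:16:05.387Z",
    "2021-09-08T09:57:57.824Z",
]

# A few entries copied from the other user's list (UTC).
THEIRS = [
    "Sep 8, 2021 12:07:40 PM",
    "Sep 8, 2021 12:06:53 PM",
    "Sep 8, 2021 10:16:05 AM",
    "Sep 8, 2021 9:57:57 AM",
    "Sep 8, 2021 9:57:46 AM",
]


def nearest_gap(mine, theirs):
    """Map each of my timestamps to the gap to the closest failure in theirs."""
    other = [datetime.strptime(t, "%b %d, %Y %I:%M:%S %p") for t in theirs]
    result = {}
    for stamp in mine:
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
        result[stamp] = min(abs(when - o) for o in other)
    return result


for stamp, gap in nearest_gap(MINE, THEIRS).items():
    print(stamp, "closest other failure:", gap.total_seconds(), "s away")
```

The other list has whole-second resolution, so gaps of a second or two are about as close a match as the data allows.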
Steps to reproduce:
1. Using the Python client, make repeated calls to the write API.
2. Do this simultaneously from 2 or 3 different clients (not sure whether this is relevant, but this is the fail condition).
3. Write at a rate of about one API call per second.
4. Use SYNCHRONOUS calls:
influx_client = InfluxDBClient(url=os.environ['influx_url'], token=os.environ['token'], retries=retries)
write_api = influx_client.write_api(write_options=SYNCHRONOUS)
(repeatedly) influx_returns = write_api.write(InfluxBucket, my_org, Influx_lines, 'ms')
Expected behavior: Until 17th Aug we were seeing the expected behaviour. All the writes succeeded.
Data was written reliably
Actual behavior: From 17th August we started getting
The errors definitely started on 17th Aug and I know we didn't change anything because everyone was on holiday! I have AWS logs which show when this started
You can see from the discussion here that it is now affecting multiple users.
Specifications: