Closed 8none1 closed 2 years ago
As part of this, we need a troubleshooting section in the docs. Everything in the current documentation related to writes assumes a very sunny day scenario.
I would like to see a new Troubleshooting section added here:
We need to describe in detail the potential ways in which writes can fail, what the response codes are for writes, and then include this new content for the exception cases. Write failure scenarios should include:
See the Responses section here: https://docs.influxdata.com/influxdb/cloud/api/#operation/PostWrite. Note that the responses listed in the right-hand column are incomplete; a more complete list is found in the main body panel.
Additionally, we have a new reason writes could fail... which is based on a payload not conforming to an explicit schema bucket. That should be described and documented as well.
Currently the user would receive a 202 HTTP code to say that their write had been accepted. This could be interpreted as "everything is fine". Then, when they come to read that data, they find it is missing from their results.
Success code semantics: a successful or unsuccessful write currently returns `204 No Content`. `202` would be more appropriate to indicate that the request succeeded, but the write operation could fail asynchronously.
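A minimal client-side sketch of the point above, assuming only the two status codes discussed in this thread (`204` and `400`). The function name and the returned labels are hypothetical and not part of any InfluxDB client library.

```python
def classify_write_status(status: int) -> str:
    """Interpret a write response status (illustrative labels only)."""
    if status == 204:
        # The request was accepted, but per this issue the points can still
        # be rejected later, so this is not proof the data was written.
        return "accepted-unverified"
    if status == 400:
        # The request itself was rejected, e.g. malformed line protocol.
        return "rejected-bad-request"
    # Any other code is outside what this thread describes.
    return "other"


print(classify_write_status(204))
```

The design choice here is to never report `204` as a confirmed write, which matches the concern that a write can fail after the response is sent.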
@8none1 Based on your description and the responses I get from OSS and Cloud currently, the API won't return `400` for a malformed line protocol payload, correct? That's contrary to the `400` described in https://docs.influxdata.com/influxdb/cloud/api/#operation/PostWrite.
For example, this still returns `204`:
--data-raw "
memhost=host1 used_percent=25.4345351630076819792518000
memhost=host2 used_percent=25.4345351630076819792518000
" \
Is there some other condition that can return a 400? I haven't found it yet.
Never mind. I'm wrong. I was able to get a `400`. That was a bad example that is actually valid LP and did write.
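One way to see why that payload is valid: in line protocol, the measurement name runs up to the first unescaped comma or space, so without a comma the whole `memhost=host1` token is read as the measurement. The sketch below is a simplified splitter written for illustration, not InfluxDB's parser. The second input line (with a comma) is an assumption about what the example was meant to look like.

```python
def split_measurement(line: str) -> str:
    """Return the text before the first unescaped comma or space."""
    name = []
    i = 0
    while i < len(line):
        ch = line[i]
        # A backslash before a comma or space escapes it; keep the character.
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in ", ":
            name.append(line[i + 1])
            i += 2
            continue
        if ch in ", ":
            break
        name.append(ch)
        i += 1
    return "".join(name)


lines = [
    "memhost=host1 used_percent=25.4345351630076819792518000",
    # Assumed intended form, with a comma separating measurement and tag:
    "mem,host=host1 used_percent=25.4345351630076819792518000",
]
measurements = [split_measurement(line) for line in lines]
print(measurements)  # ['memhost=host1', 'mem']
```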
@8none1 or @rogpeppe Has the `rejected_points` logging been deployed? I'm trying to test it for documentation, but so far haven't seen any in `_monitoring`.
Has the rejected_points logging been deployed?
It is not available for customers currently.
Heads up for folks following this feature. We've quietly enabled it on prod101-us-east-1 AWS to see how it performs in the real world. Depending on how that goes, we can start getting ready to roll it out everywhere.
Hi folks, we're getting ready to enable this for everyone. What do you need to be able to move this forwards? Let us know how we can help.
@8none1 Has anything changed in the `monitoring` logging since your last update on 9/28? If not, we just need to resolve some conflicts in the branch and push it out.
Resolved conflicts and fixed a few issues in https://github.com/influxdata/docs-v2/pull/3109
Yoiks! Sorry for the delay. Nothing has changed.
Really excited to see this out in the wild soon!
PR:
Subsequent issues:
`error` field.
Engineering contacts: @rogpeppe, @8none1
Introduction.
Sometimes a user tries to write data into Cloud 2 which, although accepted by Gateway, is ultimately rejected and not written to the database. This could happen for a number of reasons, but typically it is because of mismatched data types; for example, they are trying to write a string to an int field.
Currently the user would receive a 202 HTTP code to say that their write had been accepted. This could be interpreted as "everything is fine". Then, when they come to read that data, they find it is missing from their results.
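The type mismatch described above can be sketched in Python as follows. This is a simplified illustration, not InfluxDB's implementation: it infers a field's type from its line protocol literal and compares it with the type the field already has. The type names (`"string"`, `"integer"`, and so on) and the helper names are assumptions made for the example; only the `gotType` and `wantType` keys come from this issue.

```python
import re

# Boolean literals accepted by line protocol.
_BOOLEANS = {"t", "T", "true", "True", "TRUE", "f", "F", "false", "False", "FALSE"}
_FLOAT = re.compile(r"-?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def field_type(literal: str) -> str:
    """Infer the type of a line protocol field value (simplified)."""
    if len(literal) >= 2 and literal[0] == '"' and literal[-1] == '"':
        return "string"
    if literal in _BOOLEANS:
        return "boolean"
    if re.fullmatch(r"-?\d+i", literal):
        return "integer"
    if re.fullmatch(r"\d+u", literal):
        return "unsigned"
    if _FLOAT.fullmatch(literal):
        return "float"
    raise ValueError(f"not a recognized field value: {literal!r}")


def check_write(existing_type: str, literal: str):
    """Return a mismatch description, or None if the types agree."""
    got = field_type(literal)
    if got != existing_type:
        return {"gotType": got, "wantType": existing_type}
    return None


# Writing a string to a field that already holds integers:
print(check_write("integer", '"high"'))
```

A client could run a check like this before sending, but it can only catch clashes against types it already knows about; the server remains the source of truth.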
This new feature logs "rejected writes" to the `_monitoring` bucket for that org_id.
This is an example of a line which would be written to the bucket:
Information logged to _monitoring bucket about dropped points
When a point is dropped for some reason (for example because of a type clash with a previously written point that has the same series), the client writing the point does not get immediate feedback, because the infrastructure inside InfluxData does not necessarily know of problems when the point is first written.
Previously, points like this were just dropped silently, but this is about to change (or has changed, depending on when you are reading this). Now, whenever a point is dropped, an entry is added to the organization's `_monitoring` bucket with some information about the point and why it was dropped.
The entries that are added will have the `rejected_points` measurement. Here's an example:
Note that the field value (`count`) will always be 1; all the information of interest is in the tags. The `bucket` and `reason` tags will always be present; other tags depend on the error in question.
A brief description of the tag fields and what they mean:
- `bucket`: the bucket ID of the bucket that the point was to have been written to (always present).
- `reason`: a brief textual description of the reason.
- `field`: the field name of the point (always present if the point had a field).
- `measurement`: the measurement of the point (always present if the point had a measurement).
- `gotType`: the type of the field value in the point.
- `wantType`: the type that the field value should have been.
Note that not all the information about the dropped point is written to `_monitoring` (for example, tags are not present). This is deliberate, to try to keep the cardinality of the `_monitoring` bucket under control.
Here is a more formal description of the entries in the `_monitoring` bucket, expressed in the CUE language.
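Separately from the CUE description, the rules stated in the prose above (measurement `rejected_points`, `count` always 1, `bucket` and `reason` always present) can be restated as a small Python check. This is an informal sketch: the function name, the sample tag values, and the one-hour range in the Flux query are all hypothetical, and the query is included only to show where such entries would be read from.

```python
# A Flux query one might use to look for entries (time range is arbitrary).
FLUX_QUERY = """
from(bucket: "_monitoring")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "rejected_points")
"""

# Tags the issue says are always present.
REQUIRED_TAGS = {"bucket", "reason"}


def validate_rejected_point(measurement: str, tags: dict, count: int) -> list:
    """Return a list of problems; an empty list means the entry looks as described."""
    problems = []
    if measurement != "rejected_points":
        problems.append(f"unexpected measurement: {measurement}")
    if count != 1:
        problems.append(f"count should be 1, got {count}")
    for tag in sorted(REQUIRED_TAGS - set(tags)):
        problems.append(f"missing required tag: {tag}")
    return problems


# Hypothetical entry; every value below is made up for illustration.
example_tags = {
    "bucket": "0123456789abcdef",
    "reason": "type conflict",
    "field": "used_percent",
    "measurement": "mem",
    "gotType": "string",
    "wantType": "float",
}
print(validate_rejected_point("rejected_points", example_tags, 1))
```

Optional tags such as `field` or `gotType` are deliberately not checked, since the issue says their presence depends on the error in question.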