open-telemetry / otel-arrow

Protocol and libraries for sending and receiving OpenTelemetry data using Apache Arrow
Apache License 2.0
68 stars 13 forks source link

Improve error message when batches are too big #226

Open lquerel opened 2 months ago

lquerel commented 2 months ago

For more details, see this bug report -> https://github.com/tigrannajaryan/otel-arrow-bug

The existing error message is quite poor. It appears that the batch size exceeds the maximum capacity for the number of metrics per batch (max uint16). This error message must be improved.

jmacd commented 2 months ago

If I check for MaxUint16 at the location where metricID is allocated, I get an error like this:

panic: github.com/open-telemetry/otel-arrow/pkg/otel/arrow_record.(*Producer).BatchArrowRecordsFromMetrics:224->github.com/open-telemetry/otel-arrow/pkg/otel/metrics/arrow.(*MetricsBuilder).Append:186->integer overflow

This is hardly an improvement -- @lquerel do you think we should upgrade to 32-bits at this point?

diff --git a/pkg/otel/metrics/arrow/metrics.go b/pkg/otel/metrics/arrow/metrics.go
index 58433481..c47ac81b 100644
--- a/pkg/otel/metrics/arrow/metrics.go
+++ b/pkg/otel/metrics/arrow/metrics.go
@@ -15,6 +15,7 @@
 package arrow

 import (
+   "fmt"
    "math"

    "github.com/apache/arrow/go/v16/arrow"
@@ -45,6 +46,8 @@ var (
        {Name: constants.AggregationTemporality, Type: arrow.PrimitiveTypes.Int32, Metadata: schema.Metadata(schema.Optional, schema.Dictionary8)},
        {Name: constants.IsMonotonic, Type: arrow.FixedWidthTypes.Boolean, Metadata: schema.Metadata(schema.Optional)},
    }, nil)
+
+   errIntegerOverflow = fmt.Errorf("integer overflow")
 )

 // MetricsBuilder is a helper to build a list of resource metrics.
@@ -179,6 +182,9 @@ func (b *MetricsBuilder) Append(metrics pmetric.Metrics) error {

    for _, metric := range optimizedMetrics.Metrics {
        ID := metricID
+       if metricID == math.MaxUint16 {
+           return werror.Wrap(errIntegerOverflow)
+       }

        b.ib.Append(ID)
        metricID++
lquerel commented 2 months ago

Migrating to a uint32 will have a significant impact on memory consumption and a slight effect on the compression rate. Ideally, it would be best to provide an option at compile time for scenarios requiring more than 2^16 entries per batch, but this would likely involve a major refactoring. If we receive many requests for this in the future, we could work on it, but for now, the best approach IMO is probably to generate a more informative message. Something like, "uint16 overflow detected, please reduce your batch size to stay within this limit."