open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
4.08k stars 1.36k forks source link

[pdata] Add ForEach methods to reduce deeply nested looping #8927

Open djaglowski opened 7 months ago

djaglowski commented 7 months ago

Is your feature request related to a problem? Please describe. Drilling through pdata structs in order to perform work on each items is very common but generally requires deeply nested and repetitive code.

func doSomethingWithLogs(ld plog.Logs) {
    rls := ld.ResourceLogs()
    for i := 0; i < rls.Len(); i++ {
        rl := rls.At(i)
        sls := rl.ScopeLogs()
        for j := 0; j < sls.Len(); j++ {
            sl := sls.At(j)
            for k := 0; k < sl.LogRecords().Len(); k++ {
                lr := sl.LogRecords().At(k)
                // do something with rl, sl, lr
            }
        }
    }
}

Describe the solution you'd like We should consider adding ForEach* methods at various levels in the pdata hierarchy.

The same code above could look something like:

func doSomethingWithLogs(ld plog.Logs) {
    ld.ForEachLogRecord(func(rl plog.ResourceLogs, sl plog.ScopeLogs, lr plog.LogRecord) {
        // do something with rl, sl, lr
    })
}

This could be implemented at the level of each "Slice" struct. e.g.

func (es ResourceLogsSlice) ForEachResourceLogs(func(rl plog.ResourceLogs))
func (es ResourceLogsSlice) ForEachScopeLogs(func(rl plog.ResourceLogs, sl plog.ScopeLogs))
func (es ResourceLogsSlice) ForEachLogRecord(func(rl plog.ResourceLogs, sl plog.ScopeLogs, lr plog.LogRecord))

func (es ScopeLogsSlice) ForEachScopeLogs(func(sl plog.ScopeLogs))
func (es ScopeLogsSlice) ForEachLogRecord(func(sl plog.ScopeLogs, lr plog.LogRecord))

func (es LogRecordSlice) ForEachLogRecord(func(lr plog.LogRecord))

Optionally, we could go one step further and expose these on the structs which contain Slices. e.g.

func (ms Logs) ForEachResourceLogs(func(rl plog.ResourceLogs))
func (ms Logs) ForEachScopeLogs(func(rl plog.ResourceLogs, sl plog.ScopeLogs))
func (ms Logs) ForEachLogRecord(func(rl plog.ResourceLogs, sl plog.ScopeLogs, lr plog.LogRecord))

func (ms ScopeLogs) ForEachScopeLogs(func(sl plog.ScopeLogs))
func (ms ScopeLogs) ForEachLogRecord(func(sl plog.ScopeLogs, lr plog.LogRecord))

Describe alternatives you've considered As I understand it, the current pattern of requiring iteration using Len and At was intentional based on slight advantage in performance. I believe this level of optimization is helpful in some cases and should remain on the table as an option. However, I believe there is a tradeoff with component writability and maintainability which should be considered here as well.

Additional context I'm not certain this would work as nicely for metric data points, since we have multiple types. Perhaps someone has ideas or opinions on this aspect in particular. e.g.

func (es ResourceMetricsSlice) ForEachNumberDataPoint(rm pmetric.ResourceMetrics, sm pmetric.ScopeMetrics, m pmetric.Metric, dp pmetric.NumberDataPoint)
djaglowski commented 7 months ago

cc @mx-psi @bogdandrutu

bogdandrutu commented 7 months ago

Personal preference:

  1. Add for in every Slice in the pdata generator:
func (es ResourceLogsSlice) ForEach(func(rl plog.ResourceLogs))

func (es ScopeLogsSlice) ForEach(func(sl plog.ScopeLogs))

func (es LogRecordSlice) ForEach(func(lr plog.LogRecord))
  1. Add high-level ForEach in top level structs:

    func (ms Logs) ForEachLogRecord(func(rl plog.ResourceLogs, sl plog.ScopeLogs, lr plog.LogRecord))
  2. Wait for more feedback to come.

mx-psi commented 7 months ago

I think we can clearly do this in an additive way, so this is not required for pdata 1.0. I like @bogdandrutu's plan (but maybe we don't need all ForEachs, I suspect for the most part components will benefit from ForEachLogRecord and similar methods)