look at @FMNSSun's parallelization of pto3-trace, and generalize it into a ParallelScanningNormalizer.
The main thing that needs to change in NormFunc is the output metadata accumulator: this can no longer be a single map[string]interface{} that the function can edit. Two design possibilities here:
each parallel goroutine has its own output metadata accumulator, and the finalizer takes an array of these
a single goroutine (a function provided by the normalizer) manages a central output metadata accumulator. normalizer functions write metadata updates to a channel read by this accumulator.
Option 2 is simpler, but it implies that a normalizer function's action must be independent of output metadata accumulated from previous rows. This is probably fine: if you really want deterministic behavior in output metadata accumulation, you need to do everything serial anyway, and you should use ScanningNormalizer (which we should rename SerialScanningNormalizer) instead.
look at @FMNSSun's parallelization of pto3-trace, and generalize it into a
ParallelScanningNormalizer
.The main thing that needs to change in NormFunc is the output metadata accumulator: this can no longer be a single map[string]interface{} that the function can edit. Two design possibilities here:
each parallel goroutine has its own output metadata accumulator, and the finalizer takes an array of these
a single goroutine (a function provided by the normalizer) manages a central output metadata accumulator. normalizer functions write metadata updates to a channel read by this accumulator.
Option 2 is simpler, but it implies that a normalizer function's action must be independent of output metadata accumulated from previous rows. This is probably fine: if you really want deterministic behavior in output metadata accumulation, you need to do everything serial anyway, and you should use
ScanningNormalizer
(which we should renameSerialScanningNormalizer
) instead.