seznam / euphoria

Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine independent programming model which can express both batch and stream transformations.
Apache License 2.0
82 stars 11 forks source link

euphoria-flink: caching states #67

Closed xitep closed 7 years ago

xitep commented 7 years ago

AbstractWindowOperator which is part of the Euphoria's RSBK implementation for the streaming flink executor caches once retrieved states per key/window in memory. While this seemed a good idea initially, it turns out to be problematic when a lot of different keys with lot of different windows are open - this is easily achievable with time-sliding for example. Further, it turns out that caching the state objects doesn't bring much!

xitep commented 7 years ago

When not caching state instances, it appears worth while to avoid frequent translation of euphoria storage descriptors into flink state descriptors. As part of the initialization of the state descriptors and the state storage objects, flink's type inference and serializer initialization is at work, adding constant overhead per element/window/state to a euphoria flow, which on an equivalent flink program is present only once at the start of the program.

I've pushed some experiments into the brach pete/experimental-state-descriptors.

xitep commented 7 years ago

This was resolved by #69