Closed chandrashekar-s closed 5 months ago
The maxParallelism
in the FlinkPipelineOptions
over here, is used to control the number of maximum number of tasks to which keyed state can be distributed, i.e, the maximum effective parallelism of an operator and this is applicable only in a Streaming environment. In our case since we don't have any pipelines using streams this parameter is not used anywhere. So the maxWorkers
param will be removed.
The
maxParallelism
which is set by themaxWorkers
over here, controls the maximum number of tasks to which keyed state can be distributed, i.e it is something like distributing the incoming streamed data over N keyed partitions. Few details can be found here. This looks like it will be used in a Streaming environment and this should not affect our pipelines as we don't have any streams.Also, commenting on the number of task managers it is set to 1 by default in a local environment, can be found here. Increasing it in a local environment might not help much. However, in a clustered environment this gets dynamically calculated based on the parallelism set for operators in tasks (not the maxParallelism) and the number of taskSlots in a single TaskManager defined by this parameter.
Investigate further for a clustered environment on how the number of
Task Managers
are created dynamically and how is it dependent on thetaskmanager.memory.network.max
parameters.