[Open] jakubadamek opened this issue 4 months ago
Yes, this is a known issue, but we prefer not to increase the default memory significantly. For the reasoning, please take a look at where `JAVA_OPTS` is set here and the comments above it. I am guessing that you are running this on a machine with many cores (how many?). We cannot change the JVM memory configuration in the controller code, but we can check in advance and fail early if we know that a large number of threads is going to fail the pipeline later on.

Another option is to limit the number of threads dynamically, i.e., even if `numThreads` is set to use all cores (e.g., here), we can override it and set it to a smaller number for which the provided memory is enough. That way the pipeline won't fail but may suffer performance-wise. @chandrashekar-s WDYT about this capping change?
Dynamically capping the number of threads is a good idea for the case where the configured memory is insufficient and `numThreads` is set to use all cores (or some similarly high value). That way we don't fail the application, but we should also warn the user about the cap in the logs.
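The capping idea discussed above could look roughly like the following. This is only a sketch: the class name, the `capThreads` helper, and the per-thread memory budget are all illustrative and not taken from the actual controller code, where the real per-thread cost would need to be measured.

```java
// Hypothetical sketch of capping numThreads based on the available JVM heap.
// PER_THREAD_BYTES is an assumed per-thread working-set estimate, not a
// measured value from the pipeline.
public class ThreadCap {
  static final long PER_THREAD_BYTES = 512L * 1024 * 1024; // assume ~512 MiB per thread

  /** Returns the requested thread count, capped so the heap can afford it. */
  static int capThreads(int requestedThreads, long maxHeapBytes) {
    int affordable = (int) Math.max(1, maxHeapBytes / PER_THREAD_BYTES);
    if (requestedThreads > affordable) {
      // Warn the user in the logs, as suggested above.
      System.err.printf(
          "Capping numThreads from %d to %d; max heap of %d bytes is too small%n",
          requestedThreads, affordable, maxHeapBytes);
      return affordable;
    }
    return requestedThreads;
  }

  public static void main(String[] args) {
    int cores = Runtime.getRuntime().availableProcessors();
    long heap = Runtime.getRuntime().maxMemory();
    System.out.println("Effective threads: " + capThreads(cores, heap));
  }
}
```

The same arithmetic could instead power the fail-early check: compare `requestedThreads * PER_THREAD_BYTES` against the max heap at startup and abort with a clear error rather than capping silently.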
I was testing a simple developer setup for pipelines:
This finished with some errors which I ignored: https://paste.googleplex.com/5983969268465664
`docker-compose -f docker/compose-controller-spark-sql-single.yaml up --force-recreate`
Finished with error due to JVM max memory: https://paste.googleplex.com/4827520718864384
Fixed by editing `docker/.env` and setting `JAVA_OPTS=-Xms10g -Xmx10g`.
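For reference, the local fix amounts to something like the following. The file path and variable name are from the repro above; `10g` is just the value that happened to work here, not a vetted default (which is exactly what this issue is debating).

```shell
# Raise the controller JVM heap in docker/.env (workaround, not a vetted default).
sed -i.bak 's/^JAVA_OPTS=.*/JAVA_OPTS=-Xms10g -Xmx10g/' docker/.env

# Recreate the stack so the new JAVA_OPTS takes effect:
# docker-compose -f docker/compose-controller-spark-sql-single.yaml up --force-recreate
```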
Should we make this change permanent?