mesosphere / spark-build

Used to build the mesosphere/spark docker image and the DC/OS Spark package
52 stars 34 forks

[DCOS-55536] Fix Spark scheduler's LIBPROCESS TLS config #533

Closed tillt closed 5 years ago

tillt commented 5 years ago

What changes were proposed in this pull request?

Resolves https://jira.mesosphere.com/browse/DCOS-55536.

How were these changes tested?

Not at all so far.

Release Notes

Fixed a bug in the Spark scheduler startup environment.

jgehrcke commented 5 years ago

It really helps us with release notes and a general understanding of the changes in the repo

I love this sentiment and updated the PR title.

tillt commented 5 years ago

Thanks so much for this ultra-quick turnaround, much appreciated. Also thanks for fixing my sloppy title!

tillt commented 5 years ago

Quoting a bit more reasoning...

So, those variables determine the locations of the SSL certificate bundle. OpenSSL, via libprocess, uses these locations while establishing connections if and only if libprocess has been told to verify peer certificates. So far, that was not enabled in DC/OS; that changes now. When the locations are invalid, the bug surfaces only once LIBPROCESS_SSL_VERIFY_CERT is enabled. More on those new changes: https://jira.mesosphere.com/browse/DCOS-54044 Or directly here: https://github.com/mesosphere/dcos-enterprise/pull/6012/files#diff-8acc8e77a15678f67a353d405fcf971cR521 That is the PR we are using for our tests right now.
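
To illustrate the point above, here is a minimal sketch of the condition under which invalid locations start to matter. The comment only names LIBPROCESS_SSL_VERIFY_CERT; the names LIBPROCESS_SSL_CA_FILE and LIBPROCESS_SSL_CA_DIR, the accepted spellings of "enabled", and the helper names are assumptions made for illustration. This is not code from libprocess or from this PR.

```python
import os

# Only LIBPROCESS_SSL_VERIFY_CERT is named in the discussion; the two CA
# location variable names below are assumptions for this sketch.
VERIFY_VAR = "LIBPROCESS_SSL_VERIFY_CERT"
CA_FILE_VAR = "LIBPROCESS_SSL_CA_FILE"
CA_DIR_VAR = "LIBPROCESS_SSL_CA_DIR"


def is_enabled(value):
    """Treat "1" and "true" (any case) as enabled. The accepted spellings are an assumption."""
    return value is not None and value.strip().lower() in ("1", "true")


def invalid_ca_locations(env):
    """Return the CA location variables that point at nonexistent paths.

    Returns an empty list whenever peer verification is disabled, because
    the locations are not consulted in that case and the bug stays hidden.
    """
    if not is_enabled(env.get(VERIFY_VAR)):
        return []
    problems = []
    ca_file = env.get(CA_FILE_VAR)
    if ca_file is not None and not os.path.isfile(ca_file):
        problems.append(CA_FILE_VAR)
    ca_dir = env.get(CA_DIR_VAR)
    if ca_dir is not None and not os.path.isdir(ca_dir):
        problems.append(CA_DIR_VAR)
    return problems
```

Calling invalid_ca_locations(dict(os.environ)) inside a task would list the offending variables. With verification off it always returns an empty list, which mirrors why the misconfiguration went unnoticed until now.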

The locations should pretty much never be set by schedulers or other Mesos workloads. The Mesos agent itself prepares a valid environment for them via the SSL-exec isolator (UCR) or SSL-exec hook (Docker) modules. More on that here: https://github.com/mesosphere/dcos-ee-mesos-modules/tree/master/ssl_exec
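
As a rough sketch of that rule, a scheduler could filter the CA locations out of any environment it passes on, so that the values prepared by the agent are the ones that apply. The variable names and the helper drop_ca_overrides are hypothetical, and this is illustrative only, not taken from this PR's diff.

```python
# Assumed names of the CA location variables; they are not spelled out in
# the discussion above.
AGENT_MANAGED_VARS = ("LIBPROCESS_SSL_CA_FILE", "LIBPROCESS_SSL_CA_DIR")


def drop_ca_overrides(scheduler_env):
    """Return a copy of scheduler_env without the CA location variables.

    Hypothetical helper: it leaves every other variable untouched and does
    not modify the mapping passed in.
    """
    return {
        name: value
        for name, value in scheduler_env.items()
        if name not in AGENT_MANAGED_VARS
    }
```

For example, drop_ca_overrides({"LIBPROCESS_SSL_CA_FILE": "/some/path", "SPARK_HOME": "/opt/spark"}) keeps only the SPARK_HOME entry.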

Remaining question: why does the test in question fail only for the Docker containerizer but not for the Mesos containerizer? Looking into that...

jgehrcke commented 5 years ago

@akirillov can we merge this?