nlesc-sherlock / emma

Ansible playbook to create a cluster with GlusterFS, Docker, Spark and JupyterHub services
Apache License 2.0

Pdal role lost connection to remote host during pdal build #139

Closed meiertgrootes closed 4 years ago

meiertgrootes commented 4 years ago

When building pdal from source within the pdal role, with the python3 executable specified as in #138, Ansible reports a failure after 5 minutes, saying that the remote host has closed the ssh connection. After logging into the remote host, which pdal does find an installed pdal.

meiertgrootes commented 4 years ago

Just to draw your attention to this: @fdiblen @sverhoeven

meiertgrootes commented 4 years ago

Update: @fdiblen @sverhoeven I ran the pdal role again, but stopped after the initial CMake run. I logged into the node and checked the make version (which make -> 4.1). I ran the remaining steps manually without issues, resulting in a pdal installation. I then retried with a new cluster, this time including the sudo make -j2 command. The connection was lost again, with the message that the connection was closed by the host. I logged in again and ran sudo make install, which executed without issues. Could this be terraform/ansible timing out?

meiertgrootes commented 4 years ago

This is indeed the connection timing out while the build is ongoing. Fixed by adding keep-alive arguments to the Ansible config. The fix will be included in an upcoming pull request.
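
For anyone hitting the same problem, here is a minimal sketch of what keep-alive settings in ansible.cfg could look like. The exact arguments used in the pull request are not stated in this thread, so the option values below (a 30 second interval, 10 retries) are assumptions and not the actual fix. ServerAliveInterval and ServerAliveCountMax are standard OpenSSH client options, and ssh_args in the [ssh_connection] section is the Ansible setting that passes them to ssh.

```ini
# ansible.cfg (illustrative sketch; values are assumptions, not taken from the fix)
[ssh_connection]
# Send a keep-alive probe every 30 s and tolerate up to 10 unanswered probes,
# so the session is not dropped as idle during a long-running build step.
# ControlMaster/ControlPersist are repeated here because setting ssh_args
# replaces Ansible's default ssh arguments.
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=10
```

With settings along these lines the ssh client keeps sending traffic while make is running, which should prevent an idle connection from being closed partway through the build.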