Open mbookman opened 8 years ago
this doesn't quite match my environment. my error is the same:
error: commlib error: access denied (client IP resolved to host name "master001". This is not identical to clients host name "samtools-index-master001.c.isb-cgc.internal")
Unable to run job: unable to send message to qmaster using port 6444 on host "master001": got send error.
but my /etc/hostname file doesn't look like it's been modified, it contains only 'master001'
my fix, suggested by a website, was to create a file named /var/lib/gridengine/default/common/host_aliases
and add the line:
master001 samtools-index-master001.c.isb-cgc.internal
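A minimal shell sketch of that fix, for anyone who wants to script it. The function name add_host_alias is hypothetical (not part of gridengine); it only appends the line if it is not already there. Writing to the real path needs root, and the gridengine daemons may need a restart before they notice the change.

```shell
# Hypothetical helper: append "<canonical> <alias>" to a host_aliases file
# unless that exact line is already present.
# Usage: add_host_alias <aliases_file> <canonical_name> <alias>
add_host_alias() {
    aliases_file=$1
    canonical=$2
    alias_name=$3
    entry="$canonical $alias_name"
    # -x matches the whole line, -F treats the entry as a fixed string
    if ! grep -qxF "$entry" "$aliases_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$aliases_file"
    fi
}

# Example for the names in this thread (run as root):
# add_host_alias /var/lib/gridengine/default/common/host_aliases \
#     master001 samtools-index-master001.c.isb-cgc.internal
```

Running it twice leaves a single entry, so it should be safe to call from a startup script.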
Thanks Michael.
The file that is getting updated is /etc/hosts, not /etc/hostname.
The update to the host_aliases file works? That's fantastic.
ah, yes, there it is indeed. your fix might be more straight-forward and general, although what i found should be good for gridengine:
# THIS FILE IS CONTROLLED BY ANSIBLE
# any local modifications will be overwritten!
#
# This file is managed by Ansible.
127.0.0.1 localhost.localdomain localhost
10.240.0.20 compute001
10.240.0.12 compute002
10.240.0.21 compute003
10.240.0.57 compute004
10.240.0.13 compute005
10.240.0.49 master001
10.240.0.49 samtools-index-master001.c.isb-cgc.internal samtools-index-master001 # Added by Google
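In the listing above, 10.240.0.49 appears on two lines with different names, which looks consistent with the name mismatch in the error. A rough sketch for spotting that on other nodes follows; duplicate_ips is a hypothetical helper name, and it only reports addresses that occur on more than one non-comment line.

```shell
# Hypothetical helper: print every IP that appears on more than one
# non-comment, non-empty line of a hosts-format file.
# Usage: duplicate_ips <hosts_file>
duplicate_ips() {
    awk '!/^[[:space:]]*#/ && NF { count[$1]++ }
         END { for (ip in count) if (count[ip] > 1) print ip }' "$1"
}

# Example:
# duplicate_ips /etc/hosts
```

An address listed here is not necessarily a problem, it just points at the lines worth checking.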
Hi Matt,
If you have sudo access, it probably would be just easier to update the Ansible Playbook.
~p
I just recently started seeing the following after my cluster has been running for some time. When issuing commands like qstat:
I don't know yet why this started occurring, but I have traced it to the DHCP client on Compute Engine and an associated "set-hostname" hook.
These are actually the same file:
The DHCP client will episodically call the set-hostname script, which updates /etc/hosts with a fully qualified version of the hostname and then calls hostname to change the hostname.
I have worked around this (and tracked it specifically to the dhcp client) by editing the set-hostname script and adding:
before any code executes, namely before:
This prevents the problem from occurring.
To "fix" a running instance:
Remove the "# Added by Google" record from /etc/hosts
sudo hostname frontend001
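A rough sketch of the cleanup for a running instance, assuming the unwanted records are exactly the lines carrying the "# Added by Google" marker. The function name strip_google_records is hypothetical; it only prints the filtered file, so nothing is changed until the result is copied back.

```shell
# Hypothetical helper: print a hosts file without the lines that carry
# the "# Added by Google" marker. The input file is not modified.
# Usage: strip_google_records <hosts_file>
strip_google_records() {
    grep -v '# Added by Google' "$1"
}

# Example on a live node (assumes sudo; frontend001 is the short name
# used in this thread, substitute the node's own):
# strip_google_records /etc/hosts > /tmp/hosts.new
# sudo cp /etc/hosts /etc/hosts.bak
# sudo cp /tmp/hosts.new /etc/hosts
# sudo hostname frontend001
```

Keeping a backup copy of /etc/hosts first makes it easy to back out if the filter removes more than expected.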
Need to find out what the right way is to prevent this from happening. The /tmp/set-hostname.txt log shows: