Closed lottspot closed 5 years ago
The settings in /etc/hosts, /etc/host.conf, /etc/gai.conf, and /etc/hostname will affect how that grain comes out. I have seen this problem before with our setup. Inconsistent entries in /etc/hosts caused our issue. Having a file.managed for /etc/hosts might be one way to remove the edge cases.
The settings in /etc/hosts, /etc/host.conf, /etc/gai.conf, and /etc/hostname will affect how that grain comes out.
The settings in gai.conf won't actually have an effect since get_fqhostname
will never return the results of its call to socket.getaddrinfo
(unless I'm misreading the function, which is certainly possible).
Managing the /etc/hosts file in order to get the grains.fqdn
grain to agree with the output of hostname --fqdn
is extremely unintuitive at best and, depending on the system, may be infeasible at worst.
The problematic behavior of Python's socket.getfqdn
has been discussed and agreed upon, though it seems that a patch never made it into Python 2.7 (not sure what the behavior in 3.4 is). The consensus seems to be that The Right Thing to do is to use the canonname returned by a call to getaddrinfo(3)
in order to determine the fully qualified name of a machine.
I ran strace -f -e open python2.7
, then ran import socket
, then typed socket.getfqdn()
, which shows:
>>> socket.getfqdn()
open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = 3
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 3
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib/x86_64-linux-gnu/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
open("/etc/gai.conf", O_RDONLY|O_CLOEXEC) = 3
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 3
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
In the grains core module fqdn
is set with the following python code:
if __FQDN__ is None:
    __FQDN__ = salt.utils.network.get_fqhostname()
grains['fqdn'] = __FQDN__
In salt/utils/network.py, salt.utils.network.get_fqhostname is as follows:
def get_fqhostname():
    '''
    Returns the fully qualified hostname
    '''
    l = []
    l.append(socket.getfqdn())
    # try socket.getaddrinfo
    try:
        addrinfo = socket.getaddrinfo(
            socket.gethostname(), 0, socket.AF_UNSPEC, socket.SOCK_STREAM,
            socket.SOL_TCP, socket.AI_CANONNAME
        )
        for info in addrinfo:
            # info struct [family, socktype, proto, canonname, sockaddr]
            if len(info) >= 4:
                l.append(info[3])
    except socket.gaierror:
        pass
    l = _sort_hostnames(l)
    if len(l) > 0:
        return l[0]
    return None
So it looks like it calls socket.getfqdn() and then tries
socket.getaddrinfo(). I wrote a simple test program:
#!/usr/bin/python2.7
import socket
if __name__ == '__main__':
    print("socket.getfqdn(): {}".format(socket.getfqdn()))
    try:
        addrinfo = socket.getaddrinfo(
            socket.gethostname(), 0, socket.AF_UNSPEC, socket.SOCK_STREAM,
            socket.SOL_TCP, socket.AI_CANONNAME
        )
        for info in addrinfo:
            if len(info) >= 4:
                print("found info:{}, info[3]:{}".format(info, info[3]))
    except socket.gaierror:
        print("socket.gaierror error")
I then edited /etc/hosts and could see the results change based on what's in there.
There is also the call to _sort_hostnames
, which does some sorting based on IPv6 or localhost names and might cause you grief if you use IPv6 with an address that starts with fe00
or fe02
.
I took a look at the man for getaddrinfo() which states:
If hints.ai_flags includes the AI_CANONNAME flag, then the ai_canonname field of the first of the addrinfo structures in the returned list is set to point to the official name of the host.
It seems that /etc/hosts is going to play a part in this grain for now. It is possible that the call to socket.getaddrinfo() is failing, and since the except block contains only a pass
, it fails silently. Maybe we should add a log message to the function?
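A minimal sketch of what such a log message could look like. The helper name canonnames_with_logging and the logger are hypothetical and not Salt's actual code; the resolver is a parameter only so that the failure path can be exercised without touching the system resolver.

```python
import logging
import socket

log = logging.getLogger(__name__)


def canonnames_with_logging(hostname, resolver=socket.getaddrinfo):
    """Return the canonname fields reported for hostname.

    Hypothetical helper for illustration; not part of Salt.
    """
    try:
        addrinfo = resolver(
            hostname, 0, socket.AF_UNSPEC, socket.SOCK_STREAM,
            socket.SOL_TCP, socket.AI_CANONNAME
        )
    except socket.gaierror as exc:
        # Instead of a bare "pass", leave a trace of the failed lookup.
        log.warning("getaddrinfo(%r) failed: %s", hostname, exc)
        return []
    # info struct: (family, socktype, proto, canonname, sockaddr)
    return [info[3] for info in addrinfo if len(info) >= 4]
```

With something like this in place, a failing lookup would show up in the minion log rather than silently leaving socket.getfqdn() as the only candidate.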
Could the following be your issue? Do we want to add only if canonname is set?
if len(info) >= 4:
    l.append(info[3])
Might need to be:
if len(info) >= 4 and info[3]:
    l.append(info[3])
Because sometimes info[3]
is an empty string, and it gets added to the list of possible hostnames used in fqdn.
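The effect of that extra check can be seen in isolation with a small sketch. The function name nonempty_canonnames and the sample tuples are illustrative assumptions; the sample is shaped like a getaddrinfo() result in which only the first entry carries a canonname, as the man page excerpt above describes.

```python
def nonempty_canonnames(addrinfo):
    """Collect canonname fields from getaddrinfo() results, skipping empty ones."""
    names = []
    for info in addrinfo:
        # info struct: (family, socktype, proto, canonname, sockaddr)
        if len(info) >= 4 and info[3]:
            names.append(info[3])
    return names


# Illustrative data, not real lookup output.
sample = [
    (10, 1, 6, 'host.example.com', ('::1', 0, 0, 0)),
    (2, 1, 6, '', ('127.0.1.1', 0)),
]
print(nonempty_canonnames(sample))  # ['host.example.com']
```

Without the truthiness check, the empty string from the second entry would also end up in the candidate list.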
We certainly don't disagree on the code path that sets the grain. In my OP, I linked to the exact same module you just directed me to. It's our understanding of the behavior of get_fqhostname
where we seem to be diverging.
Firstly, the sorting method you're referring to is no longer used in current stable. Secondly, while the get_fqhostname
function does make a call to socket.getaddrinfo
, the return values from its call to socket.getaddrinfo
will never be returned from get_fqhostname
, because the return value of socket.getfqdn
is always appended as element 0 in list l
and the return value will always be either l
's element 0 or None
. Consequently, the entire code path which calls socket.getaddrinfo
is essentially a dead path.
This could easily be fixed by simply moving the call to socket.getfqdn
after the call to socket.getaddrinfo
, like:
def get_fqhostname():
    '''
    Returns the fully qualified hostname
    '''
    l = []
    # try socket.getaddrinfo
    try:
        addrinfo = socket.getaddrinfo(
            socket.gethostname(), 0, socket.AF_UNSPEC, socket.SOCK_STREAM,
            socket.SOL_TCP, socket.AI_CANONNAME
        )
        for info in addrinfo:
            # info struct [family, socktype, proto, canonname, sockaddr]
            if len(info) >= 4:
                l.append(info[3])
    except socket.gaierror:
        pass
    l.append(socket.getfqdn())
    return l and l[0] or None
This would give preference to the canonname returned by socket.getaddrinfo
and fall back on socket.getfqdn
where a call to socket.getaddrinfo
fails.
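A variant of the same idea, as a sketch rather than a proposed patch. The name fqdn_prefer_canonname and its resolver, fallback and hostname parameters are hypothetical, added only so the behavior can be tried without depending on the local resolver. It differs from the snippet above in two ways: it skips empty canonname strings, and it avoids the `l and l[0] or None` idiom, which yields None instead of falling back whenever the first element is an empty string.

```python
import socket


def fqdn_prefer_canonname(resolver=socket.getaddrinfo,
                          fallback=socket.getfqdn,
                          hostname=None):
    """Prefer getaddrinfo()'s canonname; fall back to socket.getfqdn().

    Hypothetical helper for illustration; not Salt's implementation.
    """
    if hostname is None:
        hostname = socket.gethostname()
    try:
        addrinfo = resolver(
            hostname, 0, socket.AF_UNSPEC, socket.SOCK_STREAM,
            socket.SOL_TCP, socket.AI_CANONNAME
        )
    except socket.gaierror:
        addrinfo = []
    for info in addrinfo:
        # info struct: (family, socktype, proto, canonname, sockaddr)
        if len(info) >= 4 and info[3]:
            return info[3]
    # No usable canonname: use the old behavior as a last resort.
    return fallback()
```

Calling fqdn_prefer_canonname() with no arguments uses the real resolver; passing stub callables makes it possible to check the fallback path.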
I ran strace -f -e open python2.7, then ran import socket, then typed socket.getfqdn(), which shows:
I stand corrected! Though your experiment shows that gai.conf apparently plays some role, the nature of that role is still painfully opaque, which goes back to my point about the entire behavior of socket.getfqdn
being woefully unreliable, which has the impact of limiting the usefulness of grains.fqdn
.
The fact of the matter, though, is that socket.getfqdn
internally makes a call to socket.gethostbyaddr
, which in turn wraps the gethostbyaddr
C library function, which is itself documented as obsolete, with the documentation recommending getaddrinfo
instead. Beyond the obsolescence, though, the fact that it fundamentally relies on the gethostbyaddr
function means that the fully qualified name returned by socket.getfqdn
is simply unreliable. An action as simple as adding IP addresses to a machine can cause its fqdn to change.
I'm certainly willing to write up a PR, but would rather know whether there's interest in changing this behavior before investing time in doing so.
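To illustrate why the result depends so heavily on what the host database returns, here is a simplified model of the selection step. It approximates how CPython's socket.getfqdn chooses from the hostname and alias list returned by socket.gethostbyaddr: the first candidate containing a dot wins, otherwise the primary hostname is used. The function name pick_fqdn is hypothetical and the inputs are made-up examples.

```python
def pick_fqdn(hostname, aliases):
    """Approximate socket.getfqdn()'s choice: first dotted name, else hostname."""
    for name in [hostname] + list(aliases):
        if '.' in name:
            return name
    return hostname


print(pick_fqdn('host.example.com', ['host']))            # host.example.com
print(pick_fqdn('localhost', []))                         # localhost
print(pick_fqdn('localhost', ['localhost.example.com']))  # localhost.example.com
```

Which hostname and aliases come back depends on the address the machine's name resolves to and on the order of entries in /etc/hosts, so a change in either can change the chosen name.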
ping @cachedout can I get your input on this issue? Not sure which is the best route to ensure these edge cases are also taken care of. Thanks!
I would definitely be interested in seeing a PR that changes this behaviour. I have been trying to figure out why FQDNs are broken on salt on linux. They do not return the same output as hostname -f
, rather localhost.domainname
where domainname is correct, but localhost should be hostname -s
. If I remove the localhost lines from /etc/hosts, this resolves the issue, however on puppet, chef and ansible, this is not needed, and the FQDN returns the output of hostname -f
, as expected.
Just caught an issue where two different salt versions on two hosts with the same OS (Ubuntu 14.04) and the same DNS/hosts/hostname setup report a different fqdn
grain:
Expected one:
$ hostname -f
lb-a9723eb4.example.com
$ sudo salt-call grains.get fqdn
local:
lb-a9723eb4.example.com
$ sudo salt-call --versions-report
Salt Version:
Salt: 2016.3.4
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.4.2
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.8
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 0.9.1
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: 1.2.3
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 14.0.1
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.5
System Versions:
dist: Ubuntu 14.04 trusty
machine: x86_64
release: 3.13.0-48-generic
system: Linux
version: Ubuntu 14.04 trusty
Unexpected (broken) one:
$ hostname -f
frontend-025da143.example.com
$ sudo salt-call grains.get fqdn
local:
frontend-025da143
$ sudo salt-call --versions-report
Salt Version:
Salt: 2016.11.1
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.4.2
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.8
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 0.9.1
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: 1.2.3
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 14.0.1
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.5
System Versions:
dist: Ubuntu 14.04 trusty
machine: x86_64
release: 3.13.0-48-generic
system: Linux
version: Ubuntu 14.04 trusty
This is really weird and unreliable behavior. In our case it breaks mail delivery due to shortened hostnames/destinations in postfix configuration.
We're also observing wrong fqdn
on a number of other systems, including Ubuntu 16.04 running salt 2016.11.1, where hostname -f
provides correct information.
I vote for an option where the fqdn
grain matches the output of hostname -f
.
This appears to be an upstream bug. This grain calls Python's socket.getfqdn(), which in turn reportedly calls the C function gethostbyname, which in this case will just return the value passed to it. If the value passed isn't the correct FQDN, you get the wrong answer: garbage in, garbage out. The parameter passed by Python is kernelhostname.
Several people have reported that putting the FQDN in your /etc/hosts file next to 127.0.0.1 resolves the issue. However, I found that I needed to reset the kernel.hostname entry on my Red Hat based box with sysctl kernel.hostname=hostname.example.com. It appears that kernelhostname is populated by systemctl on those systems and probably by /etc/hosts on Debian-based and other systems.
@jdshewey has this exactly right. I've looked at this before and come to the same conclusion.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
I think this can still be an issue: under certain conditions an unexpected value can be returned.
Salt Version:
Salt: 2019.2.5
Dependency Versions:
cffi: Not Installed
cherrypy: 3.5.0
dateutil: 2.5.3
docker-py: Not Installed
gitdb: 2.0.0
gitpython: 2.1.1
ioflo: Not Installed
Jinja2: 2.9.4
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: 0.24.0
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.8
mysql-python: 1.3.7
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.13 (default, Sep 26 2018, 18:42:22)
python-gnupg: Not Installed
PyYAML: 3.12
PyZMQ: 16.0.2
RAET: Not Installed
smmap: 2.0.1
timelib: Not Installed
Tornado: 4.4.3
ZMQ: 4.2.1
System Versions:
dist: debian 9.12
locale: UTF-8
machine: x86_64
release: 4.9.0-12-amd64
system: Linux
version: debian 9.12
Correct / expected output by hostname -f
$ hostname -f
myhostname.my.tld
Unexpected data in grains
$ salt-call grains.get fqdn
local:
localhost
Cause of the issue:
1) socket.getfqdn()
is used as the main method to get the FQDN
https://github.com/saltstack/salt/blob/master/salt/utils/network.py#L240
https://github.com/saltstack/salt/blob/dc0595cc811f541044efe89446d5f212968db7e3/salt/utils/network.py#L240-L263
2) Unexpected / incorrect line in /etc/hosts
127.0.0.1 localhost
127.0.1.1 myhostname.my.tld myhostname
# this line causes the problem
::1 myhostname
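A rough way to spot this kind of entry is to scan the hosts file for lines where the short hostname is listed but the first (canonical) name on the line is not fully qualified. The sketch below is illustrative only: the function name suspicious_host_entries is hypothetical, and the check is a heuristic rather than a description of how the resolver itself works.

```python
def suspicious_host_entries(hosts_text, short_name):
    """Return (address, canonical_name) pairs for lines that list short_name
    while their first (canonical) name contains no dot."""
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if short_name in names and '.' not in names[0]:
            flagged.append((address, names[0]))
    return flagged


hosts = """\
127.0.0.1 localhost
127.0.1.1 myhostname.my.tld myhostname
::1 myhostname
"""
print(suspicious_host_entries(hosts, 'myhostname'))  # [('::1', 'myhostname')]
```

To check a real machine, the contents of /etc/hosts and the output of hostname -s could be passed in as the two arguments.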
In the develop
branch, socket.getaddrinfo()
is used and the problem does not appear even with a not fully correct /etc/hosts
https://github.com/saltstack/salt/blob/develop/salt/utils/network.py#L199
https://github.com/saltstack/salt/blob/637fe0b04f38b2274191b005d73b3c6707d7f400/salt/utils/network.py#L199-L223
Basic Python test.
± python
Python 2.7.13 (default, Sep 26 2018, 18:42:22)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.getfqdn()
'localhost'
>>> socket.getaddrinfo(socket.gethostname(), 0, socket.AF_UNSPEC, socket.SOCK_STREAM, socket.SOL_TCP, socket.AI_CANONNAME)
[(10, 1, 6, 'myhostname.my.tld', ('::1', 0, 0, 0)), (2, 1, 6, '', ('127.0.1.1', 0))]
@hatifnatt Did you find a resolution for this? Am running in to the same thing.
@ashwin-subramanian I don't remember exactly how I fixed this issue; it was about 2 years ago. I probably corrected my /etc/hosts
file. I have not encountered this problem since then.
It seems that there may be some edge cases where
grains.fqdn
does not produce the same value as, say, hostname -f
. This makes the value of the fqdn grain difficult to predict or rely on.
After investigating the issue further, this appears to be due to the internal use of
socket.getfqdn
to determine the fqdn grain. It seems that this issue has been raised before and was fixed, but the fix was later undone.
Though the current implementation of
get_fqhostname
does try to use socket.getaddrinfo
, there appears to be no code path through which the canonname value returned by getaddrinfo can then be returned to the caller of get_fqhostname.
I'm not sure whether the intent is to only use
socket.getfqdn
or to fall back on it only when socket.getaddrinfo
fails, but it seems like something is off in either case. From the standpoint of an administrator, it would be more intuitive if get_fqhostname
gave preference to the fqdn as returned in the canonname value from socket.getaddrinfo
.