prestodb / presto-admin

A tool to install, configure and manage Presto installations
http://prestodb.github.io/presto-admin/
Apache License 2.0

Wrong error message during install when missing .properties files #212

Closed zz22394 closed 8 years ago

zz22394 commented 8 years ago

Dear team, we are trying to use presto-admin to install Presto.

Before installing, I forgot to create node.properties. The configuration directory looked like this:

[sso@ambari-sso-3 /etc/opt/prestoadmin/coordinator]$ ll /etc/opt/prestoadmin/coordinator/
total 16
-rw-r--r-- 1 root root 232 Jul 26 14:33 config.properties
-rw-r--r-- 1 root root  30 Jul 26 14:33 env.sh
-rw-r--r-- 1 root root 153 Jul 26 14:33 jvm.config
[sso@ambari-sso-3 /etc/opt/prestoadmin/coordinator]$

Then I ran the install:

/opt/presto/prestoadmin/presto-admin server install /home/sso/tools/presto/presto_server_pkg.141t/presto-server-rpm-0.141t-1.x86_64.rpm -I

and got this error message:

Fatal error: [hive-sso-31]

Underlying exception:
    Permission denied

Aborting.

Log in presto-admin.log:

2016-07-26 15:01:48,613|21147|140188055705344|prestoadmin.fabric_patches|ERROR|Traceback (most recent call last):
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/fabric_patches.py", line 138, in inner
    submit(task.run(*args, **kwargs))
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/server.py", line 120, in deploy_install_configure
    update_configs()
...
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/util/base_config.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/configure_cmds.py", line 71, in deploy
    prestoadmin.deploy.coordinator()
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/deploy.py", line 43, in coordinator
    configure_presto(coord.Coordinator().get_conf(),
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/node.py", line 47, in get_conf
    config.write_conf_to_file(conf_value, file_path)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/config.py", line 80, in write_conf_to_file
    write_properties_file(conf, path)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/config.py", line 89, in write_properties_file
    write(output, path)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/config.py", line 107, in write
    with open(path, 'w') as f:
IOError: [Errno 13] Permission denied: '/etc/opt/prestoadmin/coordinator/node.properties'

In fact, the problem with node.properties is not "Permission denied"; the file does not exist.

ebd2 commented 8 years ago

Hi Shin, I took a look at the stack trace and the code. presto-admin will create node.properties if it does not exist, which is what it's doing in this case. The IOError suggests one of two possible causes:

  1. presto-admin hasn't been run with sudo (or as root directly)
  2. The permissions on the /etc/opt/prestoadmin/coordinator directory are not set to allow root to create a new file in that location.

Can you please verify the permissions and/or try running with sudo?

Thanks,

Eric
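The two causes above can be told apart before the write is attempted. The following is a minimal sketch, not presto-admin code: `explain_write_failure` is a hypothetical helper, and it assumes that opening a missing file for writing inside a directory the current user cannot write to is what produces the "Permission denied" in the traceback.

```python
import os


def explain_write_failure(path):
    """Return a human-readable reason why `path` may not be writable.

    Hypothetical helper for illustration; not part of presto-admin.
    """
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        return "parent directory %s does not exist" % parent
    if os.path.exists(path):
        if os.access(path, os.W_OK):
            return "ok: %s exists and is writable" % path
        return "%s exists but is not writable by the current user" % path
    # Creating a new file needs write and search permission on the directory.
    if os.access(parent, os.W_OK | os.X_OK):
        return "ok: %s does not exist yet but can be created" % path
    return "%s does not exist and %s is not writable by the current user" % (
        path, parent)


if __name__ == "__main__":
    print(explain_write_failure("/etc/opt/prestoadmin/coordinator/node.properties"))
```

Run as the same user that runs presto-admin, a check like this would report the missing file and the unwritable directory separately instead of a bare "Permission denied".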

zz22394 commented 8 years ago

Eric: Thanks for your reply. The reason presto-admin hasn't been run with sudo is this:

  1. Servers in our company don't allow logging in with the root account. We can only log in with a user account and then use sudo. I tried to run "sudo ./presto-admin....." as the documentation says, but I got error messages and couldn't create the Presto cluster with sudo. In contrast, we created the Presto cluster successfully without sudo.

My guess is that when I use "sudo ./presto-admin", the presto-admin process accesses the Presto servers as root, which our server settings don't allow, so it fails. When I use "./presto-admin -I ...." without sudo, the presto-admin process runs under my account, connects to the Presto servers with the same account, and can then install Presto using the sudo password. Is that right?

zz22394 commented 8 years ago

When I use sudo, I got the following errors:

[sso@ambari-sso-3 ~]$ sudo /opt/presto/prestoadmin/presto-admin server restart -A -I
Initial value for env.password:

Fatal error: [hive-sso-31] Needed to prompt for a connection or sudo password (host: hive-sso-31), but input would be ambiguous in parallel mode

Aborting.

Fatal error: [hive-sso-32] Needed to prompt for a connection or sudo password (host: hive-sso-32), but input would be ambiguous in parallel mode

Aborting.

Error log:

SystemExit: [hive-sso-34] Needed to prompt for a connection or sudo password (host: hive-sso-34), but input would be ambiguous in parallel mode

2016-07-27 10:07:29,446|32063|140310930949888|prestoadmin.fabric_patches|ERROR|Traceback (most recent call last):
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/fabric_patches.py", line 138, in inner
    submit(task.run(*args, **kwargs))
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/util/base_config.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/server.py", line 307, in restart
    if stop_and_start():
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/server.py", line 288, in stop_and_start
    if check_presto_version() != '':
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/server.py", line 318, in check_presto_version
    if not presto_installed():
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/server.py", line 328, in presto_installed
    package_search = run('rpm -q presto')
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/network.py", line 647, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/fabric_patches.py", line 74, in run
    timeout=timeout, shell_escape=shell_escape)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/network.py", line 647, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/operations.py", line 1054, in run
    shell_escape=shell_escape)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/operations.py", line 921, in _run_command
    channel=default_channel(), command=wrapped_command, pty=pty,
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/state.py", line 397, in default_channel
    chan = _open_session()
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/state.py", line 389, in _open_session
    return connections[env.host_string].get_transport().open_session()
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/network.py", line 159, in __getitem__
    self.connect(key)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/network.py", line 151, in connect
    user, host, port, cache=self, seek_gateway=seek_gateway)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/network.py", line 531, in connect
    password = prompt_for_password(text)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/network.py", line 604, in prompt_for_password
    handle_prompt_abort("a connection or sudo password")
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/utils.py", line 174, in handle_prompt_abort
    abort(reason % "input would be ambiguous in parallel mode")
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/prestoadmin/fabric_patches.py", line 59, in abort
    old_abort(msg)
  File "/home/sso/tools/prestoadmin/presto-admin-install/lib/python2.6/site-packages/fabric/utils.py", line 53, in abort
    sys.exit(msg)
SystemExit: [hive-sso-35] Needed to prompt for a connection or sudo password (host: hive-sso-35), but input would be ambiguous in parallel mode

2016-07-27 10:07:29,475|32058|140310930949888|__main__|DEBUG|Exiting normally

cawallin commented 8 years ago

Sorry for the delay. The issue here is in connecting to the other nodes of the Presto cluster, not in starting Presto. What is in your /etc/opt/prestoadmin/config.json file? Maybe you have root there as "username", and it's trying to log onto the other nodes as root and failing. I suggest running presto-admin with the --serial flag (http://teradata.github.io/presto/docs/current/installation/presto-admin/presto-admin-cli-options.html); that might give you more clarity about what exactly is failing.

Once you get to the other nodes, the Presto server is always started as the presto service user.

cawallin commented 8 years ago

Also, the password you provide has to be both the password you use to log into the cluster and the sudo password for that user. The stacktrace you provided suggests that, with the password you're providing, the "username" user specified in config.json can't log into the other node.

zz22394 commented 8 years ago

@cawallin In config.json, it is not root.

[sso@ambari-sso-1 /etc/opt/prestoadmin]$ cat config.json
{
    "username": "sso",
    "port": "22",
    "coordinator": "hive-sso-11.qe",
    "workers": ["hive-sso-12.qe","hive-sso-13.qe" ,"hive-sso-14.qe" ,"hive-sso-15.qe"]
}

[sso@ambari-sso-1 /etc/opt/prestoadmin]$
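To make explicit which logins a topology like this implies, here is a small sketch that lists one target per node. It is only an illustration: `ssh_targets` is a hypothetical helper, the fallback to port "22" is an assumption, and the JSON is the config.json content shown above.

```python
import json


def ssh_targets(config):
    """List the user@host:port strings implied by a topology dict.

    Hypothetical helper; the default port is an assumption.
    """
    user = config["username"]
    port = config.get("port", "22")
    hosts = [config["coordinator"]] + list(config.get("workers", []))
    return ["%s@%s:%s" % (user, host, port) for host in hosts]


topology = json.loads("""
{
    "username": "sso",
    "port": "22",
    "coordinator": "hive-sso-11.qe",
    "workers": ["hive-sso-12.qe", "hive-sso-13.qe", "hive-sso-14.qe", "hive-sso-15.qe"]
}
""")

for target in ssh_targets(topology):
    print(target)
```

Every node is contacted as sso, so the credentials that matter are the ones available for sso on the machine where presto-admin runs.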

Regarding "with the password you're providing, the "username" user specified in config.json can't log into the other node": we don't use a password to log in. We use a private key, because the public key has already been placed on the remote Presto servers.

We have two limitations on our network: 1) Nobody is allowed to log in as root directly. Only user-level accounts may log in, and sudo works after that. 2) The public key is placed on the remote servers, so we have to use a private key or agent forwarding.

Do you think these limitations are related to the errors?

cawallin commented 8 years ago

1) is fine -- you have it set up such that you log in as sso. But maybe your network doesn't allow you to ssh from the root account even if you're logging in as sso. Can you try manually running "sudo ssh sso@host"?

cawallin commented 8 years ago

If that doesn't work, you can work around the issue by making /etc/opt/prestoadmin, /var/log/prestoadmin and /etc/prestoadmin readable and writable by sso, and then your config will write out properly.

We have plans to remove the need to run presto-admin as root, which would address these issues and a couple of others. Hopefully we'll be able to get to it soon.

zz22394 commented 8 years ago

[sso@ambari-sso-1 /etc/opt/prestoadmin]$ sudo ssh sso@hive-sso-11.qe
[sudo] password for sso:
The authenticity of host 'hive-sso-11.qe (172.21.)' can't be established.
DSA key fingerprint is ae:4f:b4:c6:14:68:73:fa:82:50:78:08:eb:f4:c3:b1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hive-sso-11.qe,172.21.' (DSA) to the list of known hosts.
Permission denied (publickey).
[sso@ambari-sso-1 /etc/opt/prestoadmin]$

Maybe the root account has no private key, because the private key is in my ~/.ssh/ folder.
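The guess above can be illustrated with a short sketch. It assumes that sudo sets the `SUDO_USER` environment variable to the invoking user, and that the key lives at `~/.ssh/id_rsa` as in this thread. `invoking_user` and `candidate_key` are hypothetical names, not presto-admin or ssh functions.

```python
import os
import pwd


def invoking_user(env=None):
    """Return the user who started the command, looking through sudo."""
    env = os.environ if env is None else env
    return (env.get("SUDO_USER") or env.get("USER")
            or pwd.getpwuid(os.getuid()).pw_name)


def candidate_key(user, key_name="id_rsa"):
    """Return the default private key path inside `user`'s home directory."""
    home = pwd.getpwnam(user).pw_dir
    return os.path.join(home, ".ssh", key_name)


if __name__ == "__main__":
    effective = pwd.getpwuid(os.getuid()).pw_name
    print("effective user key:", candidate_key(effective))
    print("invoking user key: ", candidate_key(invoking_user()))
```

Under sudo the two paths differ: the first points into root's home directory, where no key was installed, and the second points at the key the remote servers actually trust.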

cawallin commented 8 years ago

You were specifying just a password before -- can you log on with just a password or do you need an ssh key? With ssh, you can specify -i to use your private ssh key.

presto-admin also has a -i flag to specify your private ssh key.

zz22394 commented 8 years ago

@cawallin ssh works.

[sso@ambari-sso-1 /etc/opt/prestoadmin]$ sudo ssh sso@hive-sso-11.qe -i /home/sso/.ssh/id_rsa
Enter passphrase for key '/home/sso/.ssh/id_rsa':
Last login: Wed Jul 27 23:17:29 2016 from ambari-sso-1.qe
[sso@hive-sso-11 ~]$

However, presto-admin still fails with an error.

[sso@ambari-sso-1 /etc/opt/prestoadmin]$ sudo /opt/presto/prestoadmin/presto-admin configuration show -i /home/sso/.ssh/id_rsa -I
Initial value for env.password:

Fatal error: [hive-sso-11.qe] get() encountered an exception while downloading '/etc/presto/node.properties'

Underlying exception:
    Permission denied

Aborting.

I wasn't asked to enter the passphrase for my private key.

cawallin commented 8 years ago

This is progress, presto-admin can access the other nodes now! Does /etc/presto/node.properties exist on that node? Your initial issue was writing out the config files to all the nodes, so maybe it didn't get there. Otherwise a log for presto-admin would be helpful.

zz22394 commented 8 years ago

[sso@ambari-sso-1 ~]$ sudo /opt/presto/prestoadmin/presto-admin configuration show -i /home/sso/.ssh/id_rsa -I
Initial value for env.password:

Fatal error: [hive-sso-11.qe] Needed to prompt for a connection or sudo password (host: hive-sso-11.qe), but input would be ambiguous in parallel mode

Aborting.

Error log:

2016-07-27 23:53:36,397|7533|140464355276544|paramiko.transport|DEBUG|Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
2016-07-27 23:53:36,397|7533|140464355276544|paramiko.transport|DEBUG|using kex diffie-hellman-group14-sha1; server key type ssh-dss; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
2016-07-27 23:53:36,640|7533|140464355276544|paramiko.transport|DEBUG|Switch to new keys ...
2016-07-27 23:53:36,642|7533|140464677693184|paramiko.transport|DEBUG|Adding ssh-dss host key for hive-sso-15.qe: 21829032f46096bf9fda49dc707ccb28
2016-07-27 23:53:36,680|7533|140464355276544|paramiko.transport|DEBUG|userauth is OK
2016-07-27 23:53:36,684|7533|140464355276544|paramiko.transport|DEBUG|Authentication type (password) not permitted.

cawallin commented 8 years ago

Weird, that didn't go as far.

In that case, presto-admin seems like it tried to use the password from -I to log into the other node, rather than the private key. That's a bug, maybe it's non-deterministic whether it uses the password from -I or the private key to log into the node. If you'd be willing to share the full log via a gist (or email), we can double check that during the remainder of our business hours.
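When going through a full log, it can help to pull out only the authentication-related lines. The sketch below is a hypothetical helper, not part of presto-admin. It assumes the pipe-separated layout seen in the excerpts in this thread (time, process id, thread id, logger, level, message) and skips lines that don't match, such as traceback continuation lines.

```python
from collections import namedtuple
from datetime import datetime

LogRecord = namedtuple("LogRecord", "time pid thread logger level message")


def parse_line(line):
    """Parse one 'time|pid|thread|logger|level|message' line, or return None."""
    parts = line.rstrip("\n").split("|", 5)
    if len(parts) != 6:
        return None  # e.g. a traceback continuation line
    try:
        when = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S,%f")
        return LogRecord(when, int(parts[1]), int(parts[2]),
                         parts[3], parts[4], parts[5])
    except ValueError:
        return None


def auth_events(lines):
    """Yield paramiko records whose message mentions authentication."""
    for line in lines:
        record = parse_line(line)
        if (record and record.logger.startswith("paramiko")
                and "auth" in record.message.lower()):
            yield record


sample = [
    "2016-07-27 23:53:36,680|7533|140464355276544|paramiko.transport|DEBUG|userauth is OK",
    "2016-07-27 23:53:36,684|7533|140464355276544|paramiko.transport|DEBUG|Authentication type (password) not permitted.",
    "    return func(*args, **kwargs)",
]
for event in auth_events(sample):
    print(event.time, event.message)
```

A line such as "Authentication type (password) not permitted." indicates that the server rejected password authentication, which matches the key-only setup described earlier in the thread.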

ebd2 commented 8 years ago

@zz22394: Can we get the presto-admin logs as a gist please (or by direct email, if you'd prefer)?

They should be in /var/log/prestoadmin

zz22394 commented 8 years ago

@ebd2 @cawallin Sorry for the delay. I will send you an email. Please wait 😄

zz22394 commented 8 years ago

@ebd2 @cawallin I sent an email to cawallin. Would you please check it? Thanks

ebd2 commented 8 years ago

@zz22394 I've managed to set up presto-admin in a whole bunch of ways that conflict with various permissions, but I haven't gotten it to try password authentication when it should have been using keys like you saw in https://github.com/prestodb/presto-admin/issues/212#issuecomment-235614841.

The good news is, there's a bug that's independent of ssh that could be causing this issue: https://github.com/prestodb/presto-admin/issues/212#issuecomment-235600319

We get the file as the user who is ssh'ing to the remote host, and if that user doesn't have permission to read the file /etc/presto/node.properties, you'll see that error. I'll be out next week, but I'll look into the issue if I have time. I've let @cawallin know what I've found as well so she can continue looking into the issue as she is able to.
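Whether the ssh user can read a file follows the usual POSIX owner/group/other rules. The sketch below illustrates that check and is not the fix that was applied: `can_read` is a hypothetical helper, it ignores ACLs and directory permissions, and the mode and ids in the example are made up, since the thread does not state the real ownership of /etc/presto/node.properties.

```python
import stat


def can_read(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the POSIX read check for a user (ignores ACLs)."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# Made-up example: file owned by uid/gid 500 with mode 0600,
# read attempt by an unrelated user with uid 1000.
print(can_read(0o600, 500, 500, 1000, [1000]))
```

On a real node, the inputs would come from `os.stat(path)` (`st_mode`, `st_uid`, `st_gid`) and from the ids of the user named in config.json.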

cawallin commented 8 years ago

@zz22394 -- check out this PR: https://github.com/prestodb/presto-admin/pull/217. I've tested it on an environment similar to yours and added automated tests. We still need to make sure it works on other environments (EMR, etc), so it'll be a bit longer till that PR can be merged.

zz22394 commented 8 years ago

@cawallin Thanks for your fix. I tried the files from your PR #217. With the new files, the [configuration show] and [server status] commands return correct info now.

cawallin commented 8 years ago

Glad to hear it! server install should work all the way through now too.

cawallin commented 8 years ago

#217 was merged, closing this issue.