Scalaris server won't start, perhaps because it cannot find fully qualified domain name?

GoogleCodeExporter commented 8 years ago

After getting the latest scalaris version from the repository, and after doing 
'configure ; make' on a recent Ubuntu distribution, I cannot start the server. 
Doing 

---------------------
reeuwijk@babylon:~/lab/scalaris-front$ bin/scalarisctl boot start
---------------------

simply returns immediately without any output.

While trying to solve this, I get:

---------------------
reeuwijk@babylon:~/lab/scalaris-front$ bin/scalarisctl checkinstallation
you have to set a fully qualified domain name (FQDN)
---------------------

As far as I know, the Scalaris FAQ does not address this issue, except that it 
says:

---------------------
For getting the node name Erlang uses, call:

erl -noinput -name boot -eval "io:format(\"~p~n\", [node()]), halt()." 
---------------------

However, when I do that I get:

---------------------
reeuwijk@babylon:~/lab/scalaris-front$ erl -noinput -name boot -eval "io:format(\"~p~n\", [node()]), halt()."
{error_logger,{{2010,8,18},{16,25,37}},"Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,{error,duplicate_name}},[{inet_tcp_dist,listen,1},{net_kernel,start_protos,4},{net_kernel,start_protos,3},{net_kernel,init_node,2},{net_kernel,init,1},{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}]}
{error_logger,{{2010,8,18},{16,25,37}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.20.0>},{registered_name,[]},{error_info,{exit,{error,badarg},[{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}},{ancestors,[net_sup,kernel_sup,<0.9.0>]},{messages,[]},{links,[#Port<0.64>,<0.17.0>]},{dictionary,[{longnames,true}]},{trap_exit,true},{status,running},{heap_size,377},{stack_size,24},{reductions,453}],[]]}
{error_logger,{{2010,8,18},{16,25,37}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},{mfa,{net_kernel,start_link,[[boot,longnames]]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]}
{error_logger,{{2010,8,18},{16,25,37}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,[{pid,undefined},{name,net_sup},{mfa,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}
{error_logger,{{2010,8,18},{16,25,37}},std_info,[{application,kernel},{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]}
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}"}

Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
---------------------

The erlang version on my machine is as follows:
---------------------
reeuwijk@babylon:~/lab/scalaris-front$ erl -v
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 
---------------------

Original issue reported on code.google.com by Kees.van...@gmail.com on 18 Aug 2010 at 2:34

GoogleCodeExporter commented 8 years ago

It looks as if there is still a running Erlang node called 'boot', which is why 
the command to get a node's name from Erlang fails. Try another node name, e.g.
erl -noinput -name bootxy -eval "io:format(\"~p~n\", [node()]), halt()." 

(You should probably also stop the boot node, e.g. ./bin/scalarisctl boot stop)

scalarisctl is supposed to return with no output - to get an interactive shell, 
call it like this:
./bin/scalarisctl -i boot start

Nonetheless, we currently only support nodes with FQDNs as there are some 
implications in case a node is known by different names, e.g. locally the node 
is boot@localhost and remotely it is boot@xx.zz. Connections from other nodes 
via distributed erlang only work if the two nodes share the same information 
about themselves.

Original comment by nico.kru...@googlemail.com on 18 Aug 2010 at 2:48

GoogleCodeExporter commented 8 years ago

It would be very, very helpful if 'scalarisctl boot start' would give some 
feedback, even when there are no problems. Simply printing something like 
"Server 'boot' started successfully" would help enormously. And when there is a 
problem, including a server that is already running, it would help enormously 
if the software would say so.

Note that I do not want an interactive shell, I simply want some indication 
that my command has succeeded or failed. If it is important in some cases to 
have an absolutely silent startup, I suggest you add support for a '-q' flag 
that switches off the messages.
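
To make the suggestion concrete, here is a minimal sketch of the behaviour I have in mind, written as a wrapper around an arbitrary command. The function name run_with_feedback and the message texts are made up for this example and are not part of scalarisctl. It only reports the exit status of the command it runs, which may not say whether the node actually came up.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of scalarisctl: run a command and report
# whether it exited successfully, unless -q is given as the first argument.
run_with_feedback() {
    quiet=0
    if [ "$1" = "-q" ]; then
        quiet=1
        shift
    fi
    "$@"                # run the wrapped command with its own arguments
    status=$?
    if [ "$quiet" -eq 0 ]; then
        if [ "$status" -eq 0 ]; then
            echo "OK: '$*' exited with status 0"
        else
            echo "FAILED: '$*' exited with status $status" >&2
        fi
    fi
    return "$status"    # pass the original exit status through
}
# Example use (assumed invocation):
#   run_with_feedback bin/scalarisctl boot start
#   run_with_feedback -q bin/scalarisctl boot start
```

With -q the wrapper stays silent and only passes the exit status through, so scripts that need a silent startup keep working.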

Regarding the FQDN issue, to my amazement there actually was a problem on my 
Ubuntu machine. I've managed to fix it by explicitly setting a domain name in 
/etc/hosts, but it would be very helpful if the Scalaris FAQ would explain the 
issues much more clearly than it does now, and perhaps even suggest some fixes.
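
As a rough illustration of the kind of check involved, here is a minimal sketch, assuming that "fully qualified" simply means the host name contains at least one dot between non-empty parts. The function name is_fqdn is invented for this example; what checkinstallation actually tests may well differ.

```shell
#!/bin/sh
# Hypothetical helper, not part of Scalaris: succeeds if the given host name
# looks fully qualified, i.e. contains a dot separating non-empty labels.
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;  # empty, leading/trailing dot, or empty label
        *.*)           return 0 ;;  # at least one dot: treat as an FQDN
        *)             return 1 ;;  # bare host name such as "babylon"
    esac
}
# Example use (hostname -f prints the name the system considers fully qualified):
#   is_fqdn "$(hostname -f)" || echo "host name is not fully qualified"
```

On a machine where the example prints the warning, adding a domain name for the host in /etc/hosts is the fix that worked for me.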

It would also help if the scalarisctl checkinstallation command were a 
bit more talkative. I still have no idea what the magic incantation 'erl 
-noinput -name boot -eval "io:format(\"~p~n\", [node()]), halt()."' is supposed 
to do, but perhaps the checkinstallation command could do it by default, and 
give a clearer diagnosis text instead of the (for me) totally unreadable output 
I quoted above.

Original comment by Kees.van...@gmail.com on 19 Aug 2010 at 11:10

GoogleCodeExporter commented 8 years ago

Unfortunately, Erlang doesn't provide us with a proper exit code when we execute 
it in "detached" (i.e. non-interactive) mode. Also, even the existence of a 
running Scalaris node (boot or ordinary node) does not tell you whether the 
processes are actually working: an exception could kill them, after which 
they would be restarted by one of the supervisors.
The only safe way to check whether a node is up and running is to check the log 
files (which I have recently improved in rev1013) or run an interactive shell 
(you will see the same output as in the log file) or check in the boot node's 
web interface but even there diagnosing such rogue nodes is not easy.

Original comment by nico.kru...@googlemail.com on 19 Aug 2010 at 5:10

GoogleCodeExporter commented 8 years ago

You are right, the Erlang crash output you quoted above (the duplicate_name 
error from the 'erl -noinput -name boot' command) is not helpful at all.

However, this output is created even before any code of ours is run. The output 
comes from the Erlang runtime environment. I will add a test to 
checkinstallation to see whether you are already running a Scalaris node.
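
A minimal sketch of what such a test could look like, assuming it relies on 'epmd -names' (the Erlang port mapper's list of registered nodes) and on that output containing lines of the form 'name <node> at port <port>'. The helper name node_registered is invented here, and the actual checkinstallation change may work differently.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of 'epmd -names' on stdin and
# succeeds if a node with the given short name (e.g. boot) is registered.
node_registered() {
    awk -v n="$1" '$1 == "name" && $2 == n { found = 1 } END { exit !found }'
}
# Example use (requires epmd from the Erlang installation on the PATH):
#   if epmd -names 2>/dev/null | node_registered boot; then
#       echo "a node named 'boot' is already running"
#   fi
```

Matching on the exact node name avoids false positives for names that merely start with the same prefix, such as bootxy.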

Original comment by schu...@gmail.com on 20 Aug 2010 at 7:31

GoogleCodeExporter commented 8 years ago

With the client wrapper script java-api/scalaris using an erlang-provided node 
name, this should be fixed for good - at least as long as there is an erl 
executable on the same node. In all other setups, the Java-API should/can not 
connect to localhost since a scalaris node can not start without erlang.
Also I adapted the FAQs a bit.

see changes in r1078 - r1083

Original comment by nico.kru...@googlemail.com on 1 Sep 2010 at 3:30

Changed state: Fixed
