openedx-unsupported / configuration

A collection of edx configuration scripts and utilities that edx.org uses to deploy openedx.
GNU Affero General Public License v3.0
823 stars 971 forks source link

Problem With Sinatra - Timeout when waiting for 127.0.0.1:4567 #204

Closed elimence closed 10 years ago

elimence commented 11 years ago

After merging the latest pull reguest (#202), The following error showed up. When i try to access studio or lms, i get the page not found message from the browser.

TASK: [forum | test that the required service are listening] ****************** 
failed: [localhost] => (item={'host': 'localhost', 'port': '4567', 'service': 'sinatra'}) =>     {"elapsed": 10, "failed": true, "item": {"host": "localhost", "port": "4567", "service":     "sinatra"}}
msg: Timeout when waiting for 127.0.0.1:4567
ok: [localhost] => (item={'host': u'localhost', 'port': '27017', 'service': 'mongo'})
ok: [localhost] => (item={'host': u'localhost', 'port': '28017', 'service': 'mongo'})
ok: [localhost] => (item={'host': u'localhost', 'port': '9200', 'service': 'elasticsearch'})
ok: [localhost] => (item={'host': u'localhost', 'port': '9300', 'service': 'elasticsearch'})

FATAL: all hosts have already failed -- aborting
e0d commented 11 years ago

This particular error is from the tests that run at the end of the forums installation. Port 4567 is the Sinatra app that provides the forum functionality.

I believe you are on Ubuntu, so look in /var/log/upstart/cs_comments_service.log and see what messages are there.

elimence commented 11 years ago

Here's the content of cs_comments_service.log. I don't really know what it's trying to say. Thanks for helping.

[!]You shold install `padrino-helpers' gem if you want to use kaminari's pagination helpers with Sinatra.
[!]Kaminari::Helpers::SinatraHelper does nothing now...
[2013-08-02 19:27:25] INFO  WEBrick 1.3.1
[2013-08-02 19:27:25] INFO  ruby 1.9.3 (2013-06-27) [x86_64-linux]
== Sinatra/1.3.3 has taken the stage on 4567 for development with backup from WEBrick
[2013-08-02 19:27:25] INFO  WEBrick::HTTPServer#start: pid=5717 port=4567

== Sinatra has ended his set (crowd applauds)
[!]You shold install `padrino-helpers' gem if you want to use kaminari's pagination helpers with Sinatra.
[!]Kaminari::Helpers::SinatraHelper does nothing now...
[2013-08-02 20:01:19] INFO  WEBrick 1.3.1
[2013-08-02 20:01:19] INFO  ruby 1.9.3 (2013-06-27) [x86_64-linux]
== Sinatra/1.3.3 has taken the stage on 4567 for development with backup from WEBrick
[2013-08-02 20:01:19] INFO  WEBrick::HTTPServer#start: pid=23333 port=4567

== Sinatra has ended his set (crowd applauds)
[!]You shold install `padrino-helpers' gem if you want to use kaminari's pagination helpers with Sinatra.
[!]Kaminari::Helpers::SinatraHelper does nothing now...
[2013-08-02 20:47:06] INFO  WEBrick 1.3.1
[2013-08-02 20:47:06] INFO  ruby 1.9.3 (2013-06-27) [x86_64-linux]
== Sinatra/1.3.3 has taken the stage on 4567 for development with backup from WEBrick
[2013-08-02 20:47:06] INFO  WEBrick::HTTPServer#start: pid=8744 port=4567
[!]You shold install `padrino-helpers' gem if you want to use kaminari's pagination helpers with Sinatra.
[!]Kaminari::Helpers::SinatraHelper does nothing now...
[2013-08-03 21:42:50] INFO  WEBrick 1.3.1
[2013-08-03 21:42:50] INFO  ruby 1.9.3 (2013-06-27) [x86_64-linux]
== Sinatra/1.3.3 has taken the stage on 4567 for development with backup from WEBrick
[2013-08-03 21:42:50] INFO  WEBrick::HTTPServer#start: pid=12831 port=4567
yuvadm commented 11 years ago

+1 on this issue, this happens at the end of a plain vanilla installation for me as well.

dhasthagheer commented 11 years ago

I am also getting the same error. Can anyone tell me how to solve this?

kevinchugh commented 11 years ago

The timeout may refer to elastic search or mongo not being running.

On Thursday, August 22, 2013, dhasthagheer wrote:

I am also getting the same error. Can anyone tell me how to solve this?

— Reply to this email directly or view it on GitHubhttps://github.com/edx/configuration/issues/204#issuecomment-23083373 .

e0d commented 11 years ago

So the timeout in the original message originates from tests that are run to ensure the installation has succeeded. This occurs in roles/forum/tasks/test.yml and relies on config from roles/forum/vars/main.yml.

It look like everything is succeeding except the sinatra test.

Can you try to start the service directly and see whether there is any additional details?

service cs_comments_service stop service cs_comments_service start

Also, please let me know the OS. I believe that @elimence is on centos, but I don't know in the other cases.

dhasthagheer commented 11 years ago

I am working on Ubuntu 12.04 server. I am getting following messages while running

service cs_comments_service stop

stop: Rejected send message, 1 matched rules; type="method_call", sender=":1.44" (uid=1000 pid=9717 comm="stop cs_comments_service ") interface="com.ubuntu.Upstart0_6.Job" member="Stop" error name="(unset)" requested_reply="0" destination="com.ubuntu.Upstart" (uid=0 pid=1 comm="/sbin/init")

service cs_comments_service start

start: Rejected send message, 1 matched rules; type="method_call", sender=":1.45" (uid=1000 pid=9722 comm="start cs_comments_service ") interface="com.ubuntu.Upstart0_6.Job" member="Start" error name="(unset)" requested_reply="0" destination="com.ubuntu.Upstart" (uid=0 pid=1 comm="/sbin/init")

e0d commented 11 years ago

Looks like you are not running the command with sudo or as root, please do that, e.g.,

sudo service cs_comments_service start

dhasthagheer commented 11 years ago

sudo service cs_comments_service start

service is running now

Should I continue,

sudo ansible-playbook -c local --limit "localhost:127.0.0.1" ./edx_sandbox.yml -i "localhost,"

Or what I do next?

e0d commented 11 years ago

I'm not sure why the service health test is timing out, sometimes during our post install test. I don't want to bump up the timeout beyond, the already generous 10 seconds, but obviously more than one person has seen this.

elimence commented 11 years ago

@e0d I use Ubuntu 12.04 server as well.

I run the ansible command to connect studio, lms and lms preview to sub-domains i had set up previously. After this command, i realized that the Sinatra error disappeared. Didn't know about restarting the the comments service at the time.

On Thu, Aug 22, 2013 at 5:12 PM, e0d notifications@github.com wrote:

I'm not sure why the service health test is timing out, sometimes during our post install test. I don't want to bump up the timeout beyond, the already generous 10 seconds, but obviously more than one person has seen this.

— Reply to this email directly or view it on GitHubhttps://github.com/edx/configuration/issues/204#issuecomment-23108745 .

dhasthagheer commented 11 years ago

If I run "nmap localhost" it showing port 4567 is open but the service name is unknown not sinatra.

4567/tcp open unknown

Is that cause problem?. If we stop testing the service (roles/forum/tasks/test.yml), will it complete the installation?. I will try and let you know.

hikari-boulders commented 11 years ago

Fyi, I only so far encountered the error when trying to replay a playbook. But never encountered it on a vanilla machine. Can maybe someone try to confirm it with a replay?

Also I'm not sure if an unknown service on port 4567 is really a problem. What does port 4567 say on a "correct" machine? port 4567

elimence commented 11 years ago

@hikari-boulders That's usually an unused port, at least for most healthy machines i have tested on

@dhasthagheer I don't believe that is the cause, because i was able to get the platform running, however i get the same unknown service running on the Sinatra port. When i run

sudo netstat -tulpn | grep :4567

i get the following result

tcp        0      0 0.0.0.0:4567            0.0.0.0:*               LISTEN      5962/ruby

If you are still having the Sinatra timeout error, run the following command

cd /var/tmp/configuration/playbooks

ansible-playbook -c local --limit "localhost:127.0.0.1" /var/tmp/configuration/playbooks/edx_sandbox.yml \
-i "localhost," -e 'cms_nginx_port=80  lms_preview_nginx_port=80  c_lms_base=example.com \
c_preview_lms_base=preview.example.com'

NB: i used actual domain/subdomain names when i did it. After this, the timeout error went away. I have no idea why at the moment.

hikari-boulders commented 11 years ago

Is this still an issue?

Can someone try the following:

Reproductions?

jrolland commented 11 years ago

I have the same problem, I've try the following: sudo stop edxapp then replay the playbook. And the problem is still there...

ryates commented 11 years ago

I am having the same problem after trying to replay the playbook on a new, vanilla Ubuntu 12.04 after hitting a edx-platform issue on the release branch that is resolved on master. I ran:

sudo stop edxapp

as well as killed the ruby process listening on 4567. Replaying the playbook still fails with the same error. Error details, result of netstat and result of lsof below:

TASK: [forum | test that the required service are listening] ****************** 
failed: [localhost] => (item={'host': 'localhost', 'port': '4567', 'service': 'sinatra'}) => {"elapsed": 10, "failed": true, "item": {"host": "localhost", "port": "4567", "service": "sinatra"}}
msg: Timeout when waiting for 127.0.0.1:4567
$ netstat -na | grep 4567
tcp        0      0 0.0.0.0:4567            0.0.0.0:*               LISTEN
lsof -i:4567 -n
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ruby    19631 root    7u  IPv4 297830      0t0  TCP *:4567 (LISTEN)
$ ps -ef | grep 19631
root     19631 19630  1 04:08 ?        00:00:06 ruby app.rb

$ ps -ef | grep 19630
root     19630     1  0 04:08 ?        00:00:00 /bin/sh -e /proc/self/fd/9
ryates commented 11 years ago

Quick update - I just tried again on a brand new, Precise server and ran into the same problem. That said, I'm updating roles/edxapp/vars/main.yml to set edx_platform_commit to master which may not be a proper thing to do.

pchiu33 commented 11 years ago

Met this today with the release branch, on the same clean precise64 box. However issue didn't reproduce after re-running the playbook right after the failure.

Even if that very unlikely, maybe related to a temporary load-peak that took more than 10 seconds (being the current wait delay for the check)..

Will try it again on another machine with more resources.

VladimirPal commented 11 years ago

I have the same issue.

VladimirPal commented 11 years ago

The problem was in a slow server. I have only 2gb ram. So I increased to 4 and it worked. Or you can comment in /var/tmp/configuration/playbooks/roles/forum/tasks/main.yml

include: test.yml

jarv commented 10 years ago

closing this, let us know if you have any similar issues with the new release.