stanfordnlp / chirpycardinal

Stanford's Alexa Prize socialbot
GNU Affero General Public License v3.0

Running Tests #18

Open anupamme opened 3 years ago

anupamme commented 3 years ago

Hello

I am trying to run the tests and am running into this issue:

integration_base.py: Line 17

from bin.run_utils import setup_lambda, ASKInvocation, setup_logtofile

I cannot find the run_utils.py file in the repository.

Am I missing something here?

AshwinParanjape commented 3 years ago

This is a bug on our part. Thanks for bringing it up. The integration tests seem to be using legacy code and we'll update them to use the refactored version soon. Keeping this issue open.

anupamme commented 3 years ago

Okay, any idea of when it will be fixed?

Alternatively, I can try fixing it and sending a pull request if you can give me some idea of how to update the code.

As it turns out, when I run shell_chat I get an error because corenlp is timing out (although the Docker image is running), and there is no way for me to test the different components other than running the tests. Or is there one?

AshwinParanjape commented 3 years ago

If it is a timeout issue, you can disable timeouts by setting the following flag to False: https://github.com/stanfordnlp/chirpycardinal/blob/3e0656f954ac42b8ddcac4c03d76ed1cd21d1898/chirpy/core/flags.py#L3 Let us know if this solves the issue for you. If shell_chat is failing, pretty much all the integration tests will also fail, and they won't really lead you to the root cause. But if you are able to run shell_chat after disabling timeouts, there is hope that the failed integration tests will tell you something meaningful.
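For example, a minimal sketch of that change (the exact flag name on the linked line may differ; adjust to whatever flag you find there):

# chirpy/core/flags.py
use_timeouts = False  # disable timeouts on remote module calls while debugging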

You can also query the remote module directly (localhost:<port>) with Postman or curl to check if it is working as expected. You would need to pass JSON data like so: https://github.com/stanfordnlp/chirpycardinal/blob/3e0656f954ac42b8ddcac4c03d76ed1cd21d1898/chirpy/annotators/corenlp.py#L278
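For reference, the request body is a JSON object carrying the text and the requested annotators, roughly of this shape (values are illustrative; see the linked line for the exact fields):

{"text": "hello", "annotators": "pos,ner,parse,sentiment"}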

anupamme commented 3 years ago
  1. Disabling the timeout does not work; I get: RemoteCallable returned a HTTPError when running corenlp: 502 Server Error: Bad Gateway for url: http://localhost:3300/

  2. telnet localhost 3300 (works)

  3. curl --header "Content-Type: application/json" --request POST --data '{'text': 'hello', 'annotators': 'pos,ner,parse,sentiment'} ' http://localhost:3300/ returns 502 Bad Gateway

  4. curl http://localhost:3300/ also returns 502 Bad Gateway.

I tried restarting the Docker container; the output is still the same.

What can I do next?

AshwinParanjape commented 3 years ago

If the container is running and connected, then it must be throwing an error internally. You can look at the logs of a docker container using docker logs, as documented here: https://docs.docker.com/engine/reference/commandline/logs/
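For example, assuming the container is named corenlp_container (substitute the name shown by docker ps):

docker ps                                  # find the container name or ID
docker logs --tail 100 corenlp_container   # print the last 100 log lines
docker logs -f corenlp_container           # follow the log output live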

anupamme commented 3 years ago

When I do:

curl --header "Content-Type: application/json" --request POST --data '{'text': 'hello', 'annotators': 'pos,ner,parse,sentiment'} ' http://localhost:3300/

I get 502 Bad Gateway.

When I do

docker logs corenlp_container,

I see the following two exceptions:

First:

[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[2021-05-02 09:34:52 +0000] [17] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 578, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 126, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 135, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 65, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 352, in import_app
    __import__(module)
  File "/deploy/app/app.py", line 11, in <module>
    import remote_module
  File "/deploy/app/remote_module.py", line 20, in <module>
    client.ensure_alive()  # this takes some time especially if you specify many annotators above
  File "/usr/local/lib/python3.7/site-packages/stanfordnlp/server/client.py", line 137, in ensure_alive
    raise PermanentlyFailedException("Timed out waiting for service to come alive.")
stanfordnlp.server.client.PermanentlyFailedException: Timed out waiting for service to come alive.

Second:

[2021-05-02 09:49:20 +0000] [282] [ERROR] Exception in worker process
(the traceback is identical to the first one, again ending in)
stanfordnlp.server.client.PermanentlyFailedException: Timed out waiting for service to come alive.

From this it seems that the stanfordnlp image/service is not running, so to test that image I do:

Request: curl --header "Content-Type: application/json" --request POST --data '{'text': 'hello'} ' http://localhost:3400/
Response: {"message": "The browser (or proxy) sent a request that this server could not understand."}

When I do

docker logs stanfordnlp_container

I see this in logs (no exception):

/usr/lib/python2.7/dist-packages/supervisor/options.py:461: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
  'Supervisord is running as root and it is searching '
2021-05-02 09:47:33,960 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2021-05-02 09:47:33,961 INFO Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2021-05-02 09:47:33,991 INFO RPC interface 'supervisor' initialized
2021-05-02 09:47:33,995 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2021-05-02 09:47:33,995 INFO supervisord started with pid 1
2021-05-02 09:47:34,999 INFO spawned: 'nginx' with pid 11
2021-05-02 09:47:35,005 INFO spawned: 'gunicorn' with pid 12
[2021-05-02 09:47:35 +0000] [12] [INFO] Starting gunicorn 19.7.1
[2021-05-02 09:47:35 +0000] [12] [INFO] Listening at: http://127.0.0.1:5001 (12)
[2021-05-02 09:47:35 +0000] [12] [INFO] Using worker: sync
[2021-05-02 09:47:35 +0000] [19] [INFO] Booting worker with pid: 19
[2021-05-02 09:47:35 +0000] [20] [INFO] Booting worker with pid: 20
[2021-05-02 09:47:35 +0000] [21] [INFO] Booting worker with pid: 21
[2021-05-02 09:47:35 +0000] [22] [INFO] Booting worker with pid: 22
2021-05-02 09:47:36,689 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-05-02 09:47:36,689 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

So it is not clear to me what is going wrong with the stanfordnlp service. Can you please advise?

AshwinParanjape commented 3 years ago

So port 3300 is associated with the corenlp annotator, which is different from the stanfordnlp annotator on port 3400. They aren't calling each other. The stanfordnlp annotator uses the pure-Python version (now called stanza), and looking at the logs, it seems to be working.

Underlying the corenlp annotator is the Java-based CoreNLP service. It can be confusing because we use the stanfordnlp (Python) wrapper to start and access the Java CoreNLP server. Looking at the stanfordnlp code (https://github.com/stanfordnlp/stanfordnlp/blob/f584a636a169097c8ac4d69fbeaee2c553b28c9c/stanfordnlp/server/client.py#L91) there is a 120-second timeout. I think what is happening here is that the Java-based CoreNLP server is taking too long to start. This can happen for many reasons: the machine could be underpowered for running all the docker containers simultaneously, or the docker container may not have enough CPU or RAM dedicated to it (you can modify the allocated resources in Docker Desktop preferences). As a last resort (I would not recommend this, but if you can't increase compute you might have to), you can also clone the stanfordnlp repo, modify the timeout to be much longer, say 1200, and change the Dockerfile to pip install -e the modified repo instead of getting it from PyPI.
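A rough sketch of that last resort (the Dockerfile lines are assumptions about how the image installs the package; the timeout lives at the linked line in client.py):

# clone the wrapper and raise the timeout in stanfordnlp/server/client.py,
# e.g. change the 120-second timeout around the linked line to 1200
git clone https://github.com/stanfordnlp/stanfordnlp.git

# then, in the corenlp image's Dockerfile, install the patched copy:
COPY stanfordnlp /stanfordnlp
RUN pip install -e /stanfordnlp   # instead of installing stanfordnlp from PyPI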

anupamme commented 3 years ago

I increased the resources and now when I do

docker logs corenlp_container | tail -f

output:

[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 16.91 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [18.9 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
[main] INFO edu.stanford.nlp.coref.statistical.SimpleLinearClassifier - Loading coref model edu/stanford/nlp/models/coref/statistical/ranking_model.ser.gz ... done [0.8 sec].
[main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: dependency
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator sentiment
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0.0.0.0:9000

However, when I run:

Request: curl --header "Content-Type: application/json" --request POST --data '{'text': 'hello', 'annotators': 'pos,ner,parse,sentiment'}' http://localhost:3300/
Response: {"message": "The browser (or proxy) sent a request that this server could not understand."}

Am I not doing the curl request correctly?

AshwinParanjape commented 3 years ago

I think you should be using double quotes for the strings in the JSON payload (https://stackoverflow.com/questions/7172784/how-do-i-post-json-data-with-curl). So maybe try:

curl --header "Content-Type: application/json" --request POST --data '{"text": "hello", "annotators": "pos,ner,parse,sentiment"}' http://localhost:3300/

anupamme commented 3 years ago

Thanks for suggesting the fix. It worked like a charm.

Now I want to come back to the original topic of this issue: fixing tests.integration_tests.integration_base, where line 17

from bin.run_utils import setup_lambda, ASKInvocation, setup_logtofile

is failing.

Can you suggest a quick fix/hack for me to be able to run the tests?

This is important/urgent for me because when I write new ResponseGenerators and need to debug them, running the tests seems like the easiest way to do that (or is there a better way?).

AshwinParanjape commented 3 years ago

Unfortunately there isn't a quick fix/hack for this. If you were to try to fix it, you would have to use agents.local_agent directly and replace the calls to ASKInvocation elsewhere in integration_base. I don't think I'll be able to fix it in the next few days.

Meanwhile, it is actually not necessary to be able to run the integration tests to debug the response generators. At first, you could just run the shell_chat and see if there are any particular errors. You could remove all the unnecessary response generators: https://github.com/stanfordnlp/chirpycardinal/blob/635957802720467cd4447caa9481ad6152f10bd3/agents/local_agent.py#L200
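For instance, a purely hypothetical sketch of trimming the list at the linked line (the actual variable and class names in local_agent.py will differ):

# agents/local_agent.py -- names here are illustrative, not the real ones
response_generators = [
    LaunchResponseGenerator,   # keep only the core RGs you depend on
    MyNewResponseGenerator,    # the RG you are developing and debugging
]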

Eventually, if there's a repeating sequence of user responses that you need to feed while you are debugging, you could provide them programmatically: https://github.com/stanfordnlp/chirpycardinal/blob/635957802720467cd4447caa9481ad6152f10bd3/servers/local/shell_chat.py#L75
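As a hypothetical sketch of that idea (the real hook at the linked line will look different; the agent API names here are assumed):

# servers/local/shell_chat.py -- illustrative only
scripted_turns = ["hello", "can we talk about movies", "i liked the movie inception"]
for user_utterance in scripted_turns:
    response = agent.process_utterance(user_utterance)  # assumed agent entry point
    print(response)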