proycon / LaMachine

LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script
https://proycon.github.io/LaMachine
GNU General Public License v3.0

CLAM refuses to answer to Nginx #25

Closed spawn-guy closed 7 years ago

spawn-guy commented 7 years ago

Hello, we are trying to kickstart LaMachine as a Docker deployment in the AWS Cloud. Our purpose is to get Frog working with a RESTful wrapper. We've modified the Dockerfile so that nginx runs in the foreground and the running container does not exit: we added one line, RUN echo "daemon off;" >> /usr/src/LaMachine/nginx.conf, and changed CMD from bash to CMD /usr/src/LaMachine/startwebservices.sh.

I've built the image on a Windows PC, and it builds, starts and runs. When I run it locally, /frog responds with what looks like decent output from a CLAM service (I suppose; we are still investigating the tool).

However, when I run it in the cloud, I see a "default" page, but the upstreams fail with an nginx error about connecting to the upstream, or actually respond with a blank page.

2017/03/08 15:12:57 [error] 13#13: *9 connect() failed (111: Connection refused) while connecting to upstream, client: 172.17.0.1, server: localhost, request: "GET /frog HTTP/1.1", upstream: "uwsgi://127.0.0.1:3032", host: "blabla.eu-west-1.elasticbeanstalk.com", referrer: "http://blabla.eu-west-1.elasticbeanstalk.com/"

and later

[pid: 27|app: 0|req: 10/10] 172.17.0.1 () {52 vars in 909 bytes} [Wed Mar 8 16:09:36 2017] GET /frog => generated 5469 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 80 bytes (1 switches on core 1)
[pid: 27|app: 0|req: 11/11] 172.17.0.1 () {52 vars in 909 bytes} [Wed Mar 8 16:09:36 2017] GET /frog => generated 5469 bytes in 1 msecs (HTTP/1.1 200)

Can someone point us in the right direction: what could be causing this failure, and how can we fix it?

64bit Amazon Linux 2016.09 v2.5.0 running Docker 1.12.6
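
Editorial aside: the "111: Connection refused" in the nginx log above generally means that nothing was listening on 127.0.0.1:3032 at the moment nginx tried to connect, for example because uWSGI had not finished starting yet. A minimal sketch of a check that could be run inside the container to see whether the upstream port accepts connections follows. The function name is hypothetical, and the host and port are simply the ones from the log line.

```python
import socket


def upstream_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1:3032 is the uwsgi upstream named in the nginx error log
    print(upstream_accepts_connections("127.0.0.1", 3032))
```

Note that 127.0.0.1 refers to the container's own loopback interface, so the check only says something about the upstream when it is executed in the same container as nginx.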

spawn-guy commented 7 years ago

i've tried to rebuild it on Amazon Linux today. still out of luck

[pid: 32|app: 0|req: 2/2] 92.255.255.255 () {46 vars in 819 bytes} [Thu Mar  9 10:14:48 2017] GET /frog => generated 5469 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 80 bytes (1 switches on core 1)
[pid: 32|app: 0|req: 3/3] 92.255.255.255 () {46 vars in 819 bytes} [Thu Mar  9 10:14:53 2017] GET /frog => generated 5469 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 80 bytes (1 switches on core 0)
[pid: 32|app: 0|req: 4/4] 92.255.255.255 () {46 vars in 819 bytes} [Thu Mar  9 10:14:54 2017] GET /frog => generated 5469 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 80 bytes (1 switches on core 1)
[pid: 32|app: 0|req: 5/5] 92.255.255.255 () {46 vars in 819 bytes} [Thu Mar  9 10:14:55 2017] GET /frog => generated 5469 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 80 bytes (1 switches on core 0)
[pid: 32|app: 0|req: 6/6] 92.255.255.255 () {46 vars in 819 bytes} [Thu Mar  9 10:14:56 2017] GET /frog => generated 5469 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 80 bytes (1 switches on core 1)
[pid: 30|app: 0|req: 1/1] 92.255.255.255 () {44 vars in 788 bytes} [Thu Mar  9 10:15:01 2017] GET /ucto => generated 6519 bytes in 45 msecs (HTTP/1.1 200) 2 headers in 80 bytes (1 switches on core 0)

spawn-guy commented 7 years ago

okay, status update: the CLAM IS working! however, it answers ONLY in XML.

<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://127.0.0.1:8080/frog//static/interface.xsl"?>
<clam xmlns:xlink="http://www.w3.org/1999/xlink" version="2.1.8" id="frog" name="Frog" user="anonymous" baseurl="http://127.0.0.1:8080/frog/" interfaceoptions="" oauth_access_token="">
    <description>Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch, developed at Tilburg University. It is the successor of Tadpole.</description>
    <version>0</version>
    <email>None</email>
    <projects totalsize="0">
    </projects>
    <profiles>
        <profile>
            <input>
                <InputTemplate id="maininput" format="PlainTextFormat" label="Text document" mimetype="text/plain" extension="txt" optional="no" unique="no" acceptarchive="no">
                    <StaticParameter id="encoding" name="Encoding" description="The character encoding of the file" value="utf-8" />

where do we need to look to fix this problem? or.. well.. shall i research into CLAM and restful usage?

spawn-guy commented 7 years ago

ookay.. "rest" service is working, but processing with Frog fails - i need to get some logs

proycon commented 7 years ago

Sorry for the delay in getting back to you. I see you already managed to get far. The XML response is indeed correct. When viewed in a browser, this XML is transformed to HTML client-side by the associated XSL stylesheet, but only if the hostname is an exact match (http://127.0.0.1:8080/frog/ in your case). You may want to check whether the XSL stylesheet can be accessed properly.

As to error logs, there should be an error.log file for each clam project, accessible through the webservice.

spawn-guy commented 7 years ago

@proycon this whole docker thing is new to me, but i am glad that it went so far indeed. also docker practice.

So 127.0.0.1:8080 is not working because it is being run on a remote instance, not on a local machine. Maybe this whole 127.0.0.1 linking is related to https://github.com/proycon/LaMachine/blob/master/startwebservices.sh#L13 ?

> accessible through the webservice.

this is something new and completely non-intuitive :) LaMa+CLAM+Services+Docker+Frog and blank pages and running it on aws elasticBeanstalk ... at the same time.

proycon commented 7 years ago

I must admit that the Docker variant is less tested/used when it comes to the webservices. You are running the container with the port mapping -p 8080:80, right? That maps port 8080 of the host system to port 80 of the container (nginx). So everything works, except that the interface is not presented when you connect through a browser? How are you accessing it? Only http://127.0.0.1:8080/frog should work.

spawn-guy commented 7 years ago

I've tried all the ways :)

And I succeeded on Ubuntu with Docker, running it with nginx on port 80 (expose 80) and upstreams to uwsgi on ports 3032+. Then docker -p 8080:80 and accessing via externalhostname:8080

However, my goal is to run it on 80:80 behind a load balancer that will optionally terminate SSL (HTTPS).

I am also building the image from a modified Dockerfile that makes nginx run in the foreground. So I'll have to rebuild it tomorrow and see if your fix did something nice to it.

And now im thinking on a different approach with 'just' frog + frog.clam and no nginx at all. But at this point this is nice to have.

I am also not quite fond of the whole idea behind clam :/ with projects, files, polling and stuff... Frog-daemon with xml output to a POST would do just fine to serve my needs

proycon commented 7 years ago

> Then docker -p 8080:80 and accessing via externalhostname:8080

Okay, so the problem is that CLAM doesn't know it's being accessed through externalhostname:8080. You were indeed on the right track earlier when you pointed to https://github.com/proycon/LaMachine/blob/master/startwebservices.sh#L13 : this is where a URL is forced, and only that URL can be used to fully access the interface (it's a security limitation in browsers not to execute cross-domain XSL). So if you set that to your desired URL, it should work. I'll adapt LaMachine, as currently only the port is configurable.
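
Editorial aside: the cross-domain limitation described here can be illustrated with a small sketch that compares the origin (scheme, host, port) of the URL used in the browser with the origin of the stylesheet URL found in the CLAM response. The helper names are hypothetical, and the comparison is the usual same-origin check; the exact rules a given browser applies to XSL stylesheets may differ.

```python
from urllib.parse import urlsplit

# Default ports are filled in so that http://host and http://host:80 compare equal
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, hostname, port) triple of a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def same_origin(page_url: str, stylesheet_url: str) -> bool:
    """True if both URLs share scheme, host and port."""
    return origin(page_url) == origin(stylesheet_url)


# Stylesheet URL as it appears in the XML response earlier in this thread
xsl = "http://127.0.0.1:8080/frog//static/interface.xsl"
print(same_origin("http://127.0.0.1:8080/frog/", xsl))
print(same_origin("http://externalhostname:8080/frog/", xsl))
```

With the forced URL left at 127.0.0.1:8080, the second call reports a mismatch, which matches the blank page seen when the service is accessed through an external hostname.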

> And now im thinking on a different approach with 'just' frog + frog.clam and no nginx at all. But at this point this is nice to have.

> I am also not quite fond of the whole idea behind clam :/ with projects, files, polling and stuff... Frog-daemon with xml output to a POST would do just fine to serve my needs

Frog has a server mode (-S <port> option); it offers a simple TCP server and may be enough for your purposes. See the Frog documentation for further details. A Python client for it is also available in https://github.com/proycon/pynlpl .
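
Editorial aside: a rough sketch of what talking to such a TCP server from Python could look like, using only the standard library. The framing is an assumption, not something stated in this thread: the request is taken to end with a line containing EOT and the reply with a line containing READY. Check the Frog documentation or the pynlpl client before relying on it. The function name and the port number in the usage comment are hypothetical.

```python
import socket


def frog_request(host: str, port: int, text: str,
                 encoding: str = "utf-8", timeout: float = 30.0) -> list:
    """Send `text` to a Frog server and return the raw reply lines.

    ASSUMPTION: input is terminated by an "EOT" line and the server
    signals the end of its output with a "READY" line.
    """
    payload = text.strip().encode(encoding) + b"\r\nEOT\r\n"
    lines = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        # Read the reply line by line until the assumed end marker
        with sock.makefile("r", encoding=encoding) as reader:
            for line in reader:
                line = line.rstrip("\r\n")
                if line == "READY":
                    break
                lines.append(line)
    return lines


# Usage, assuming Frog was started with: frog -S 12345
# for line in frog_request("127.0.0.1", 12345, "Dit is een test."):
#     print(line)
```

The returned lines are left unparsed on purpose, since the column layout of the output is not described in this thread.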

proycon commented 7 years ago

You can now say something like sudo startclamwebservices.sh http://externalhostname:8080 to force a hostname for the CLAM webservices. This should hopefully solve the issue of the interface not showing. Please reopen if problems persist.