ohsu-comp-bio / funnel

Funnel is a toolkit for distributed task execution via a simple, standard API.
https://ohsu-comp-bio.github.io/funnel
MIT License

Funnel responds with 'Closed explicitly' #611

Open kmavrommatis opened 5 years ago

kmavrommatis commented 5 years ago

Hi, I am getting the following messages in the funnel logs, and neither command-line requests nor the web interface work:

{"err":"Closed explicitly","level":"debug","msg":"responding: /tes.TaskService/ListTasks","ns":"server","resp":null,"time":"2019-06-03T18:55:42Z"}
{"error":"Closed explicitly","level":"error","msg":"Calling ListTasks","ns":"aws-batch","time":"2019-06-03T18:56:11Z"}

When I try on the command line

$ curl http://10.112.17.89:8001/v1/tasks/bjpfhrq0fs0g00dn4is

{
  "error": "Closed explicitly",
  "code": 2
}
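For readers trying to interpret this response, here is a minimal Go sketch that decodes it. It assumes, without confirmation from the thread, that the "code" field is a gRPC status code, in which case 2 corresponds to Unknown, i.e. the server is passing through an error it did not classify. The names apiError, codeName, and parseAPIError are hypothetical and not part of Funnel.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the two fields visible in the response above.
// Hypothetical type, not taken from the Funnel code base.
type apiError struct {
	Error string `json:"error"`
	Code  int    `json:"code"`
}

// codeName maps a few gRPC status codes to their names.
// Assumption: the "code" field is a gRPC status code.
func codeName(c int) string {
	switch c {
	case 0:
		return "OK"
	case 2:
		return "Unknown"
	case 5:
		return "NotFound"
	case 13:
		return "Internal"
	case 14:
		return "Unavailable"
	default:
		return fmt.Sprintf("Code(%d)", c)
	}
}

// parseAPIError decodes a JSON error body into an apiError.
func parseAPIError(body []byte) (apiError, error) {
	var e apiError
	err := json.Unmarshal(body, &e)
	return e, err
}

func main() {
	body := []byte(`{"error": "Closed explicitly", "code": 2}`)
	e, err := parseAPIError(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s (%s)\n", e.Error, codeName(e.Code)) // prints "Closed explicitly (Unknown)"
}
```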

The same queries worked a day ago. If I restart the server it works again, but it eventually ends up back in this state. What does this mean, and what causes it? Is there a configuration option I should be setting?

The funnel server is interacting with mongodb (located on a different server), and submits jobs to AWS batch. Thanks

adamstruck commented 5 years ago

Funnel wasn't handling mongo sessions properly. I'm guessing the session was becoming stale and was being closed by the mongo server.

Try the fix referenced in the PR above.

kmavrommatis commented 5 years ago

The new version does not seem to work properly. The jobs are submitted and immediately go to state 'SYSTEM_ERROR' without being sent to AWS Batch. The log files (with debug enabled) contain no information.

With the previous version (v0.9.0 from DockerHub), which eventually gets into the 'Closed explicitly' state, the jobs are submitted as expected and go through states QUEUED, RUNNING, etc.

adamstruck commented 5 years ago

The changes I made shouldn't have affected jobs being submitted to batch. Nonetheless I found a spot where I wasn't logging a potential error in the CreateTask method. I pushed up a new commit to that branch that should provide this missing error logging.

kmavrommatis commented 5 years ago

Thanks, but the problem is still there. This version is still not able to submit jobs to AWS Batch (the previous version, v0.9.0, can on the same node with the same config files). The only relevant error I can see is:

{"error":"RequestError: send request failed\ncaused by: Post https://batch.us-east-1.amazonaws.com/v1/submitjob: x509: failed to load system roots and no roots provided","level":"error","msg":"error submitting task to compute backend","ns":"server","taskID":"bk3ut6qi76gg008vei1g","time":"2019-06-17T19:48:44Z"}

Thanks K

adamstruck commented 5 years ago

Are you running your funnel server in a docker container? If so, I think the problem may have been with the base image I was using. Try rebuilding the container (make docker) using the branch in #613.

kmavrommatis commented 5 years ago

I am using it in an image, but I compiled it myself.

FROM ubuntu:18.04

RUN apt-get update &&\
    apt-get -y install git golang go-dep

ENV GOROOT=/usr/lib/go-1.10/
ENV GOPATH=/opt/
ENV PATH=$PATH:$GOROOT/bin

WORKDIR /opt

WORKDIR /opt/src/github.com/ohsu-comp-bio/
RUN git clone https://github.com/ohsu-comp-bio/funnel.git
WORKDIR /opt/src/github.com/ohsu-comp-bio/funnel
#RUN git checkout mongo-session && git pull origin mongo-session
RUN make && go build

FROM ubuntu:18.04

RUN mkdir -p /opt/funnel/funnel-work-dir
COPY --from=0 /opt/src/github.com/ohsu-comp-bio/funnel/funnel /opt/funnel/funnel

ENTRYPOINT ["/opt/funnel/funnel"]

CMD []

Nevertheless, I will try your suggestion and let you know

adamstruck commented 5 years ago

Ah ok, I think the source of the error is that the ubuntu:18.04 image doesn't come with ca-certificates installed. See http://vinceyuan.github.io/solving-ssl-root-certificates/ and https://stackoverflow.com/questions/45388934/how-do-i-make-an-https-call-in-a-busybox-docker-container-running-go

kmavrommatis commented 5 years ago

Perfect, thanks for the pointer. This solved the problem...

kmavrommatis commented 5 years ago

For completeness, the Dockerfile is now:


FROM ubuntu:18.04

RUN apt-get update &&\
    apt-get -y install git golang go-dep

ENV GOROOT=/usr/lib/go-1.10/
ENV GOPATH=/opt/
ENV PATH=$PATH:$GOROOT/bin

WORKDIR /opt

WORKDIR /opt/src/github.com/ohsu-comp-bio/
RUN git clone https://github.com/ohsu-comp-bio/funnel.git
WORKDIR /opt/src/github.com/ohsu-comp-bio/funnel
RUN make && go build

FROM ubuntu:18.04

RUN mkdir -p /opt/funnel/funnel-work-dir
COPY --from=0 /opt/src/github.com/ohsu-comp-bio/funnel/funnel /opt/funnel/funnel

# CA certificates are needed so that HTTPS calls (e.g. to AWS Batch) can be verified
RUN apt-get update && apt-get install -y ca-certificates

ENTRYPOINT ["/opt/funnel/funnel"]

CMD []