From these log messages, it seems that your Tika environment variables are not set correctly. As far as I know, you need to set TIKA_SERVER_JAR, which should point to your local Tika jar file.
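The logs point to two likely causes: a missing `java` executable and a missing local jar. A small preflight script run inside the container can check both. This is a hypothetical sketch and is not part of fuji or tika-python. `tika_preflight` is an illustrative name. The script assumes TIKA_SERVER_JAR holds a local path, a file:// URL or a remote URL, and it does not check remote URLs.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import url2pathname


def tika_preflight(env=None):
    """Return human-readable problems that could stop tika-server from starting.

    Hypothetical helper: it only checks that `java` is on PATH and that a local
    TIKA_SERVER_JAR (plain path or file:// URL) points to an existing file.
    """
    env = os.environ if env is None else env
    problems = []

    # "Unable to run java; is it installed?" suggests java is not on PATH.
    if shutil.which("java", path=env.get("PATH", "")) is None:
        problems.append("java executable not found on PATH")

    jar = env.get("TIKA_SERVER_JAR")
    if not jar:
        # Assumption: without this variable tika-python may fetch the jar at runtime.
        problems.append("TIKA_SERVER_JAR not set; tika-python may download the jar at runtime")
        return problems

    parsed = urlparse(jar)
    if parsed.scheme in ("", "file"):
        # Turn a file:// URL into a filesystem path; plain paths are used as-is.
        local_path = url2pathname(parsed.path) if parsed.scheme == "file" else jar
        if not os.path.isfile(local_path):
            problems.append(f"TIKA_SERVER_JAR points to a missing file: {local_path}")
    # Remote (http/https) URLs are not checked here.
    return problems


if __name__ == "__main__":
    for problem in tika_preflight():
        print("WARNING:", problem)
```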
On Tue, Jan 19, 2021 at 4:11 PM Ilja Livenson notifications@github.com wrote:
Hi,
after upgrading to the latest master, I started seeing errors when running in a pure Python Docker container:
2021-01-19 13:26:42,341 - werkzeug - INFO - 192.168.144.4 - - [19/Jan/2021 13:26:42] "POST /fuji/api/v1/evaluate HTTP/1.1" 500 -
2021-01-19 13:26:56,732 - tika.tika - ERROR - Unable to run java; is it installed?
/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'dataverse.no'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  warnings.warn(
2021-01-19 13:26:56,732 [Thread-85 ] [ERROR] Unable to run java; is it installed?
2021-01-19 13:26:56,734 - tika.tika - ERROR - Failed to receive startup confirmation from startServer.
2021-01-19 13:26:56,734 [Thread-85 ] [ERROR] Failed to receive startup confirmation from startServer.
2021-01-19 13:26:56,755 - werkzeug - INFO - 192.168.144.4 - - [19/Jan/2021 13:26:56] "POST /fuji/api/v1/evaluate HTTP/1.1" 500 -
It seems that, for some reason, it started triggering a download of tika-server!
I've added a JRE to the Docker image, but I cannot get tika-server to start properly, and the log messages are scarce:
/tmp # cat tika-server.log
Jan 19, 2021 2:50:35 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
Jan 19, 2021 2:50:35 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.24 server
INFO Setting the server's publish address to be http://localhost:9998/
INFO Logging initialized @2072ms to org.eclipse.jetty.util.log.Slf4jLog
INFO jetty-9.4.24.v20191120; built: 2019-11-20T21:37:49.771Z; git: 363d5f2df3a8a28de40604320230664b9c793c16; jvm 1.8.0_252-b09
INFO Started ServerConnector@4ddbbdf8{HTTP/1.1,[http/1.1]}{localhost:9998}
INFO Started @2238ms
WARN Empty contextPath
INFO Started o.e.j.s.h.ContextHandler@5f354bcf{/,null,AVAILABLE}
INFO Started Apache Tika server at http://localhost:9998/
INFO JVM Runtime does not support Modules
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
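The log above reports that Tika started at http://localhost:9998/. The following hypothetical check (`tika_port_open` is an illustrative name) tests only whether anything accepts TCP connections on that port from inside the container. A True result does not prove that the listener is a healthy Tika server.

```python
import socket


def tika_port_open(host="localhost", port=9998, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical diagnostic: it shows that a listener exists, not that it is Tika.
    """
    try:
        # create_connection tries every address "localhost" resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # localhost:9998 is the address reported in tika-server.log.
    print("Tika port reachable:", tika_port_open())
```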
/tmp # cat tika.log
2021-01-19 14:50:28,581 [Thread-10 ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar to /tmp/tika-server.jar.
2021-01-19 14:50:33,641 [Thread-10 ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar.md5 to /tmp/tika-server.jar.md5.
2021-01-19 14:50:34,700 [Thread-10 ] [WARNI] Failed to see startup log message; retrying...
What am I doing wrong? Ideally, I would prefer the server to fetch all its dependencies when the container is built, not at runtime.
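One way to move the download to build time is to fetch the jar in a RUN step. The sketch below is hypothetical: `fetch_tika_jar` and `file_md5` are illustrative names. It reuses the jar URL from tika.log. As an assumption, it saves the files under the same names tika-python used at runtime in that log (/tmp/tika-server.jar and /tmp/tika-server.jar.md5). It then verifies the MD5 checksum before accepting the jar.

```python
import hashlib
import os
import urllib.request

# Jar URL as it appears in tika.log (Tika 1.24); change the version if needed.
DEFAULT_JAR_URL = (
    "http://search.maven.org/remotecontent?"
    "filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar"
)


def file_md5(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_tika_jar(jar_url=DEFAULT_JAR_URL, dest="/tmp/tika-server.jar"):
    """Download the jar and its .md5 file next to it, then verify the checksum.

    Hypothetical build-time helper; dest mirrors the runtime paths in tika.log.
    """
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    urllib.request.urlretrieve(jar_url, dest)
    urllib.request.urlretrieve(jar_url + ".md5", dest + ".md5")

    with open(dest + ".md5", encoding="ascii") as fh:
        # The .md5 file holds the hex digest, possibly followed by a file name.
        expected = fh.read().split()[0].lower()
    actual = file_md5(dest)
    if actual != expected:
        raise ValueError(f"MD5 mismatch for {dest}: got {actual}, expected {expected}")
    return dest


if __name__ == "__main__":
    print("Saved", fetch_tika_jar())
```

This could run as, for example, `RUN python fetch_tika.py` after `pip install` (the file name is hypothetical). Whether tika-python then skips its runtime download depends on its own checks. Verify this against the tika-python documentation rather than treating it as a guaranteed fix.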
-- Dr. Robert Huber,
PANGAEA - www.pangaea.de
MARUM - Center for Marine Environmental Sciences
University Bremen
Leobener Strasse
POB 330 440
28359 Bremen
Phone ++49 421 218-65593, Fax ++49 421 218-65505
e-mail rhuber@uni-bremen.de
The log shows that tika-server.jar is downloaded in the background to /tmp/tika-server.jar. Should I simply set TIKA_SERVER_JAR to point to that location? I will try it out.
Any news on this? Could you solve the issue by setting the environment variable?
Sorry, I forgot to update: no, it's still the same issue. That variable is an option if an external Tika server location is used, but the actual issue is that the server doesn't launch (any more?) for some reason.
The Dockerfile I'm testing with is below. It also used to work with openjdk8.
FROM python:3.8-alpine
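# install compilers and development headers for building Python packages, plus a Java 8 runtime for tika-server
RUN apk add --update \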
g++ \
gcc \
libffi-dev \
openssl-dev \
python3-dev \
libxslt-dev \
libc-dev \
libxml2-dev \
build-base \
openjdk8-jre \
&& rm -rf /var/cache/apk/*
# set the working directory in the container
WORKDIR /code
# copy the dependencies file to the working directory
COPY requirements.txt .
# install dependencies
RUN pip install -r requirements.txt
# copy the content of the local src directory to the working directory
COPY fuji_server ./fuji_server
EXPOSE 1071
# command to run on container start
CMD [ "python3", "-m", "fuji_server", "-c", "fuji_server/config/server.ini" ]
OK, this might have been the wrong symptom. I noticed that a 500 is also returned for (possibly) incorrect user input.
When querying via Swagger, I also get a 500, but with a more informative error message:
{
"detail": "True is not of type 'string'\n\nFailed validating 'type' in schema['properties']['request']['additionalProperties']:\n {'type': 'string'}\n\nOn instance['request']['use_datacite']:\n True",
"status": 500,
"title": "Response body does not conform to specification",
"type": "about:blank"
}
Does it look familiar?
Input:
curl -X POST "https://fair.etais.ee/fuji/api/v1/evaluate" -H "accept: application/json" -H "Authorization: Basic XXX" -H "Content-Type: application/json" -d "{\"oaipmh_endpoint\":\"\",\"object_identifier\":\"https://doi.org/10.1594/PANGAEA.908011\",\"test_debug\":true,\"use_datacite\":true}"
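The message says that the response body failed validation. The schema allows only string values under `request`, but the echoed `use_datacite` is the JSON boolean true from the input. The snippet below is a simplified, stdlib-only illustration of that rule. `non_string_values` is a hypothetical stand-in, not the validator the server actually uses.

```python
import json

# Request body copied from the curl command above.
BODY = (
    '{"oaipmh_endpoint":"","object_identifier":'
    '"https://doi.org/10.1594/PANGAEA.908011","test_debug":true,"use_datacite":true}'
)


def non_string_values(mapping):
    """Return keys whose values violate additionalProperties: {'type': 'string'}.

    JSON true/false are decoded as Python bools, which are not str, so they fail.
    """
    return [key for key, value in mapping.items() if not isinstance(value, str)]


request = json.loads(BODY)
print(non_string_values(request))  # ['test_debug', 'use_datacite']
```

Both booleans fail the rule. Each error message above names only one failing value, which may explain why only `use_datacite` appears.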
OK, it seems that the issue was fixed by the latest commits; upgrading to the latest version fixed it.