JoeUser47 closed 4 years ago
I'm getting a couple of warnings in tika.log, but nothing notable in the worker or web logs. The worker appears to show tasks started and done for the things I have keys for. I also notice the VM isn't using the amount of memory it usually does.
This box is running on Hyper-V. I'll spin one up on DigitalOcean and see if there's a difference.
tika:
Apr 29, 2020 9:18:44 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
Apr 29, 2020 9:18:45 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.24 server
INFO Setting the server's publish address to be http://localhost:9998/
INFO Logging initialized @23074ms to org.eclipse.jetty.util.log.Slf4jLog
INFO jetty-9.4.24.v20191120; built: 2019-11-20T21:37:49.771Z; git: 363d5f2df3a8a28de40604320230664b9c793c16; jvm 11.0.6+10-post-Ubuntu-1ubuntu118.04.1
INFO Started ServerConnector@5049d8b2{HTTP/1.1,[http/1.1]}{localhost:9998}
INFO Started @23886ms
WARN Empty contextPath
INFO Started o.e.j.s.h.ContextHandler@6fefce9e{/,null,AVAILABLE}
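For anyone triaging a similar tika.log, a rough way to separate the warnings from the routine startup lines is sketched below. It assumes one log entry per line and only the severity prefixes seen in the excerpt above ("WARNING:", "WARN", "INFO"); the function name `warnings_only` and the abridged sample are illustrative, not part of the project.

```python
def warnings_only(log_text):
    """Return the log lines that start with a WARNING or WARN severity."""
    hits = []
    for line in log_text.splitlines():
        stripped = line.strip()
        # "WARNING:" is the java.util.logging form; "WARN " is the server's own logger.
        if stripped.startswith(("WARNING:", "WARN ")):
            hits.append(stripped)
    return hits


# Abridged lines taken from the log excerpt above.
sample = """\
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
INFO Starting Apache Tika 1.24 server
WARN Empty contextPath
INFO Started @23886ms
"""

for entry in warnings_only(sample):
    print(entry)
```

In this excerpt the two WARNING entries concern optional parsers (JPEG2000 and SQLite files), so a filter like this mainly helps confirm that nothing else is hiding among the startup messages.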
tika is just one of the supporting services (Apache Tika); it performs document metadata analysis. We'd need the log/worker* logs to diagnose any specific system problems.
This looks to be environment-dependent, so let's coordinate in the Slack support channel for now. I'll leave this open until we've resolved the issue there.
Unable to duplicate this on a bare-metal box, so closing the issue.
Adding a domain on the latest pull doesn't populate the domain entity info: dmarc is null, while all other records (mx_records, soa_record, nameservers, etc.) are blank.
This is a fresh pull after stopping, removing, and pruning everything.
Possibly related: no additional entities or info are populated under http://localhost:7777/$DOMAIN/analysis.
The /results page does show that searches on certspotter, dns_search_sonar, and other tasks completed, but all have an entity count of 0 except for the domain creation entity, which has 1.