Builds a knowledge graph from the COVID-19 Open Research Dataset (CORD-19). As of 2020-03-18 it has been run against the commercial use subset (which includes PMC content): 9,000 papers, 186 MB.
This project is written in Scala, so you need sbt to continue. Before building it, clone, build and start the OpenIE-standalone server:
git clone https://github.com/dair-iitd/OpenIE-standalone.git && cd OpenIE-standalone
sbt -J-Xmx10000M clean compile assembly
java -Xmx10g -XX:+UseConcMarkSweepGC -jar target/scala-2.10/openie-assembly-5.0-SNAPSHOT.jar --httpPort 8000
The server exposes a /getExtraction endpoint to which you POST sentences; each sentence goes in the body of the HTTP request. An example curl request:
curl -X POST http://localhost:8000/getExtraction -d "The Jet Propulsion Laboratory is a federally funded research and development center and NASA field center in the city of La Canada Flintridge with a Pasadena mailing address, within the state of California, United States."
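The same request can be sent from Scala. The sketch below uses the JDK 11+ java.net.http client and assumes the server is running locally on port 8000 as started above. The names OpenIeClient, buildRequest and extract are hypothetical, and the response body is returned unparsed because its format is not described here.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets

object OpenIeClient {
  // Matches the server started above with --httpPort 8000 on the local machine.
  val Endpoint: URI = URI.create("http://localhost:8000/getExtraction")

  /** Builds a POST whose body is the raw sentence, mirroring `curl -d`. */
  def buildRequest(sentence: String, endpoint: URI = Endpoint): HttpRequest =
    HttpRequest.newBuilder(endpoint)
      // curl -d sends this content type by default; mirrored here as an assumption.
      .header("Content-Type", "application/x-www-form-urlencoded")
      .POST(HttpRequest.BodyPublishers.ofString(sentence, StandardCharsets.UTF_8))
      .build()

  /** Sends one sentence and returns the response body as-is (its format is not assumed). */
  def extract(sentence: String, client: HttpClient = HttpClient.newHttpClient()): String =
    client.send(buildRequest(sentence), HttpResponse.BodyHandlers.ofString()).body()

  def main(args: Array[String]): Unit = {
    val sentence = args.headOption.getOrElse(
      "The Jet Propulsion Laboratory is a federally funded research and development center.")
    println(extract(sentence))
  }
}
```

Keeping request construction separate from sending makes it easy to inspect or reuse the request without a running server.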
Back in this project's directory, compile the project:
$ sbt compile
Then launch the sbt shell:
$ sbt
Run the program with two arguments: the directory containing the individual CORD-19 files and the directory containing the individual annie extraction files:
> run /path/to/directory/containing/individual/CORD-19_files /path/to/directory/containing/individual/annie_extraction_files
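The project's own entry point is not shown in this README. As a hypothetical sketch only (ArgsCheck and parseDirs are illustrative names, not the project's code), a main method expecting these two directories might validate them like this before doing any work:

```scala
import java.nio.file.{Files, Path, Paths}

object ArgsCheck {
  /** Hypothetical helper: accepts exactly two arguments that both name existing directories. */
  def parseDirs(args: Array[String]): Either[String, (Path, Path)] =
    args match {
      case Array(cord19, annie) =>
        val dirs = (Paths.get(cord19), Paths.get(annie))
        val missing = List(dirs._1, dirs._2).filterNot(p => Files.isDirectory(p))
        if (missing.isEmpty) Right(dirs)
        else Left(s"Not a directory: ${missing.mkString(", ")}")
      case _ =>
        Left("Usage: run <CORD-19 directory> <annie extraction directory>")
    }

  def main(args: Array[String]): Unit =
    parseDirs(args) match {
      case Right((cord19, annie)) => println(s"CORD-19 input: $cord19, extractions: $annie")
      case Left(error)            => Console.err.println(error); sys.exit(1)
    }
}
```

Failing fast on a wrong path avoids discovering the mistake only after a long, memory-hungry run.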
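Alternatively, first assemble the JAR:
$ sbt assembly
... then run the JAR via java, passing the same two arguments:
$ java -jar ./target/scala-2.13/covid19_knowledge_graph-assembly-0.1.0-SNAPSHOT.jar /path/to/directory/containing/individual/CORD-19_files /path/to/directory/containing/individual/annie_extraction_files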
Once the program finishes (this may take some time, depending on how much memory your machine has), you will find a newly written file called covid19_knowledge_graph.ttl. This file can be loaded into Apache Jena's Fuseki server (or any other SPARQL server that can ingest RDF graphs in Turtle (TTL) format).
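One way to load the file is over the SPARQL 1.1 Graph Store HTTP Protocol, which Fuseki supports. The sketch below assumes Fuseki is running on its default port 3030 with a dataset named covid19 (a hypothetical name) that exposes Fuseki's default /data endpoint; FusekiUpload and buildUpload are illustrative names.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.file.{Path, Paths}

object FusekiUpload {
  // Assumptions: Fuseki on port 3030 and a hypothetical dataset "covid19"
  // whose Graph Store Protocol endpoint is /covid19/data.
  val GraphStore: URI = URI.create("http://localhost:3030/covid19/data")

  /** POSTs a Turtle file; the Graph Store Protocol adds its triples to the default graph. */
  def buildUpload(ttl: Path, endpoint: URI = GraphStore): HttpRequest =
    HttpRequest.newBuilder(endpoint)
      .header("Content-Type", "text/turtle")
      .POST(HttpRequest.BodyPublishers.ofFile(ttl))
      .build()

  def main(args: Array[String]): Unit = {
    val ttl = Paths.get(args.headOption.getOrElse("covid19_knowledge_graph.ttl"))
    val response = HttpClient.newHttpClient()
      .send(buildUpload(ttl), HttpResponse.BodyHandlers.ofString())
    println(s"HTTP ${response.statusCode()}: ${response.body()}")
  }
}
```

POST appends to the graph; a PUT to the same endpoint would replace the graph's contents instead.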
Once the data is loaded into Fuseki, you can use Jena's full-text search, which combines SPARQL with text indexing via Lucene or Elasticsearch (itself built on Lucene). This lets applications perform indexed full-text searches within SPARQL queries.
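With a jena-text index configured for the dataset (the index configuration itself is not covered here), queries use the text:query property function. This sketch assumes the same hypothetical covid19 dataset queried through a /covid19/sparql endpoint, and an index whose default field covers the literals you want to search; TextSearch and its methods are illustrative names.

```scala
import java.net.{URI, URLEncoder}
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets

object TextSearch {
  // Assumption: hypothetical dataset "covid19" on Fuseki's default port.
  val SparqlEndpoint = "http://localhost:3030/covid19/sparql"

  /** Builds a SPARQL query using jena-text's text:query, returning matches with scores. */
  def textQuery(term: String, limit: Int = 10): String = {
    // Escape backslashes and double quotes so the term stays a valid SPARQL string literal.
    val escaped = term.replace("\\", "\\\\").replace("\"", "\\\"")
    s"""PREFIX text: <http://jena.apache.org/text#>
       |SELECT ?s ?score WHERE {
       |  (?s ?score) text:query "$escaped" .
       |}
       |LIMIT $limit""".stripMargin
  }

  /** Sends the query as a GET parameter and asks for JSON results. */
  def buildRequest(query: String, endpoint: String = SparqlEndpoint): HttpRequest =
    HttpRequest.newBuilder(URI.create(endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8)))
      .header("Accept", "application/sparql-results+json")
      .GET()
      .build()

  def main(args: Array[String]): Unit = {
    val term = args.headOption.getOrElse("coronavirus")
    val response = HttpClient.newHttpClient()
      .send(buildRequest(textQuery(term)), HttpResponse.BodyHandlers.ofString())
    println(response.body())
  }
}
```

The text index narrows candidates quickly; further triple patterns can be added to the WHERE clause to join the matches against the rest of the graph.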
Dr. Lewis John McGibbney Ph.D., B.Sc.(Hons)
Enterprise Search Technologist
Web and Mobile Application Development Group (172B)
Application, Consulting, Development and Engineering Section (1722)
Info & Engineering Technology Planning and Development Division (1720)
Jet Propulsion Laboratory
California Institute of Technology
4800 Oak Grove Drive
Pasadena, California 91109-8099
Mail Stop: 600-172A
Tel: (+1) (818)-393-7402
Cell: (+1) (626)-487-3476
Fax: (+1) (818)-393-1190
Email: lewis.j.mcgibbney@jpl.nasa.gov
ORCID: orcid.org/0000-0003-2185-928X