nasa-jpl-cord-19 / covid19-knowledge-graph

Builds a knowledge graph from the [COVID-19 Open Research Dataset (CORD-19)](https://pages.semanticscholar.org/coronavirus-research).
Apache License 2.0
covid covid-19 covid-2019 covid-virus covid19 covid19-data jena knowledge-graph knowledge-representation knowledgebase openie

COVID-19 Research Knowledge Graph

Builds a knowledge graph from the COVID-19 Open Research Dataset (CORD-19). As of 2020-03-18, it has been run against the commercial use subset (which includes PMC content): 9000 papers, 186 MB.

This project is written in Scala, so you need sbt to build and run it.

Prerequisites

Installation

Back in this directory...

Compile the project with sbt:

$ sbt compile

Running

From sbt

Launch sbt:

$ sbt

Run the program, passing the directory that contains the individual CORD-19 files and the directory that contains the individual ANNIE extraction files as its two arguments:

> run /path/to/directory/containing/individual/CORD-19_files /path/to/directory/containing/individual/annie_extraction_files
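Before starting a long run, it can help to check both directory arguments up front. The following is a hypothetical Scala sketch (the `InputCheck` object is not part of this project) that mirrors the two arguments `run` takes and counts the `.json` files in the CORD-19 directory, on the assumption that papers are stored one JSON document per file as in the CORD-19 release. It assumes nothing about the format of the ANNIE extraction files beyond their directory existing.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._
import scala.util.Using

// Hypothetical pre-flight check, not part of this project: validates the two
// directory arguments that `run` expects before starting a long job.
object InputCheck {

  // Counts regular files ending in ".json" directly inside `dir`
  // (assumes one JSON document per paper, as in the CORD-19 release).
  def countJsonFiles(dir: Path): Int =
    Using.resource(Files.list(dir)) { stream =>
      stream.iterator().asScala.count { p =>
        Files.isRegularFile(p) && p.getFileName.toString.endsWith(".json")
      }
    }

  // Returns Right(number of CORD-19 JSON files) or Left(error message).
  def check(args: Array[String]): Either[String, Int] =
    args match {
      case Array(cordDir, annieDir) =>
        val cord  = Paths.get(cordDir)
        val annie = Paths.get(annieDir)
        if (!Files.isDirectory(cord)) Left(s"Not a directory: $cord")
        else if (!Files.isDirectory(annie)) Left(s"Not a directory: $annie")
        else Right(countJsonFiles(cord))
      case _ =>
        Left("Usage: run <CORD-19 files dir> <ANNIE extraction files dir>")
    }
}
```

In a `main` method you might call `InputCheck.check(args)` and exit with the `Left` message when either path is wrong, rather than failing partway through processing.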

As a standalone JAR

First, assemble the JAR:

$ sbt assembly

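Then run the JAR with java, passing the same two directory arguments:

$ java -jar ./target/scala-2.13/covid19_knowledge_graph-assembly-0.1.0-SNAPSHOT.jar /path/to/directory/containing/individual/CORD-19_files /path/to/directory/containing/individual/annie_extraction_files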

Output

When the program finishes (this may take some time, depending on how much memory your machine has), you will find a newly written file called covid19_knowledge_graph.ttl. This Turtle (TTL) RDF file can be loaded into Apache Jena's Fuseki server or into any other SPARQL server that can ingest RDF graphs serialized as Turtle.
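One way to load the file into a running Fuseki server is over HTTP, using the SPARQL 1.1 Graph Store Protocol. The Scala sketch below assumes Java 11+ (for `java.net.http`), Fuseki on its default port 3030, and an existing, updatable dataset named `covid19`; the dataset name and the `FusekiUpload` object are hypothetical and not part of this project.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.file.{Path, Paths}

// Sketch only: uploads the generated Turtle file to a running Fuseki server
// using the SPARQL 1.1 Graph Store HTTP Protocol. The base URL and the
// dataset name "covid19" are assumptions; adjust them to your setup.
object FusekiUpload {

  // Builds a POST that adds the file's triples to the dataset's default graph.
  def uploadRequest(fusekiBase: String, dataset: String, ttl: Path): HttpRequest =
    HttpRequest
      .newBuilder(URI.create(s"$fusekiBase/$dataset/data?default"))
      .header("Content-Type", "text/turtle")
      .POST(HttpRequest.BodyPublishers.ofFile(ttl))
      .build()

  def main(args: Array[String]): Unit = {
    val request = uploadRequest(
      "http://localhost:3030", // Fuseki's default port
      "covid19",               // hypothetical dataset name
      Paths.get("covid19_knowledge_graph.ttl")
    )
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"HTTP ${response.statusCode()}: ${response.body()}")
  }
}
```

POST adds triples to whatever is already in the default graph; the Graph Store Protocol's PUT would replace the graph's contents instead.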

Querying Data

Once the data is loaded into Fuseki, you can use Jena's full-text search, which combines SPARQL with text indexing via Lucene or Elasticsearch (itself built on Lucene). This lets applications perform indexed full-text searches within SPARQL queries.
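As an illustration, the Scala sketch below builds a query that uses jena-text's `text:query` property function and sends it to Fuseki's query endpoint over the SPARQL 1.1 Protocol. It assumes Java 11+, that the dataset has been configured with a text index covering `rdfs:label`, and a dataset named `covid19`; the indexed property, the dataset name, and the `TextSearch` object are all hypothetical, since the project's index configuration is not described here.

```scala
import java.net.{URI, URLEncoder}
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets

// Sketch only: runs a jena-text full-text query against Fuseki. Assumes a
// text index over rdfs:label; that property and the dataset name are
// hypothetical.
object TextSearch {

  // Builds a SPARQL query using jena-text's text:query property function,
  // escaping backslashes and double quotes in the search term.
  def textQuery(term: String, limit: Int): String = {
    val escaped = term.replace("\\", "\\\\").replace("\"", "\\\"")
    s"""PREFIX text: <http://jena.apache.org/text#>
       |PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       |SELECT ?s ?label WHERE {
       |  ?s text:query (rdfs:label "$escaped" $limit) ;
       |     rdfs:label ?label .
       |}""".stripMargin
  }

  // Encodes the query as a GET request asking for JSON results.
  def queryRequest(endpoint: String, sparql: String): HttpRequest = {
    val encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8)
    HttpRequest
      .newBuilder(URI.create(s"$endpoint?query=$encoded"))
      .header("Accept", "application/sparql-results+json")
      .GET()
      .build()
  }

  def main(args: Array[String]): Unit = {
    val request = queryRequest(
      "http://localhost:3030/covid19/query", // hypothetical dataset name
      textQuery("coronavirus", 10)
    )
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())
  }
}
```

The text index narrows candidates to matching labels first, and the ordinary triple pattern that follows joins those hits back into the graph, which is how full-text and SPARQL constraints combine in a single query.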

Contact

Dr. Lewis John McGibbney Ph.D., B.Sc.(Hons)

Enterprise Search Technologist

Web and Mobile Application Development Group (172B)

Application, Consulting, Development and Engineering Section (1722)

Info & Engineering Technology Planning and Development Division (1720)

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop: 600-172A

Tel: (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax: (+1) (818)-393-1190

Email: lewis.j.mcgibbney@jpl.nasa.gov

ORCID: orcid.org/0000-0003-2185-928X