aazaff commented 3 years ago

Frequently Asked Questions

General Questions

What is xDD?

The xDD library of scientific articles is the world's largest set of documents legally available for automated knowledge base construction, hosting >14 million documents from ~13 thousand journals and growing by ~8,000 new documents per day. For comparison, the PubMed Central bulk data mining library contains 2.75 million minable full-text articles. While smaller datasets are adequate for proof-of-concept research that tests the efficacy of different machine learning algorithms and techniques, building a scientifically actionable dataset for applied research requires a larger dataset with more comprehensive coverage of the scientific literature.

What is Knowledge Base Construction?

Knowledge base construction (KBC) is the process of populating a knowledge base (i.e., a database) with facts extracted from data (e.g., text, audio, video, tables, diagrams, ...). For example, a scientist wishing to study the distribution of stromatolites in the fossil record would need to compile a database of where stromatolite fossils have been reported around the world. The traditional way to build such a database would be for the scientist to read through various scientific publications to find descriptions of stromatolite fossils and manually enter relevant findings into a database.

Automated knowledge base construction is simply when a software application is used to achieve the same result programmatically. xDD is specifically designed to provide machine-readable copies of scientific publications for such automated knowledge base construction.
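As a rough illustration of the stromatolite example, the sketch below scans a few made-up text snippets for sentences that mention stromatolites and stores them in a small database. Everything in it is an assumption for illustration only: the sample documents, the `extract_mentions` and `build_knowledge_base` helpers, and the table layout are hypothetical, and it does not use or represent the actual xDD or ADEPT interfaces. A real KBC application would use far more sophisticated extraction than keyword matching.

```python
import re
import sqlite3

# Hypothetical sample text standing in for machine-readable documents.
# These snippets are invented for illustration; no xDD data is accessed.
DOCUMENTS = {
    "doc-001": "Stromatolites were reported from the Pilbara Craton. The section is 40 m thick.",
    "doc-002": "No fossils were found in this unit.",
    "doc-003": "We describe a stromatolite from the Belt Supergroup in Montana.",
}


def extract_mentions(doc_id, text, term="stromatolite"):
    """Return (doc_id, sentence) pairs for each sentence that mentions the term."""
    # Naive sentence splitting on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(doc_id, s) for s in sentences if term in s.lower()]


def build_knowledge_base(documents):
    """Populate an in-memory SQLite table with the extracted mentions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mentions (doc_id TEXT, sentence TEXT)")
    for doc_id, text in documents.items():
        conn.executemany(
            "INSERT INTO mentions VALUES (?, ?)",
            extract_mentions(doc_id, text),
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    kb = build_knowledge_base(DOCUMENTS)
    for row in kb.execute("SELECT doc_id, sentence FROM mentions ORDER BY doc_id"):
        print(row)
```

The point of the sketch is only the shape of the workflow: documents go in, a program extracts candidate facts, and the facts land in a queryable database instead of being entered by hand.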

What is ADEPT?

ADEPT (the Automated Data Extraction PlaTform) is a front-end web interface for interacting with data in the xDD library. It allows users to 1) browse available documents in the xDD library using full-text search terms (powered by Elasticsearch) and other common search parameters (e.g., publication date, journal name); 2) save and track sets of documents associated with particular research projects; and 3) submit and deploy KBC applications against the xDD library.

What is COSMOS?

COSMOS is an AI-powered technical assistant that algorithmically identifies and extracts tables, figures, and equations from documents. COSMOS is independent of both the xDD and ADEPT systems, and its inclusion in a KBC application is completely optional. Most xDD applications do not use it at all. However, users who are specifically interested in analysing tables, figures, or equations as part of their knowledge base construction applications may find it useful to integrate COSMOS into their workflow. For more information about COSMOS, please visit https://cosmos.wisc.edu.

Who runs these projects?

The xDD-ADEPT-COSMOS system is currently maintained by a joint partnership between the Department of Geoscience and Department of Computer Sciences at the University of Wisconsin-Madison. Requests to collaborate or other questions should be directed to the project leader at these organisations, Shanan E. Peters (contact@geodeepdive.org).

Other organisations have also contributed to parts of the xDD-ADEPT-COSMOS system at various points in its development history, including the Stanford AI Lab, the Geoinformatics Research Lab at the Arizona Geological Survey, the Wisconsin Institute for Discovery, and the Center for High-throughput Computing.

Who pays for these projects?

Project funding is or has been provided by the National Science Foundation, the U.S. Department of Energy, and the Defense Advanced Research Projects Agency.

Importantly, users do not and will never have to pay to use xDD or the ADEPT system. Access to high-throughput compute resources as part of the ADEPT system is also provided to users for free.

Is this resource right for my project?

xDD is much larger than comparable data stacks because it has negotiated unique contractual agreements with the four largest publishers of scientific literature (Reed-Elsevier, Wiley-Blackwell, Springer, and Taylor & Francis) to allow free, bulk text-data mining on documents where this is normally prohibited. These controlled-access documents are in addition to xDD’s considerable open-source holdings such as documents from the United States Geological Survey.

The trade-off of including these controlled-access agreements is that xDD and its users must obey the following three constraints.

KBC applications using xDD documents may only be deployed within UW-Madison computing resources. xDD provides access to these computing resources to users for free.
Users must use xDD for non-commercial academic research and may not monetize the output of an xDD-KBC application.
Application output must be a machine-readable, derived product.

A full list of the Terms of Service for ADEPT and xDD can be viewed here. As a general rule, however, projects that meet the above three criteria are likely permissible. Users with further questions about permissibility are encouraged to reach out to xDD administrators directly (contact@geodeepdive.org).

Hypothetical use-cases?

The following is a list of hypothetical scenarios where xDD would or would not be an appropriate partner for a research project. This list is not exhaustive, and a full list of the Terms of Service for ADEPT and xDD can be viewed here. Users with further questions about permissibility are encouraged to reach out to xDD administrators directly (contact@geodeepdive.org).

Prohibited Example 1

A user wishes to read a particular article and submits an xDD application that returns a human-readable, full-text copy of that article without performing any data analysis. (Violation of the derived-products-only rule.)

Prohibited Example 2

A user writes a knowledge base construction application to create a database of the locations of precious minerals and ores with the intention to sell this database to mining companies. (Violation of the non-commercial-only rule.)

Allowed Example 1

A user writes a knowledge base construction application to create a database of the locations of precious minerals and ores and publishes this data freely under an open-access license as part of an academic publication. A mining company, unaffiliated with the original KBC researcher, then uses that data for commercial purposes.

How to get started with xDD and ADEPT

Users interested in developing an xDD application are encouraged to visit the following resources. Users are also encouraged to reach out directly to the xDD administrators with questions (contact@geodeepdive.org).

Review the xDD Terms of Service
Instructions on how to construct an xDD Application
The tutorial video on how to use the ADEPT front-end for managing your xDD project.
Three examples of working applications created by xDD users.

aazaff commented 3 years ago

Once @iross approves this example, I will replace the README.md with the above information and then Laura will redirect the help button to the README.

iross commented 3 years ago

Looks good to me, content-wise. Some of the external links don't work (example: Elasticsearch, frailty app) and the internal links don't actually link to the sections they refer to, but that might be a side effect of being an issue instead of a README proper.

aazaff commented 3 years ago

@iross, yeah I fixed a bunch of broken links. Once I get a final link for Enrique's app, we will push this.

aazaff commented 3 years ago

@Lbookman Okay, this is ready to go. Not sure how you want to implement it.

aazaff commented 3 years ago

Moved to its own .md file in 950b725d1b35c920ffea1d6a6f56a3df0dca7f5c.

ngds / ADEPT_frontend

Draft of frequently asked questions page #19

Frequently Asked Questions

Table of Contents

General Questions

What is xDD?

What is Knowledge Base Construction?

What is ADEPT?

What is COSMOS?

Who runs these projects?

Who pays for these projects?

Is this resource right for my project?

Hypothetical use-cases?

Prohibited Example 1

Prohibited Example 2

Allowed Example 1

How to get started with xDD and ADEPT