Closed aazaff closed 3 years ago
Once @iross approves this example, will replace the README.md with the above information and then Laura will redirect the help button to the README.
Looks good to me, content wise. Some of the external links don't work (example: Elasticsearch, frailty app) and the internal links don't actually link to the sections they refer to, but that might be a side effect of being an issue instead of a README proper.
@iross, yeah I fixed a bunch of broken links. Once I get a final link for Enrique's app, we will push this.
@Lbookman Okay, this is ready to go. Not sure how you want to implement it.
Moved to its own .md file in 950b725d1b35c920ffea1d6a6f56a3df0dca7f5c.
Frequently Asked Questions
Table of Contents
General Questions
What is xDD?
The xDD library of scientific articles is the largest set of documents available for legal automated knowledge base construction in the world, hosting >14 million documents from across ~13 thousand journals (growing by ~8,000 new documents per day). For comparison, the PubMed Central bulk data mining library contains 2.75 million minable full-text articles. While smaller datasets are adequate for proof-of-concept research meant to test the efficacy of different machine learning algorithms and techniques, attempts to build a scientifically-actionable dataset for applied research require a larger dataset with more comprehensive coverage of the scientific literature.
What is Knowledge Base Construction?
Knowledge base construction (KBC) is the process of populating a knowledge base (i.e., a database) with facts extracted from data (e.g., text, audio, video, tables, diagrams, ...). For example, a scientist wishing to study the distribution of stromatolites in the fossil record would need to compile a database of where stromatolite fossils have been reported around the world. The traditional way to build such a database would be for the scientist to read through various scientific publications to find descriptions of stromatolite fossils and manually enter relevant findings into a database.
Automated knowledge base construction is simply when a software application is used to achieve the same result programmatically. xDD is specifically designed to provide machine-readable copies of scientific publications for such automated knowledge base construction.
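As a rough illustration of the stromatolite example above, the following Python sketch populates a tiny in-memory knowledge base from text. It is only a toy under stated assumptions: the sentences are hypothetical stand-ins for machine-readable article text, the regular expression and the `extract_occurrences` helper are invented for this illustration, and nothing here uses or represents the actual xDD or ADEPT interfaces. Real KBC applications typically rely on far more robust language processing.

```python
import re

# Hypothetical sentences standing in for machine-readable article text.
SENTENCES = [
    "Stromatolites are abundant in the Hamelin Pool of Western Australia.",
    "The section is dominated by cross-bedded sandstone.",
    "Well-preserved stromatolite fossils occur in the Gunflint Formation.",
]

# Illustrative pattern (an assumption, not an xDD feature): a stromatolite
# mention followed later in the sentence by "in the <Capitalised Name>".
PATTERN = re.compile(
    r"(?i:stromatolites?)\b.*?\bin the ((?:[A-Z][a-z]+ )*[A-Z][a-z]+)"
)


def extract_occurrences(sentences):
    """Return one record per sentence that reports a stromatolite location."""
    records = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            records.append(
                {
                    "taxon": "stromatolite",
                    "location": match.group(1),
                    "evidence": sentence,  # keep the source text for auditing
                }
            )
    return records


knowledge_base = extract_occurrences(SENTENCES)
for record in knowledge_base:
    print(record["location"])
```

Each record keeps the sentence it was derived from, so a researcher can audit the extracted fact against its source, which is the same check a scientist would make when compiling the database by hand.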
What is ADEPT?
ADEPT (the Automated Data Extraction PlaTform) is a front-end web interface for interacting with data in the xDD library. It allows users to 1) browse available documents in the xDD library using full-text search terms (powered by Elasticsearch) and other common search parameters (e.g., publication date, journal name); 2) save and track sets of documents associated with particular research projects; and 3) submit and deploy KBC applications against the xDD library.
What is COSMOS?
COSMOS is an AI-powered technical assistant that extracts and assimilates data to algorithmically identify and extract tables, figures, and equations. COSMOS is independent of both the xDD and ADEPT systems and its inclusion in a KBC application is completely optional. Most xDD applications do not use it at all. However, users that are specifically interested in analysing tables, figures, or equations as part of their knowledge base construction applications may find it useful to integrate COSMOS into their workflow. For more information about COSMOS please visit https://cosmos.wisc.edu.
Who runs these projects?
The xDD-ADEPT-COSMOS system is currently maintained by a joint partnership between the Department of Geoscience and Department of Computer Sciences at the University of Wisconsin-Madison. Requests to collaborate or other questions should be directed to the project leader at these organisations, Shanan E. Peters (contact@geodeepdive.org).
Other organisations have also contributed to parts of the xDD-ADEPT-COSMOS system over its development history, including the Stanford AI Lab, the Geoinformatics Research Lab at the Arizona Geological Survey, the Wisconsin Institute for Discovery, and the Center for High-throughput Computing.
Who pays for these projects?
Project funding is currently or has been provided by the National Science Foundation, the U.S. Department of Energy, and the Defense Advanced Research Projects Agency.
Importantly, users do not and will never have to pay to use xDD or the ADEPT system. Access to high-throughput compute resources as part of the ADEPT system is also provided to users for free.
Is this resource right for my project?
xDD is much larger than comparable data stacks because it has negotiated unique contractual agreements with the four largest publishers of scientific literature (Reed-Elsevier, Wiley-Blackwell, Springer, and Taylor & Francis) to allow free, bulk text-data mining on documents where this is normally prohibited. These controlled-access documents are in addition to xDD’s considerable open-source holdings such as documents from the United States Geological Survey.
The trade-off of including these controlled-access agreements is that xDD and its users must obey the following three constraints.
A full list of the Terms of Service for ADEPT and xDD can be viewed here. As a general rule, however, projects that meet the above three criteria are likely permissible. Users with further questions about permissibility are encouraged to reach out to xDD administrators directly (contact@geodeepdive.org).
Hypothetical use-cases?
The following is a list of hypothetical scenarios where xDD would or would not be an appropriate partner for a research project. This list is not exhaustive, and a full list of the Terms of Service for ADEPT and xDD can be viewed here. Users with further questions about permissibility are encouraged to reach out to xDD administrators directly (contact@geodeepdive.org).
Prohibited Example 1
A user wishes to read a particular article and submits an xDD application that returns a human-readable, full-text copy of that article without performing any data analysis. (Violation of the derived-products-only rule.)
Prohibited Example 2
A user writes a knowledge base construction application to create a database of the locations of precious minerals and ores with the intention to sell this database to mining companies. (Violation of the non-commercial-only rule.)
Allowed Example 1
A user writes a knowledge base construction application to create a database of the locations of precious minerals and ores and publishes this data freely under an open-access license as part of an academic publication. A mining company, unaffiliated with the original KBC researcher, then uses that data for commercial purposes.
How to get started with xDD and ADEPT
Users interested in developing an xDD application are encouraged to visit the following resources. Users are also encouraged to reach out directly to the xDD administrators with questions (contact@geodeepdive.org).