openaire / iis

Information Inference Service of the OpenAIRE system
Apache License 2.0

About

Information Inference Service (IIS) is a flexible data processing system for handling big data, based on Apache Hadoop technologies. It is a subsystem of the OpenAIRE system (www.openaire.eu is its public web front-end); see Fig.1 for a high-level overview.

Fig.1: The center of the OpenAIRE system is the Information Space system, which stores all information available in the system. IIS ingests data from Information Space, runs processing workflows, and produces inferred data which, in turn, is ingested by Information Space.

The goal of OpenAIRE is to provide an infrastructure for gathering, processing (including de-duplication), and providing unified access to research-related data (papers, datasets, researchers, projects, etc.). The goal of IIS is to provide data/text mining functionality for the OpenAIRE system. In practice, IIS defines data processing workflows that connect various modules, each with well-defined input and output. A high-level overview of IIS can be found in the paper "Information Inference in Scholarly Communication Infrastructures: The OpenAIREplus Project Experience", Procedia Computer Science, vol. 38, 2014, 92-99.
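The idea of a workflow that connects modules with well-defined input and output can be pictured with a minimal Python sketch. This is only an illustration of the concept, not the IIS API: the `Module` class, the `run_workflow` function, and the two toy modules are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Module:
    """A processing step with a declared input and output type (hypothetical)."""
    name: str
    input_type: type
    output_type: type
    run: Callable[[Any], Any]


def run_workflow(modules: List[Module], data: Any) -> Any:
    """Run modules in order, checking each declared input/output contract."""
    for module in modules:
        if not isinstance(data, module.input_type):
            raise TypeError(f"{module.name} expects {module.input_type.__name__}")
        data = module.run(data)
        if not isinstance(data, module.output_type):
            raise TypeError(f"{module.name} must produce {module.output_type.__name__}")
    return data


# Two toy modules: split text into tokens, then count the tokens.
tokenize = Module("tokenize", str, list, lambda text: text.split())
count = Module("count", list, int, lambda tokens: len(tokens))

result = run_workflow([tokenize, count], "open access research data")
print(result)
```

Because every module declares what it consumes and produces, a mismatch between two connected modules is reported at the point where they meet rather than somewhere downstream.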

IIS was initially developed during the OpenAIREplus project and has been further extended during the OpenAIRE2020 project.

The original code was migrated to GitHub from the D-NET SVN repository. The public read-only interface of that repository is available at https://svn-public.driver.research-infrastructures.eu/driver/dnet40/modules/ and this is where you can find the history of the code base before the migration (IIS-related Maven projects are the ones matching the glob pattern *-iis-*).
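As a small illustration of how the glob pattern *-iis-* selects project names, here is a sketch using Python's standard `fnmatch` module. The directory names in the list are made up for the example and are not taken from the SVN repository.

```python
from fnmatch import fnmatchcase

# Hypothetical module directory names, for illustration only.
modules = [
    "example-iis-core",
    "example-iis-workflows",
    "example-index-client",
    "iis-without-prefix",
]

# Keep only the names that match the glob pattern quoted in the text.
# The pattern requires "-iis-" with at least the surrounding hyphens present.
iis_modules = [name for name in modules if fnmatchcase(name, "*-iis-*")]
print(iis_modules)
```

Note that a name such as `iis-without-prefix` does not match, because the pattern requires a hyphen before `iis` as well as after it.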

Content of the most important subdirectories and files

License

The code is licensed under Apache License, version 2.0. We also use 3rd party code from other projects compatible with this license. This 3rd party code can be found in directories with names starting with iis-3rdparty-; each directory corresponds to a different source project.