The purpose of this refactoring is to make it easier for people to use this repository and contribute to it.
This is (hopefully) done through changes to the organization of the code and to the interface.
Code organization: the purpose here was to gather the code specific to a provider in a single place.
It should make it easier to add support for new data providers and to maintain the existing ones.
Here are the changes made for that purpose:
metadata normalization has been moved to the crawlers. The sole purpose of the ingester is now to write to the database.
harvesters have been removed and Provider objects have been created. Each provider class offers an interface to a specific data provider and makes it possible to run searches in this provider's data. It's the provider's job to instantiate the right crawler.
providers are located in separate module files. If there is a need for a crawler which is specific to one provider, it is located in the same file. That way, all the code which is specific to one data provider is in the same place.
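To illustrate the division of responsibilities described above, here is a minimal sketch. All names (`DatasetInfo`, `Crawler`, `ExampleCrawler`, `Provider`, `ExampleProvider`, `Ingester`, the URL and the `keyword` parameter) are hypothetical and chosen for illustration; they are not necessarily the classes or signatures used in this PR, and the in-memory list stands in for the database.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator


@dataclass
class DatasetInfo:
    """Normalized metadata for one dataset (hypothetical structure)."""
    url: str
    metadata: dict


class Crawler(ABC):
    """Explores a data source and yields already-normalized metadata."""

    @abstractmethod
    def crawl(self) -> Iterator[DatasetInfo]:
        ...


class ExampleCrawler(Crawler):
    """Crawler specific to ExampleProvider, kept in the same module."""

    def __init__(self, base_url: str, keyword: str):
        self.base_url = base_url
        self.keyword = keyword

    def crawl(self) -> Iterator[DatasetInfo]:
        # A real crawler would query the remote service; this one fakes two hits.
        for i in range(2):
            raw_name = f"{self.keyword}_{i}"
            # Normalization happens here, in the crawler, not in the ingester.
            yield DatasetInfo(
                url=f"{self.base_url}/{raw_name}",
                metadata={"entry_title": raw_name.upper()},
            )


class Provider(ABC):
    """Interface to one data provider."""

    @abstractmethod
    def search(self, **params) -> Crawler:
        ...


class ExampleProvider(Provider):
    base_url = "https://example.invalid/data"  # placeholder URL

    def search(self, keyword: str = "") -> Crawler:
        # The provider decides which crawler fits this search.
        return ExampleCrawler(self.base_url, keyword)


class Ingester:
    """Only writes to storage; a list stands in for the database."""

    def __init__(self):
        self.rows = []

    def ingest(self, crawler: Crawler) -> None:
        for info in crawler.crawl():
            self.rows.append((info.url, info.metadata))
```

With this layout, running a search and storing its results would look like `Ingester().ingest(ExampleProvider().search(keyword="sst"))`, and adding a new data provider only means adding one module containing a `Provider` subclass and, if needed, its crawler.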
Interface:
dedicated CLI module
one configuration file for general configuration (including providers), that users most likely won't have to touch
one configuration file for searches.
the CLI has a sub-command which lists the available providers and their possible configuration and search parameters.
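As a rough sketch of what such an interface could look like, the following uses `argparse` sub-commands. The program name `harvest`, the sub-command names `list` and `run`, the option names, the default file names and the `PROVIDERS` registry are all assumptions made for this example and may differ from the actual CLI in this PR.

```python
import argparse

# Hypothetical registry: provider name -> accepted parameters.
PROVIDERS = {
    "example": {
        "config": ["username", "password"],
        "search": ["start_time", "end_time", "location"],
    },
}


def format_providers(providers):
    """Build the text shown by the listing sub-command."""
    lines = []
    for name, params in sorted(providers.items()):
        lines.append(name)
        lines.append("  configuration: " + ", ".join(params["config"]))
        lines.append("  search: " + ", ".join(params["search"]))
    return "\n".join(lines)


def build_parser():
    parser = argparse.ArgumentParser(prog="harvest")
    subparsers = parser.add_subparsers(dest="command", required=True)

    subparsers.add_parser(
        "list", help="list available providers and their parameters")

    run = subparsers.add_parser("run", help="run the configured searches")
    # One file for general configuration, one for searches.
    run.add_argument("--config", default="config.yml")
    run.add_argument("--search", default="search.yml")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "list":
        print(format_providers(PROVIDERS))
    return args
```

Calling `main(["list"])` prints each provider with its configuration and search parameters, while `main(["run"])` would pick up the two default configuration files.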
Various improvements:
have a more object-oriented approach (e.g. pass around objects instead of dictionaries with an implicit structure)
centralized argument management, which makes it easy to generate documentation and help messages.
added Search objects to navigate crawlers' results. These will be useful for a future web interface.
Depends on #126