This application can be used to search for satellite or model data from various providers and ingest metadata into a GeoSPaaS database. It relies on Django for data access. Specifically, it uses the models defined in django-geo-spaas.
This readme explains the basic usage of this package. Documentation aimed at developers can be found here.
The main interface is the CLI. A Web interface may be implemented in the future.
The CLI can be accessed through the geospaas_harvesting.cli module. If no option is given, it uses the default configuration file.
Example:
python -m geospaas_harvesting.cli harvest
-c: path to a custom configuration file. See the config.yml section for more details. If not provided, the default configuration file is used.
Example:
python -m geospaas_harvesting.cli -c ./config.yml harvest
-h: prints the help message.
The harvest subcommand runs the searches defined in the search.yml file (example here) and ingests the results into the database.
-s: path to a search configuration file. See the search.yml section for more details.
Example:
python -m geospaas_harvesting.cli -c ./config.yml harvest -s ./search.yml
The list subcommand displays the available providers and their search parameters.
Example:
python -m geospaas_harvesting.cli -c ./config.yml list
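The invocations above can also be scripted. The following is a minimal sketch that is not part of geospaas_harvesting: build_cli_command and run_cli are hypothetical helper names, and the sketch only uses the options shown in the examples (-c before the subcommand, -s after harvest).

```python
import subprocess
import sys
from typing import List, Optional

MODULE = "geospaas_harvesting.cli"
# Only the two subcommands documented in this README are accepted.
SUBCOMMANDS = ("harvest", "list")


def build_cli_command(subcommand: str,
                      config: Optional[str] = None,
                      search: Optional[str] = None) -> List[str]:
    """Build the argument list for `python -m geospaas_harvesting.cli`."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    command = [sys.executable, "-m", MODULE]
    if config is not None:
        # As in the examples, -c is placed before the subcommand.
        command += ["-c", config]
    command.append(subcommand)
    if search is not None:
        if subcommand != "harvest":
            raise ValueError("-s is only documented for the harvest subcommand")
        # As in the examples, -s is placed after "harvest".
        command += ["-s", search]
    return command


def run_cli(subcommand: str,
            config: Optional[str] = None,
            search: Optional[str] = None) -> int:
    """Run the CLI in a subprocess and return its exit code."""
    completed = subprocess.run(build_cli_command(subcommand, config, search), check=False)
    return completed.returncode
```

For example, run_cli("harvest", config="./config.yml", search="./search.yml") is equivalent to the harvest example using -s; it requires geospaas_harvesting and a configured database to be available.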
Web interface: not implemented yet.
Before harvesting data, the database must be initialized with Vocabulary objects.
This can be done automatically and is controlled by the update_vocabularies, update_pythesint and pythesint_versions settings in the configuration file.
If you are unsure what these settings do, it is best to keep the default values.
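The interaction between update_vocabularies and update_pythesint described in the config.yml section can be summarized by the sketch below. plan_vocabulary_update and the step names are hypothetical; the sketch treats missing keys as False, which may not match the package's actual defaults, and it ignores pythesint_versions.

```python
from typing import Any, List, Mapping


def plan_vocabulary_update(config: Mapping[str, Any]) -> List[str]:
    """Return the vocabulary-related steps implied by the configuration.

    Hypothetical helper: the step names are illustrative and do not
    correspond to functions of geospaas_harvesting.
    """
    steps: List[str] = []
    if config.get("update_vocabularies", False):
        # update_pythesint only matters when the database is being updated.
        if config.get("update_pythesint", False):
            steps.append("refresh local pythesint data")
        steps.append("update Vocabulary objects in the database")
    return steps
```

With update_vocabularies set to False, the result is empty regardless of update_pythesint.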
All configuration files are written in YAML. The !ENV tag makes it possible to use environment variables as values.
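The substitution performed by the !ENV tag can be illustrated with the following stdlib-only sketch. It is not the package's implementation: it assumes a tagged value is written as !ENV VARIABLE_NAME and works on that raw text after loading, whereas a real YAML loader would register a constructor for the tag. resolve_env_values is a hypothetical name.

```python
import os
from typing import Any

# Assumed textual form of a tagged value: "!ENV VARIABLE_NAME".
ENV_TAG = "!ENV "


def resolve_env_values(value: Any) -> Any:
    """Recursively replace '!ENV NAME' strings with the value of $NAME."""
    if isinstance(value, dict):
        return {key: resolve_env_values(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_values(item) for item in value]
    if isinstance(value, str) and value.startswith(ENV_TAG):
        name = value[len(ENV_TAG):].strip()
        if name not in os.environ:
            # Design choice of this sketch: fail loudly instead of returning None.
            raise KeyError(f"environment variable {name!r} is not set")
        return os.environ[name]
    # Non-tagged values (numbers, plain strings, ...) are returned unchanged.
    return value
```

Failing on an unset variable surfaces configuration mistakes early, at the cost of requiring every referenced variable to be defined.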
config.yml
The configuration of the harvesters is defined in this file. An example can be seen in the default configuration file.
Top-level keys:
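update_vocabularies: whether to update the Vocabulary objects in the database from the pythesint data. If update_pythesint is also set to True, the local pythesint data is refreshed before the database is updated.
update_pythesint: setting this to True has no effect if update_vocabularies is set to False.
The properties which are common to every harvester are: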
The rest depends on the harvester and will be detailed in each provider's documentation.
search.yml
This file is used to set the search parameters for each provider you wish to use.
By default, the CLI looks for a file called search.yml in the folder from which the search/harvest command is run.
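It contains two sections:
common: search parameters shared by all searches.
searches: a list of searches. Each search must have the provider_name key defined.
The list subcommand can be used to find out which search parameters each provider supports.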
The search parameters can have the following types:
Some providers define specific parameter types as needed.
These search parameters can be used for every provider:
Example:
---
common: # these are common to all searches
  start_time: '2022-07-13'
  end_time: '2022-07-14'
  location: 'POLYGON ((-43.2346 59.8972, -37.1701 62.2756, -31.8527 64.3661, -25.8762 65.8635, -20.7126 68.37690000000001, -19.9435 69.3939, -22.756 70.0712, -26.6232 68.8853, -32.2922 68.25920000000001, -36.6867 66.7291, -41.1252 65.0235, -42.6633 62.8226, -43.2346 59.8972))'
searches:
  - provider_name: 'creodias'
    collection: 'Sentinel1'
    processingLevel: 'LEVEL1'
    productType: 'GRD'
  - provider_name: 'earthdata_cmr'
    short_name: 'VIIRSJ1_L2_OC_NRT'
    start_time: '2018-12-01T00:00:00Z'
    end_time: '2018-12-04T12:00:00Z'
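In the search.yml example, the earthdata_cmr search redefines start_time and end_time, which suggests that per-search values take precedence over the common section. The sketch below shows that merge under this assumption; expand_searches is a hypothetical name and the package may combine the sections differently.

```python
from typing import Any, Dict, List, Mapping


def expand_searches(search_config: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Combine the 'common' section with every entry of 'searches'."""
    common = dict(search_config.get("common") or {})
    expanded = []
    for search in search_config.get("searches", []):
        if "provider_name" not in search:
            raise ValueError(f"search without provider_name: {search!r}")
        # Assumption: values defined in a search override the common ones.
        expanded.append({**common, **search})
    return expanded
```

Applied to the example, the creodias search inherits the 2022 time range and the location, while the earthdata_cmr search keeps its own 2018 dates and inherits only the location.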
Generic configuration can be defined using environment variables:
GEOSPAAS_HARVESTING_LOG_CONF_PATH: path to the logging configuration file
GEOSPAAS_FAILED_INGESTIONS_DIR: path to the directory where information about datasets for which errors occurred is stored
SECRET_KEY: Django secret key
GEOSPAAS_DB_HOST: database hostname
GEOSPAAS_DB_PORT: database port
GEOSPAAS_DB_NAME: database name
GEOSPAAS_DB_USER: database username
GEOSPAAS_DB_PASSWORD: database password
Other environment variables can be referenced in the configuration files by using the !ENV tag.
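Since the package relies on Django, the GEOSPAAS_DB_* variables presumably end up in a Django DATABASES setting. The sketch below shows one way to build such an entry; database_settings is a hypothetical helper, and the PostGIS ENGINE value is an assumption rather than something taken from this package.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARIABLES = (
    "GEOSPAAS_DB_HOST",
    "GEOSPAAS_DB_PORT",
    "GEOSPAAS_DB_NAME",
    "GEOSPAAS_DB_USER",
    "GEOSPAAS_DB_PASSWORD",
)


def database_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Dict[str, str]]:
    """Build a Django DATABASES dictionary from the GEOSPAAS_DB_* variables."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if name not in env]
    if missing:
        # Report every missing variable at once instead of one at a time.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "default": {
            # Assumption: a PostGIS backend; check the project's actual settings.
            "ENGINE": "django.contrib.gis.db.backends.postgis",
            "HOST": env["GEOSPAAS_DB_HOST"],
            "PORT": env["GEOSPAAS_DB_PORT"],
            "NAME": env["GEOSPAAS_DB_NAME"],
            "USER": env["GEOSPAAS_DB_USER"],
            "PASSWORD": env["GEOSPAAS_DB_PASSWORD"],
        }
    }
```

Passing an explicit mapping instead of reading os.environ directly keeps the helper easy to test.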