nansencenter / django-geo-spaas-harvesting

Harvest data into a GeoSPaaS catalog
GNU General Public License v3.0

Data gathering for GeoSPaaS

This application can be used to search for satellite or model data from various providers and ingest metadata into a GeoSPaaS database. It relies on Django for data access. Specifically, it uses the models defined in django-geo-spaas.

This readme explains the basic usage of this package. Documentation aimed at developers can be found here.

Interfaces

The main interface is the CLI. A Web interface may be implemented in the future.

Command line

The CLI can be accessed through the geospaas_harvesting.cli module. If no option is given, it will use the default configuration file.

Example:

python -m geospaas_harvesting.cli harvest

Base options

-c, --config

Path to a custom configuration file. See this section for more details. If not provided, the default configuration file is used.

Example:

python -m geospaas_harvesting.cli -c ./config.yml harvest

-h, --help

Prints the help message

Subcommands

harvest

The harvest subcommand runs the searches defined in the search.yml file (example here) and ingests the results into the database.

-s, --search

A path to a search configuration file. See this section for more details.

python -m geospaas_harvesting.cli -c ./config.yml harvest -s ./search.yml

list

Display a list of the available providers and their search parameters.

python -m geospaas_harvesting.cli -c ./config.yml list
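When harvesting is triggered from another Python program (a scheduler, for instance), the commands above can be assembled programmatically. The following is a minimal sketch that only uses the options documented in this readme; the build_command and run helpers are hypothetical names, not part of the package, and the assumption that the CLI exits with a non-zero status on failure has not been verified.

```python
import subprocess
import sys


def build_command(subcommand, config=None, search=None):
    """Build the argument list for the geospaas_harvesting CLI (hypothetical helper)."""
    command = [sys.executable, '-m', 'geospaas_harvesting.cli']
    if config is not None:
        # -c/--config is a base option, so it goes before the subcommand
        command += ['-c', config]
    command.append(subcommand)
    if search is not None:
        if subcommand != 'harvest':
            raise ValueError('-s/--search only applies to the harvest subcommand')
        command += ['-s', search]
    return command


def run(subcommand, **kwargs):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(subcommand, **kwargs), check=True)
```

For example, run('harvest', config='./config.yml', search='./search.yml') would execute the same command as the harvest example above, using the current Python interpreter.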

Web interface

Not implemented yet.

Warning before starting the harvesting process

Before harvesting data, the database must be initialized with Vocabulary objects. This update can be done automatically and is controlled by the update_vocabularies, update_pythesint and pythesint_versions settings in the configuration file. If you don't know what this means, it is best to keep the default values.

Configuration

Files

All configuration files are written in YAML. The !ENV tag allows environment variables to be used as values.
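To illustrate how a tag like !ENV can work, here is a minimal sketch based on PyYAML (assumed to be installed). It is not the package's actual implementation: the EnvLoader class, the construct_env function, the EXAMPLE_PASSWORD variable and the username/password keys are all hypothetical.

```python
import os

import yaml  # PyYAML, assumed to be installed


class EnvLoader(yaml.SafeLoader):
    """Loader subclass, so the custom tag is not registered on yaml.SafeLoader itself."""


def construct_env(loader, node):
    """Replace the tagged scalar with the value of the environment variable it names."""
    variable_name = loader.construct_scalar(node)
    # Raises KeyError if the variable is not set
    return os.environ[variable_name]


EnvLoader.add_constructor('!ENV', construct_env)

# Hypothetical variable and keys, for illustration only
os.environ['EXAMPLE_PASSWORD'] = 's3cret'
document = """
username: 'someone'
password: !ENV EXAMPLE_PASSWORD
"""
config = yaml.load(document, Loader=EnvLoader)
print(config['password'])
```

Resolving the variable at load time keeps secrets such as passwords out of the configuration file itself.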

config.yml

The configuration of the harvesters is defined in this file. An example can be seen in the default configuration file.

Top-level keys:

Providers configuration

The properties which are common to every harvester are:

The rest depends on the harvester and will be detailed in each provider's documentation.

Search configuration

This file is used to set the search parameters for each provider you wish to use. By default, the CLI looks for a file called search.yml in the folder from which the search/harvest command is run.

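It contains two sections: common, which holds the parameters common to all searches, and searches, a list in which each entry names a provider through provider_name and sets the parameters for that search.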

The list subcommand can be used to find out which search parameters each provider supports. The search parameters can have the following types:

Some providers define specific parameter types as needed.

Common parameters

These search parameters can be used for every provider:

Example
---
common: # these are common to all searches
  start_time: '2022-07-13'
  end_time: '2022-07-14'
  location: 'POLYGON ((-43.2346 59.8972, -37.1701 62.2756, -31.8527 64.3661, -25.8762 65.8635, -20.7126 68.37690000000001, -19.9435 69.3939, -22.756 70.0712, -26.6232 68.8853, -32.2922 68.25920000000001, -36.6867 66.7291, -41.1252 65.0235, -42.6633 62.8226, -43.2346 59.8972))'
searches:
- provider_name: 'creodias'
  collection: 'Sentinel1'
  processingLevel: 'LEVEL1'
  productType: 'GRD'

- provider_name: 'earthdata_cmr'
  short_name: 'VIIRSJ1_L2_OC_NRT'
  start_time: '2018-12-01T00:00:00Z'
  end_time: '2018-12-04T12:00:00Z'
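
One way to read the example above is that each search inherits the common parameters and can replace them with its own values. The sketch below illustrates that reading with plain dictionaries; the precedence rule (a value set on a search wins over the common one) is an assumption, not something this readme states, and effective_searches is a hypothetical helper rather than a function of the package. The location parameter is left out for brevity.

```python
def effective_searches(search_config):
    """Combine the 'common' parameters with each entry of 'searches'."""
    common = search_config.get('common') or {}
    merged = []
    for search in search_config.get('searches') or []:
        # Assumption: values set on a search override the common ones
        merged.append({**common, **search})
    return merged


search_config = {
    'common': {'start_time': '2022-07-13', 'end_time': '2022-07-14'},
    'searches': [
        {'provider_name': 'creodias', 'collection': 'Sentinel1'},
        {'provider_name': 'earthdata_cmr',
         'short_name': 'VIIRSJ1_L2_OC_NRT',
         'start_time': '2018-12-01T00:00:00Z',
         'end_time': '2018-12-04T12:00:00Z'},
    ],
}

for search in effective_searches(search_config):
    print(search['provider_name'], search['start_time'], search['end_time'])
# creodias 2022-07-13 2022-07-14
# earthdata_cmr 2018-12-01T00:00:00Z 2018-12-04T12:00:00Z
```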

Environment variables

Generic configuration can be defined using environment variables:

Other environment variables can be defined in the configuration files by using the !ENV tag.