tracek / gee_asset_manager

Google Earth Engine Batch Asset Manager
Apache License 2.0

Google Earth Engine Batch Asset Manager

Google Earth Engine Batch Asset Manager aims to help users perform batch actions on assets. It is developed on a use-case basis, so if something is missing, feel free to post a feature request in the Issues tab.

Installation

We assume the Earth Engine Python API is installed and EE is authorised as described in the Earth Engine documentation. To install:

git clone https://github.com/tracek/gee_asset_manager
cd gee_asset_manager && pip install -e .

Installation is optional; the application can also be run directly by executing the geebam.py script. The advantage of installing it is that geebam can then be executed like any other command-line tool. I recommend installing within a virtual environment.

Getting started

As usual, to print help:

geebam -h

positional arguments:
  {delete,upload,cancel,report}
    delete              Deletes collection and all items inside. Supports
                        Unix-like wildcards.
    upload              Batch Asset Uploader.
    cancel              Cancel all running tasks
    report              Produce summary of all assets.

optional arguments:
  -h, --help            show this help message and exit
  -s SERVICE_ACCOUNT, --service-account SERVICE_ACCOUNT
                        Google Earth Engine service account.
  -k PRIVATE_KEY, --private-key PRIVATE_KEY
                        Google Earth Engine private key file.

To obtain help for a specific command, call it with the help switch, e.g. geebam upload -h. If you didn't install geebam, you can still run it by going to the geebam directory and running python geebam.py [arguments go here].

Batch uploader

The script creates an Image Collection from GeoTIFFs in your local directory. By default, the collection name is the same as the local directory name; an optional parameter lets you provide a different name. Another optional parameter is the path to a CSV file with metadata for the images, which is covered in the next section, Parsing metadata.

geebam upload -h

usage: geebam upload [-h] --source SOURCE --dest DEST [-m METADATA] [--large]
                     [--nodata NODATA] [-u USER] [-s SERVICE_ACCOUNT]
                     [-k PRIVATE_KEY] [-b BUCKET]

optional arguments:
  -h, --help            show this help message and exit

Required named arguments.:
  --source SOURCE       Path to the directory with images for upload.
  --dest DEST           Destination. Full path for upload to Google Earth
                        Engine, e.g. users/pinkiepie/myponycollection
  -u USER, --user USER  Google account name (gmail address).

Optional named arguments:
  -m METADATA, --metadata METADATA
                        Path to CSV with metadata.
  --large               (Advanced) Use multipart upload. Might help if upload
                        of large files is failing on some systems. Might cause
                        other issues.
  --nodata NODATA       The value to burn into the raster as NoData (missing
                        data)
  -b BUCKET, --bucket BUCKET
                        Google Cloud Storage bucket name

Parsing metadata

By metadata we mean the properties associated with each image. Thanks to these, a GEE user can easily filter a collection based on specified criteria. The metadata file should be organised as follows:

filename (without extension) | property1 header | property2 header
file1                        | value1           | value2
file2                        | value3           | value4

Note that headers can contain only letters, digits and underscores.
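As a rough illustration of the header rule, the check could be sketched in Python as follows. This is not the tool's own code: the function names are hypothetical, and allowing a `system:` prefix is an assumption made only because the README's example uses system:time_start.

```python
import re

# Assumption: a plain property name consists of letters, digits and
# underscores only, as stated in the note above.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_]+")

# Assumption: names with a "system:" prefix (e.g. system:time_start, used in
# the README's example) are accepted as a special case.
_SYSTEM_PREFIX = "system:"


def is_valid_header(name):
    """Return True if `name` passes the sketched header rule."""
    if name.startswith(_SYSTEM_PREFIX):
        name = name[len(_SYSTEM_PREFIX):]
    return _PLAIN_NAME.fullmatch(name) is not None


def illegal_headers(headers):
    """Return the headers that fail the check, in their original order."""
    return [h for h in headers if not is_valid_header(h)]


# Two of these hypothetical headers break the rule.
print(illegal_headers(["id_no", "class", "system:time_start", "bad-name", "two words"]))
```

Running a check like this on the CSV header row before starting an upload catches problems early, before any file is transferred.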

Example:

id_no     | class      | category | binomial             | system:time_start
my_file_1 | GASTROPODA | EN       | Aaadonta constricta  | 1478943081000
my_file_2 | GASTROPODA | CR       | Aaadonta irregularis | 1478943081000

The corresponding files are my_file_1.tif and my_file_2.tif. Five properties are associated with each file: id_no, class, category, binomial and system:time_start. The last one is a time in Unix epoch format, in milliseconds, as documented in the GEE glossary. The program will match the file names from the upload directory with the ones provided in the CSV and pass the metadata in JSON format:

{ id_no: my_file_1, class: GASTROPODA, category: EN, binomial: Aaadonta constricta, system:time_start: 1478943081000}
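The system:time_start value is expressed in Unix epoch milliseconds. If your acquisition dates are stored as calendar dates, a small helper along these lines can produce the numbers for the CSV. This is only a sketch; the helper names are hypothetical and not part of geebam.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Convert Unix epoch milliseconds back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


# The value used in the README's example corresponds to this UTC time.
print(from_epoch_millis(1478943081000).isoformat())  # 2016-11-12T09:31:21+00:00
print(to_epoch_millis(datetime(2016, 11, 12, 9, 31, 21, tzinfo=timezone.utc)))  # 1478943081000
```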

The program will report any illegal fields. It will also complain if some of the images passed for upload have no metadata associated with them. The user can opt to ignore this, in which case some assets will have no properties.
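The matching between file names and CSV rows can be pictured with the following minimal sketch. It is not geebam's implementation: the function names are hypothetical, the CSV is assumed to be comma-separated with the file name in the first column, and all values are kept as strings.

```python
import csv
import io
from pathlib import Path

# Sample content mirroring the README's example (comma delimiter assumed).
SAMPLE_CSV = """id_no,class,category,binomial,system:time_start
my_file_1,GASTROPODA,EN,Aaadonta constricta,1478943081000
my_file_2,GASTROPODA,CR,Aaadonta irregularis,1478943081000
"""


def load_metadata(csv_text):
    """Map each file name (first column) to a dict of all its properties."""
    reader = csv.DictReader(io.StringIO(csv_text))
    name_column = reader.fieldnames[0]
    return {row[name_column]: dict(row) for row in reader}


def images_without_metadata(image_names, metadata):
    """Return the names (without extension) that have no row in the CSV."""
    stems = (Path(name).stem for name in image_names)
    return sorted(stem for stem in stems if stem not in metadata)


metadata = load_metadata(SAMPLE_CSV)
print(metadata["my_file_1"]["binomial"])
# my_file_3.tif is a hypothetical image with no matching CSV row.
print(images_without_metadata(["my_file_1.tif", "my_file_2.tif", "my_file_3.tif"], metadata))
```

A non-empty result from images_without_metadata corresponds to the situation in which the program complains about images lacking metadata.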

Having metadata helps in organising your assets, but is not mandatory; you can skip it.

Usage examples

Delete a collection with content:

The delete is recursive, meaning it will also delete all child assets: images, collections and folders. Use with caution!

geebam delete users/pinkiepie/test

Console output:

2016-07-17 16:14:09,212 :: oauth2client.client :: INFO :: Attempting refresh to obtain initial access_token
2016-07-17 16:14:09,213 :: oauth2client.client :: INFO :: Refreshing access_token
2016-07-17 16:14:10,842 :: root :: INFO :: Attempting to delete collection test
2016-07-17 16:14:16,898 :: root :: INFO :: Collection users/pinkiepie/test removed

Delete all directories / collections based on a Unix-like pattern:

geebam delete users/pinkiepie/*weird[0-9]?name*
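Because the delete is destructive, it can help to preview which names a pattern would select. The sketch below uses Python's standard fnmatch module on the last path component only; whether geebam applies exactly these semantics is an assumption, and the asset names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Last path component of the pattern used in the command above.
PATTERN = "*weird[0-9]?name*"


def matching_names(names, pattern=PATTERN):
    """Return the names that match a Unix-like wildcard pattern.

    '*' matches any run of characters, '?' exactly one character,
    and '[0-9]' exactly one digit.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical asset names, for illustration only.
candidates = ["my_weird1_name_2016", "weird42name", "weirdname", "weirdX_name"]
print(matching_names(candidates))  # ['my_weird1_name_2016', 'weird42name']
```

The last two names are rejected because the pattern requires a digit directly after "weird", followed by exactly one more character before "name".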

Upload a directory with images to your myfolder/myponycollection and associate properties with each image:

geebam upload -u pinkiepie@gmail.com --source path_to_directory_with_tif -m path_to_metadata.csv --dest users/pinkiepie/myfolder/myponycollection

The script will prompt the user for the Google account password. The program will also check that the properties in path_to_metadata.csv do not contain any characters that are illegal in GEE. Don't need metadata? Simply skip this option.

Upload a directory with images with a specific NoData value to a selected destination:

geebam upload -u pinkiepie@gmail.com --source path_to_directory_with_tif --dest users/pinkiepie/myfolder/myponycollection --nodata 222

In this case we need to supply the full path to the destination, which is helpful when uploading to a shared folder. In the provided example we also burn the value 222 into all rasters as the NoData (missing data) value.
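If you script several uploads, the command line can be assembled in Python. The sketch below uses only the flags listed in the upload help; the helper name build_upload_command is hypothetical, and it assumes geebam is installed and available on the PATH.

```python
import shlex


def build_upload_command(user, source, dest, metadata=None, nodata=None):
    """Build the argument list for a geebam upload call."""
    command = ["geebam", "upload", "-u", user, "--source", source, "--dest", dest]
    if metadata is not None:
        command += ["-m", metadata]
    if nodata is not None:
        command += ["--nodata", str(nodata)]
    return command


command = build_upload_command(
    user="pinkiepie@gmail.com",
    source="path_to_directory_with_tif",
    dest="users/pinkiepie/myfolder/myponycollection",
    nodata=222,
)
# Show the command as it would be typed in a shell (requires Python 3.8+).
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command). Keep in mind that the script prompts for the account password, so the call remains interactive.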