terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Reprocess stereoTop data for 2016-2017 #357

Closed: craig-willis closed this issue 6 years ago.

craig-willis commented 7 years ago

Re-run the full set of stereoTop pipeline extractors on all data through September 20th, 2017. This includes the bin2tif, fieldmosaic, and canopycover extractors.
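A minimal sketch of how the datasets in scope could be selected. It assumes that stereoTop dataset names follow a `stereoTop - YYYY-MM-DD__HH-MM-SS-mmm` pattern and that the three extractors run in the order bin2tif, fieldmosaic, canopycover, with each one consuming the previous one's output. Neither assumption is confirmed in this issue, and the helper names are hypothetical.

```python
import re
from datetime import date

# Assumed processing order: each extractor consumes the previous one's output.
EXTRACTOR_ORDER = ["bin2tif", "fieldmosaic", "canopycover"]

# Last day of data included in this reprocessing run.
CUTOFF = date(2017, 9, 20)

# Hypothetical naming pattern, e.g. "stereoTop - 2017-05-01__12-34-56-789".
NAME_PATTERN = re.compile(r"^stereoTop - (\d{4})-(\d{2})-(\d{2})__")


def capture_date(dataset_name):
    """Return the capture date encoded in a stereoTop dataset name, or None."""
    match = NAME_PATTERN.match(dataset_name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def select_for_reprocessing(dataset_names, cutoff=CUTOFF):
    """Keep stereoTop datasets captured on or before the cutoff.

    With the fixed-width naming pattern, sorting by name is also
    chronological, so the result comes back oldest first.
    """
    selected = []
    for name in dataset_names:
        captured = capture_date(name)
        if captured is not None and captured <= cutoff:
            selected.append(name)
    return sorted(selected)


if __name__ == "__main__":
    names = [
        "stereoTop - 2017-09-20__23-59-59-000",
        "stereoTop - 2016-09-01__10-00-00-000",
        "stereoTop - 2017-09-21__00-00-01-000",
        "flirIrCamera - 2017-05-01__12-00-00-000",
    ]
    for name in select_for_reprocessing(names):
        print(name, "->", " -> ".join(EXTRACTOR_ORDER))
```

Filtering on the name avoids a metadata lookup per dataset, at the cost of depending on the naming convention holding for the whole 2016-2017 range.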

Re-processing steps:

max-zilla commented 7 years ago

@robkooper FYI: to simplify this pipeline, I am introducing a small new extractor, terraref/ext-metadata-cleaner (https://github.com/terraref/extractors-metadata/tree/master/cleaner).

This extractor binds to nothing; that is, it does not trigger automatically on any file or dataset. It can only be triggered manually.
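Because the cleaner has no bindings, each run has to be requested explicitly. The following is a rough sketch of building such a request. It assumes Clowder's `POST /api/datasets/{id}/extractions` route, which takes a JSON body naming the extractor plus optional parameters, with the API key passed as a `key` query parameter. Check the route against the Clowder version in use. The extractor name `terra.metadata.cleaner` and the `delete`/`callback` parameter keys are hypothetical placeholders.

```python
import json
import urllib.request


def build_manual_extraction_request(clowder_url, dataset_id, extractor_name,
                                    api_key, parameters=None):
    """Build (but do not send) a POST asking Clowder to run one extractor
    on one dataset. Route and body shape are assumptions to verify."""
    body = {"extractor": extractor_name}
    if parameters:
        body["parameters"] = parameters
    url = f"{clowder_url.rstrip('/')}/api/datasets/{dataset_id}/extractions?key={api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical extractor name and parameters, for illustration only.
    req = build_manual_extraction_request(
        "https://clowder.example.org/",
        "abc123",
        "terra.metadata.cleaner",
        "SECRET",
        parameters={"delete": True, "callback": "bin2tif"},
    )
    # urllib.request.urlopen(req) would actually submit it.
    print(req.get_method(), req.full_url)
```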

When triggered on a dataset, it will optionally delete all metadata currently attached to the dataset, then use terrautils and the dataset name to determine where on ROGER the original raw metadata.json file is located. Once the file is found, the extractor runs it through our latest terrautils cleaning code and, if a callback extractor is given, triggers it.
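The cleaner's steps can be sketched as follows. This is an illustration only, not the extractor's actual code. The path layout in `locate_raw_metadata` and `RAW_ROOT` are hypothetical stand-ins for the terrautils lookup. The step that attaches the cleaned result back to the dataset is an assumption, and `InMemoryClient` and `toy_clean` are toy substitutes for Clowder, the ROGER filesystem, and the real terrautils cleaning code.

```python
import os

# Hypothetical root of the raw data tree on ROGER; the real location is
# resolved by terrautils and is not reproduced here.
RAW_ROOT = "/hypothetical/raw_data"


def locate_raw_metadata(dataset_name, raw_root=RAW_ROOT):
    """Hypothetical stand-in for the terrautils lookup: map a name such as
    'stereoTop - 2017-05-01__12-34-56-789' to a metadata.json path."""
    sensor, timestamp = dataset_name.split(" - ", 1)
    day = timestamp.split("__", 1)[0]
    return os.path.join(raw_root, sensor, day, timestamp, "metadata.json")


def run_cleaner(dataset_name, client, clean, delete_existing=False, callback=None):
    """Optionally clear metadata, find and clean the raw metadata.json,
    attach the result (assumed step), then optionally trigger a callback."""
    if delete_existing:
        client.delete_metadata(dataset_name)
    raw = client.read_json(locate_raw_metadata(dataset_name))
    cleaned = clean(raw)
    client.add_metadata(dataset_name, cleaned)
    if callback:
        client.submit_extraction(dataset_name, callback)
    return cleaned


class InMemoryClient:
    """Toy stand-in for Clowder and the ROGER filesystem."""

    def __init__(self, files):
        self.files = files      # path -> parsed JSON document
        self.metadata = {}      # dataset name -> list of metadata documents
        self.submitted = []     # (dataset name, extractor) pairs

    def delete_metadata(self, dataset_name):
        self.metadata[dataset_name] = []

    def read_json(self, path):
        return self.files[path]

    def add_metadata(self, dataset_name, doc):
        self.metadata.setdefault(dataset_name, []).append(doc)

    def submit_extraction(self, dataset_name, extractor):
        self.submitted.append((dataset_name, extractor))


def toy_clean(raw):
    """Hypothetical cleaning step: drop empty values."""
    return {k: v for k, v in raw.items() if v not in (None, "")}


if __name__ == "__main__":
    name = "stereoTop - 2017-05-01__12-34-56-789"
    client = InMemoryClient({locate_raw_metadata(name): {"sensor": "stereoTop", "note": ""}})
    client.metadata[name] = [{"stale": True}]
    run_cleaner(name, client, toy_clean, delete_existing=True, callback="bin2tif")
    print(client.metadata[name], client.submitted)
```

Passing the client and cleaning function in as arguments keeps the control flow testable without a live Clowder instance or access to ROGER.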

With this setup, the process becomes:

I am running final tests on the new code we added this week, but will start this process today. This will require a lot of care, as there is more at stake than with the EnvironmentLogger processing.

max-zilla commented 7 years ago

Basically here's the sketch I came up with last night:

[Screenshot, 2017-10-04 12:36:34 pm: sketch of the reprocessing process]

max-zilla commented 6 years ago

Continued in #393