The idea is to track how long the pipeline takes to process a single file, from the start of processing until it is ingested into Earth Engine.
We do this by adding two timestamps to the asset (image) attributes:
job_start_time: Set when the pipeline starts processing the source file in the ConvertToCog transform. (The timestamp is slightly late because the file passes through the filter transform first, but I don't think that's a big deal.)
ingestion_time: Set when the pipeline is about to ingest the asset into Earth Engine, in the IngestIntoEETransform transform.
The benefit of these attributes is that we can measure how long the pipeline took to process each file. This is an important stat, especially for a real-time pipeline.
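As a rough illustration of the idea, the sketch below stamps the two timestamps onto a plain attribute dictionary and takes their difference. It is not the pipeline's code: the stamp helper, the 'source' attribute, and the fixed timestamp values are hypothetical, and it assumes both attributes hold epoch seconds. The property names are the ones the fetch snippet reads (job_start_time and ingestion_time).

```python
import time
from typing import Any, Dict, Optional


def stamp(attrs: Dict[str, Any], key: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Return a copy of attrs with key set to an epoch timestamp in seconds.

    Hypothetical helper; `now` can be passed explicitly to make runs repeatable.
    """
    stamped = dict(attrs)
    stamped[key] = time.time() if now is None else now
    return stamped


# Hypothetical flow for one file; the real transforms do far more than this.
attrs = {'source': 'example.grib2'}  # made-up attribute
# Roughly where ConvertToCog would set its timestamp.
attrs = stamp(attrs, 'job_start_time', now=1_700_000_000.0)
# Roughly where IngestIntoEETransform would set its timestamp.
attrs = stamp(attrs, 'ingestion_time', now=1_700_000_042.5)

elapsed = attrs['ingestion_time'] - attrs['job_start_time']
print(elapsed)  # 42.5
```

Carrying both timestamps on the asset itself means the processing time can be recovered later from Earth Engine alone, without pipeline logs.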
For anyone who uses these metrics in the future:
How do you fetch these attributes after the assets have been ingested into Earth Engine?
import ee

# IMAGE_COLLECTION is the ID of the collection the assets were ingested into.
ic = ee.ImageCollection(IMAGE_COLLECTION)
job_start_array = ic.aggregate_array('job_start_time').getInfo()
ingestion_time_array = ic.aggregate_array('ingestion_time').getInfo()
# system:time_start and system:time_end are in milliseconds; convert to seconds.
init_time_array = [t / 1000. for t in ic.aggregate_array('system:time_start').getInfo()]
valid_time_array = [t / 1000. for t in ic.aggregate_array('system:time_end').getInfo()]
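Once the arrays are fetched, the per-file processing time is the element-wise difference between ingestion time and job start time. The sketch below is a minimal example of that step. The processing_times helper is hypothetical, the sample numbers are made up and stand in for the getInfo() results, and it assumes both arrays use the same time unit and are in the same asset order. If some assets are missing one of the attributes, the two arrays may not line up, so the helper checks their lengths first.

```python
from statistics import mean
from typing import List, Sequence


def processing_times(job_start: Sequence[float], ingestion: Sequence[float]) -> List[float]:
    """Per-asset elapsed time (ingestion minus job start), in the inputs' unit."""
    if len(job_start) != len(ingestion):
        raise ValueError('arrays must have the same length')
    return [end - start for start, end in zip(job_start, ingestion)]


# Made-up sample values standing in for the getInfo() results.
job_start_array = [100.0, 200.0, 300.0]
ingestion_time_array = [130.0, 260.0, 345.0]

durations = processing_times(job_start_array, ingestion_time_array)
print(durations)                        # [30.0, 60.0, 45.0]
print(mean(durations), max(durations))  # 45.0 60.0
```

The mean and the maximum are useful starting points: the mean shows typical latency, while the maximum highlights files that took unusually long.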