nasa / opera-sds-pcm

Observational Products for End-Users from Remote Sensing Analysis (OPERA)
Apache License 2.0

[New Feature]: DISP-S1 Historical Processing operator-friendly features #997

Open philipjyoon opened 3 days ago

philipjyoon commented 3 days ago

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

DISP-S1 historical processing is orders of magnitude more complex than the previous CSLC historical processing. Processing the entire series is like running 1400 independent historical processing series. Currently the internal state of each frame is stored as the number of sensing datetimes processed, but that is hard to understand and does not show the overall progress of the entire historical processing batch_proc.

Here are the recommended features:

  1. Store a single percentage number as the overall progress of that historical processing batch_proc. This is basically the ratio of the number of historical processing SCIFLO jobs that have succeeded so far to the total number of possible jobs.
  2. We may also want to track percentage progress at the individual frame level. We would create another dictionary for this.
  3. We may also want to track the last date processed on a per-frame basis. We currently track just the number of sensing datetimes because this is what the historical processing app needs internally, but that information is hard for the user to make sense of.

sjlewis-jpl commented 3 days ago

For UI purposes, percentages are useful. For internal logic (and maybe you're already there), I think tracking the ratio of number processed to total number (as two integers) for each Frame would be more useful. That allows easy computation of a percentage, and makes it easy to add or average Frames together, with proper weighting.

In any case, making an easy way to track the progress per frame is a great idea.
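
A minimal sketch of that two-integer bookkeeping, for illustration only. The class name `FrameProgress`, its fields, and the totals used below are hypothetical and not taken from pcm_batch.py; the thread does not state the real per-frame totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameProgress:
    """Progress of one frame, kept as two integers (hypothetical structure)."""
    processed: int  # units completed so far (e.g. sensing datetimes)
    total: int      # units that could be processed in the date range

    def __add__(self, other: "FrameProgress") -> "FrameProgress":
        # Adding numerators and denominators weights each frame by its size.
        return FrameProgress(self.processed + other.processed,
                             self.total + other.total)

    @property
    def percentage(self) -> int:
        # Integer percentage, rounded down; 0 when there is nothing to process.
        return 0 if self.total == 0 else (100 * self.processed) // self.total


# Made-up totals, chosen only to illustrate the arithmetic.
frames = {
    "831": FrameProgress(105, 205),
    "832": FrameProgress(105, 223),
}

overall = sum(frames.values(), FrameProgress(0, 0))
per_frame = {frame_id: p.percentage for frame_id, p in frames.items()}
```

Because the sum keeps both integers, a large frame counts for more than a small one; averaging the per-frame percentages directly would lose that weighting.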

philipjyoon commented 1 day ago

@sjlewis-jpl This is what it currently looks like. Let me know what you think.

This is what you will see when you run python tools/pcm_batch.py view.

frame_states is not new; it's what historical processing uses to track progress internally. It shows the number of sensing datetimes that have been submitted so far on a per-frame basis.

frame_states                    {'8882': 105, '831': 105, '832': 105, '833': 105}
frame_completion_percentages    ['8882: 48%', '831: 51%', '832: 47%', '833: 47%']
last_processed_datetimes        {'8882': '2020-07-08T00:27:08', '831': '2020-09-27T23:06:17', '832': '2020-04-12T23:06:30', '833': '2020-04-12T23:06:52'}
progress_percentage             48%

To give some context, these were the historical processing batch_proc parameters:

data_start_date                 2016-07-01T00:00:00
data_end_date                   2024-07-01T00:00:00
k                               15
frames                          [[831, 833], 8882]
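
The output above lists frames 831, 832, 833 and 8882, which suggests that a two-element list in the frames parameter is read as an inclusive range. A minimal sketch of that expansion under that assumption; `expand_frames` is a hypothetical helper, not a function from pcm_batch.py.

```python
def expand_frames(spec):
    """Flatten a frames spec such as [[831, 833], 8882] into a list of frame IDs.

    Assumption: a two-element list is an inclusive [start, end] range and a
    bare integer is a single frame.
    """
    frame_ids = []
    for item in spec:
        if isinstance(item, (list, tuple)):
            start, end = item
            frame_ids.extend(range(start, end + 1))  # end is inclusive
        else:
            frame_ids.append(item)
    return frame_ids


expanded = expand_frames([[831, 833], 8882])
print(expanded)  # [831, 832, 833, 8882]
```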

And this is what the progress looks like at the end of the run for these particular frames and this date range. The end-of-frame-series behavior that will process the k-remainders has not been implemented yet:

frame_states                    {'8882': 210, '831': 195, '832': 210, '833': 210}
frame_completion_percentages    ['8882: 95%', '831: 95%', '832: 95%', '833: 94%']
last_processed_datetimes        {'8882': '2024-01-25T00:27:25', '831': '2024-02-03T23:06:30', '832': '2024-01-22T23:06:52', '833': '2024-01-10T23:07:15'}
progress_percentage             95%
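
With k = 15, the frame_states values above are multiples of k (210 = 14 × 15, 195 = 13 × 15). A minimal sketch of how a k-remainder could arise, assuming a frame's datetimes are submitted only in full groups of k. The helper `split_by_k` and the total of 221 sensing datetimes are hypothetical; the thread does not give the real totals or confirm how the percentage is computed.

```python
def split_by_k(total_datetimes: int, k: int):
    """Return (covered, remainder) when only full groups of k are submitted.

    Hypothetical helper: `covered` is the largest multiple of k that fits in
    `total_datetimes`; `remainder` is what the end-of-series step would handle.
    """
    full_groups, remainder = divmod(total_datetimes, k)
    return full_groups * k, remainder


# Illustrative total only; not a value reported in the thread.
total = 221
covered, remainder = split_by_k(total, k=15)
percent_without_remainder = (100 * covered) // total
```

Under these assumed numbers, 210 datetimes are covered and 11 are left over, so progress would stall below 100% until the remainder is processed.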