Open riverma opened 3 months ago
@philipjyoon - did I capture the logic correctly? The above would apply for FWD or HIST regardless, assuming enough time has passed.
@riverma There are a few more dimensions to this:
Rather than burst_to_frame.json and frame_to_burst.json, we should use opera-disp-s1-consistent-burst-ids-with-datetimes.json, which contains the real burst pattern information. This is the file OPERA PCM uses; OPERA PCM does not use the former two files.
We discussed one more dimension, which we hadn't decided was worth the complexity: verifying the K- and M- files used as input files in producing the DISP-S1 products. There are two ways to look at those K- and M- input files:
(to be continued... I'll write out what I think should be the overall logic tomorrow morning)
Sample logic:
Using opera-disp-s1-consistent-burst-ids-with-datetimes.json, determine whether the number of bursts found matches the number required. If so, a corresponding DISP-S1 product should have been created.
OPERA_L3_DISP-S1_IW_F03050_VV_20240709T000000Z_20240814T000000Z_v0.3_20240815T133432Z
This is documented here: https://github.com/nasa/opera-sds-pcm/blob/develop/conf/pge_outputs.yaml#L152
The two important fields are the Frame ID and the "sec_time", which I believe is the Sensing Time or the Acquisition Time.
We can then concoct a native id pattern to find the corresponding DISP-S1 product from CMR. It will be something like
OPERA_L3_DISP-S1_IW_F03050_VV_20240709T000000Z*
Use that pattern to query CMR to find that product.
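The burst-count check in the sample logic above could be sketched roughly as follows. This is only an illustration: the dictionary layout (frame ID mapped to a list of required burst IDs), the function names, and the burst ID strings are all assumptions, not the actual structure of opera-disp-s1-consistent-burst-ids-with-datetimes.json or any OPERA PCM API.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical, simplified view of the consistent burst database:
# frame ID -> list of burst IDs required for that frame.
# The real JSON file may be organized differently.
ConsistentBurstDb = Dict[int, List[str]]


def missing_bursts(frame_id: int,
                   found_burst_ids: Iterable[str],
                   db: ConsistentBurstDb) -> Set[str]:
    """Return the required burst IDs that were not found for this frame."""
    required = set(db[frame_id])
    return required - set(found_burst_ids)


def disp_s1_expected(frame_id: int,
                     found_burst_ids: Iterable[str],
                     db: ConsistentBurstDb) -> bool:
    """True when every required burst was found, i.e. a DISP-S1
    product should have been created for this frame and date."""
    return not missing_bursts(frame_id, found_burst_ids, db)


# Made-up burst IDs, for illustration only.
example_db = {3050: ["T001-000001-IW1", "T001-000001-IW2", "T001-000001-IW3"]}
found = ["T001-000001-IW1", "T001-000001-IW2"]
print(disp_s1_expected(3050, found, example_db))  # one burst is missing
print(missing_bursts(3050, found, example_db))
```

Returning the set of missing bursts, rather than only comparing counts, makes it easier to report why a product was not expected.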
Comparison Options:
Some key resources needed:
@philipjyoon - thank you so much for writing out these excellent and clear points! Extremely helpful.
I have a few follow-up questions:
We can then concoct a native id pattern to find the corresponding DISP-S1 product from CMR. It will be something like
OPERA_L3_DISP-S1_IW_F03050_VV_20240709T000000Z*
Use that pattern to query CMR to find that product. A tricky part: the acquisition time used here only has day precision; the time of day has been stripped away. Each CSLC burst is acquired within tens of seconds of the others, so some may cross the day boundary. Therefore, if we don't find a DISP-S1 product using the exact day, we should also search +/- one day. This is rare but possible.
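A minimal sketch of building those native id patterns, including the +/- one day fallback, might look like this. The function name and parameters are hypothetical. The pattern layout is copied from the example above, with the date placed in the field that immediately follows the polarization. Issuing the actual CMR query is left out.

```python
from datetime import date, timedelta
from typing import List


def disp_s1_native_id_patterns(frame_id: int,
                               acq_date: date,
                               polarization: str = "VV") -> List[str]:
    """Build wildcard native-id patterns for one frame.

    The exact day comes first, then the day before and the day after,
    to cover bursts acquired across a day boundary.
    """
    patterns = []
    for offset in (0, -1, 1):
        day = acq_date + timedelta(days=offset)
        patterns.append(
            f"OPERA_L3_DISP-S1_IW_F{frame_id:05d}_{polarization}_"
            f"{day:%Y%m%d}T000000Z*"
        )
    return patterns


for pattern in disp_s1_native_id_patterns(3050, date(2024, 7, 9)):
    print(pattern)
```

Trying the exact day first keeps the common case to a single query; the neighbouring days only need to be searched when that query returns nothing.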
Hmm, can't we just use the same strategy we did for DSWx-S1? Namely:
The above logic only checks whether the right DISP-S1 product has been produced; it does not check whether it was produced using all the correct CSLC input files. To perform the latter, we need to obtain the full metadata. This is not available from CMR to my knowledge. We can obtain it in two ways:
The logic I mentioned in the above quote would tell us exactly which CSLCs we should have used. Am I missing something? How would we not know this?
This function can be used to determine those two: https://github.com/nasa/opera-sds-pcm/blob/develop/data_subscriber/cslc_utils.py#L312
Do you have a recommendation on how to import your code? I'm assuming we don't have published packages. Currently the auditing tools are within /report
@riverma I did not realize that CMR query also returns InputGranules
If that's the case, yes, what you've outlined would work.
You can use the code here as a general guideline for using cslc_utils.py:
https://github.com/nasa/opera-sds-pcm/blob/develop/tests/data_subscriber/test_cslc_util.py
You can import it with from data_subscriber import cslc_utils on a deployed system, which would already have the data_subscriber package installed. If you wish to install this package independently of deploying a cluster, we'd have to do a little bit of research (I think it's possible).
Next steps based on discussions:
Use opera-disp-s1-consistent-burst-ids-with-datetimes.json rather than opera-s1-disp-0.5.0-frame-to-burst.json, since the former is a subset of the latter (just the CSLC products we need to match).
def validate_disp_s1(smallest_date, greatest_date, endpoint, df): needs to be updated to reflect testing once ASF DAAC has successfully ingested DISP-S1 products to UAT. That way we can test the tool in real-world conditions.
This tool must also take into account blackout dates.
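One way the blackout-date requirement could be handled is to drop acquisitions that fall inside a blackout window before deciding which DISP-S1 products are expected. The sketch below assumes blackout periods are available as (start, end) datetime pairs; that representation and the function names are hypothetical, and the format OPERA PCM actually uses may differ.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical representation: inclusive (start, end) blackout windows.
Window = Tuple[datetime, datetime]


def in_blackout(acq_time: datetime, windows: Iterable[Window]) -> bool:
    """True if the acquisition time falls inside any blackout window."""
    return any(start <= acq_time <= end for start, end in windows)


def drop_blackout_acquisitions(acq_times: Iterable[datetime],
                               windows: List[Window]) -> List[datetime]:
    """Keep only acquisitions for which a product is still expected."""
    return [t for t in acq_times if not in_blackout(t, windows)]


# Made-up dates, for illustration only.
windows = [(datetime(2024, 1, 1), datetime(2024, 1, 31))]
acquisitions = [datetime(2024, 1, 15), datetime(2024, 2, 12)]
print(drop_blackout_acquisitions(acquisitions, windows))
```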
We've been told by ASF that the input CSLC granule list CANNOT be stored in CMR because it would break. So we will need to implement this logic without that information - as @philipjyoon described in a comment on Aug 15, 2024
Checked for duplicates
Yes - I've already checked
Alternatives considered
Yes - and alternatives don't suffice
Related problems
The DSWx-S1 validator tool currently only supports DSWx-S1. We'd like to ensure DISP-S1 is also supported.
Describe the feature request
Sample logic:
Use the burst_to_frame.json file to locate the frame IDs that correspond to this CSLC product.
Use the frame_to_burst.json file to identify all the CSLC burst IDs expected for this frame.
Some key resources needed:
frame_to_burst.json or burst_to_frame.json
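The two lookups in the sample logic could be sketched as below. The flat dictionary shapes (burst ID to frame IDs, frame ID to burst IDs), the function name, and the burst IDs are assumptions made for illustration; the real JSON files may nest this information differently.

```python
from typing import Dict, List, Set


def expected_bursts_for_cslc(burst_id: str,
                             burst_to_frame: Dict[str, List[int]],
                             frame_to_burst: Dict[int, List[str]]
                             ) -> Dict[int, Set[str]]:
    """For one CSLC burst, return each frame it belongs to together
    with the full set of burst IDs expected for that frame."""
    expected = {}
    for frame_id in burst_to_frame.get(burst_id, []):
        expected[frame_id] = set(frame_to_burst.get(frame_id, []))
    return expected


# Made-up mappings, for illustration only.
burst_to_frame = {"t001_000001_iw1": [1], "t001_000002_iw1": [1, 2]}
frame_to_burst = {
    1: ["t001_000001_iw1", "t001_000002_iw1"],
    2: ["t001_000002_iw1", "t001_000003_iw1"],
}
print(expected_bursts_for_cslc("t001_000002_iw1", burst_to_frame, frame_to_burst))
```

An unknown burst ID yields an empty result rather than an error, so the caller can decide whether to skip it or report it.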