Closed sjlewis-jpl closed 5 months ago
This issue isn't unique to DISP-S1 processing: this code is called for all CMR queries, which apply to all data types.
@sjlewis-jpl Please let me know which of the following two behaviors is desired, or whether both suffice equally:
Both are equally trivial to implement.
FYI, it seems CloudWatch has a per-line log limit of about 250k characters, and each granule output string is about 100 characters. So we can fit a maximum of roughly 2,500 granules per line if option 2 above is preferred (the exact number would be determined dynamically in code).
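For illustration, the per-line capacity could be derived roughly as follows. This is only a sketch under stated assumptions, not the code in the repository: the function name, the `reserve` headroom for the line prefix, and the separator are all hypothetical, and the 250k limit is the approximate figure quoted above.

```python
def max_granules_per_line(granule_strs, line_limit=250_000, separator=", ", reserve=200):
    """Estimate how many granule strings fit on one log line.

    Assumptions (not from the actual code base): line_limit is the
    approximate 250k figure quoted in this thread, and reserve leaves
    room for a prefix such as an index summary.
    """
    if not granule_strs:
        return 0
    # Size the budget on the longest entry so no chunk can overflow.
    longest = max(len(s) for s in granule_strs) + len(separator)
    return max(1, (line_limit - reserve) // longest)
```

With entries of about 100 characters (separator included) and no reserve, this gives 2,500 per line; any headroom for a prefix lowers the number slightly.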
I definitely prefer option 2.
@sjlewis-jpl How do these look? The results are now grouped up to a maximum number of granules per line; for the CSLC query this is dynamically determined to be 2450 per line. In the second pic, you can see that the end is not being clipped because the ] marks the end.
This was tested by submitting a CSLC query job using Tosca with the parameters:
--start-date=2023-12-15T08:17:50Z --chunk-size=2 --k=2 --m=1 --job-queue=opera-job_worker-cslc_data_download --processing-mode=forward --end-date=2024-01-15T08:35:59Z
I like it! You kept the "QUERY RESULTS" for grep-ability, and I really appreciate seeing "X to Y out of Z" in there too.
One question on the log entries: are the last and first entries in successive lines repeated? The index summary at the beginning suggests they might be, and in the unfolded example in the second pic, it looks like that last granule might be the first granule in the following line (query result #2450). I think having each input granule listed out once would be most useful, with the indices matching. So, 0 to 2449, then 2450 to 4899, etc.
Good catch. Fortunately, the underlying output is correct. The error is just in the "xx out of xx" message, because I naively used the for-loop's start and stop parameters (start is inclusive, whereas stop is exclusive). I'll fix this ASAP.
The text output has been fixed. It should have been printing 1-based indices instead of 0-based ones. So it will now read: 1-2450, 2451-4900, etc.
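The inclusive/exclusive mix-up and its fix can be sketched like this. It is a minimal illustration, assuming a hypothetical `format_query_results` helper and an assumed message layout; only the "QUERY RESULTS" marker and the "X to Y out of Z" wording come from this thread.

```python
def format_query_results(granules, per_line):
    """Split granules into log lines labelled with 1-based inclusive ranges."""
    if per_line < 1:
        raise ValueError("per_line must be at least 1")
    total = len(granules)
    lines = []
    for start in range(0, total, per_line):
        chunk = granules[start:start + per_line]  # slice end is exclusive
        first = start + 1           # 0-based inclusive start -> 1-based
        last = start + len(chunk)   # exclusive stop equals the 1-based last index
        lines.append(f"QUERY RESULTS {first} to {last} out of {total}: {chunk}")
    return lines
```

Each granule appears exactly once, and the last line reports its true final index even when the chunk is short.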
Checked for duplicates
No - I haven't checked
Describe the bug
When I went to Cloud Watch to find the lists of CSLC-S1 granule IDs resulting from a query, I noticed that they had been carelessly split between lines. This will make it very difficult to extract those results into any useful format. See the attached screenshot for an example. Given the sheer number of granules (62,462 in this example), it seems like a character limit is being reached in Cloud Watch.
What did you expect?
I expected to be able to easily get the CSLC-S1 granule IDs from the log file, for use in verifying the triggering logic. If a line is too long for Cloud Watch, I would expect it to be broken up into multiple lines intelligently.
Reproducible steps
Environment