icesat2py / icepyx

Python tools for obtaining and working with ICESat-2 data
https://icepyx.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

NSIDC request url generation currently includes duplicate keywords #488

Closed JessicaS11 closed 4 months ago

JessicaS11 commented 6 months ago

While digging into #457, we discovered that order request URLs were being submitted with both CMR and order parameters (e.g. Order request URL: https://n5eil02u.ecs.nsidc.org/egi/request?short_name=ATL11&version=006&bounding_box=-87.0,74.7,-85.0,75.4&page_size=2000&page_num=1&request_mode=async&include_meta=Y&client_string=icepyx&time=2018-09-01T00:00:00,2022-01-01T23:59:59&bbox=-87.0,74.7,-85.0,75.4). This makes it impossible to submit a request without a temporal parameter (the short-term solution to #457). We therefore need to find out why both forms of a keyword (e.g., "bounding_box" and "bbox") are being submitted, and fix it.
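To make the duplication concrete, here is a minimal sketch that flags keyword pairs present in both forms in a request URL. The `CMR_TO_EGI` mapping and the `find_duplicate_keywords` helper are hypothetical names for illustration (not part of icepyx), and the pairing itself is only inferred from the URLs quoted in this issue.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed pairing of CMR search keywords with their EGI counterparts,
# inferred from the URLs in this issue (not taken from icepyx itself).
CMR_TO_EGI = {"bounding_box": "bbox", "temporal": "time"}


def find_duplicate_keywords(url):
    """Return the (CMR, EGI) keyword pairs that both appear in the URL's query string."""
    params = parse_qs(urlsplit(url).query)
    return [(cmr, egi) for cmr, egi in CMR_TO_EGI.items()
            if cmr in params and egi in params]


url = (
    "https://n5eil02u.ecs.nsidc.org/egi/request?short_name=ATL11&version=006"
    "&bounding_box=-87.0,74.7,-85.0,75.4&page_size=2000&page_num=1"
    "&request_mode=async&include_meta=Y&client_string=icepyx"
    "&time=2018-09-01T00:00:00,2022-01-01T23:59:59&bbox=-87.0,74.7,-85.0,75.4"
)
print(find_duplicate_keywords(url))  # [('bounding_box', 'bbox')]
```

For the URL above only the spatial pair is reported, because that request carries `time` but no `temporal` keyword.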

JessicaS11 commented 4 months ago

In trying to address this issue, which I had assumed was a bug, I discovered that submitting both CMR and EGI keywords (for the spatial and temporal constraints) seems to give the subsetter a shorter list of granules to process (i.e., CMR is used to narrow the granule search before the subsetter actually tries to subset them for the order). The two cases below differ mainly in the submitted request URL and in the returned "Instructions":

Case both CMR and EGI keywords are submitted

https://n5eil02u.ecs.nsidc.org/egi/request?short_name=ATL06&version=006&temporal=2019-02-20T00:00:00Z,2019-02-28T23:59:59Z&bounding_box=-55.0,68.0,-48.0,71.0&page_size=2000&page_num=1&request_mode=async&include_meta=Y&client_string=icepyx&time=2019-02-20T00:00:00,2019-02-28T23:59:59&bbox=-55.0,68.0,-48.0,71.0
Order request response XML content:  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<eesi:agentResponse xmlns="" xmlns:iesi="http://eosdis.nasa.gov/esi/rsp/i" xmlns:ssw="http://newsroom.gsfc.nasa.gov/esi/rsp/ssw" xmlns:eesi="http://eosdis.nasa.gov/esi/rsp/e" xmlns:esi="http://eosdis.nasa.gov/esi/rsp" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <order>
        <orderId>5000005341041</orderId>
        <Instructions>You may receive an email about your order if you specified an EMAIL address. &lt;br/&gt;&lt;br/&gt;The instructions used to process this order are:  Include metadata and processing history=Y. Bounding Box(es)=-55.0,68.0,-48.0,71.0. Granule id(s)=SC:ATL06.006:265586169,SC:ATL06.006:265586244,SC:ATL06.006:265585170,SC:ATL06.006:265611966. Temporal search start=2019-02-20T00:00:00 end=2019-02-28T23:59:59. Processing tool=ICESAT2.&lt;br/&gt;&lt;br/&gt;Note from Client: icepyx</Instructions>
    </order>
    <contactInformation>
        <contactName>NSIDC User Services</contactName>
        <contactEmail>nsidc@nsidc.org</contactEmail>
    </contactInformation>
    <processInfo>
        <processDuration>P0Y0M0DT0H0M0.020S</processDuration>
        <subagentId>ICESAT2</subagentId>
    </processInfo>
    <requestStatus>
        <status>processing</status>
        <numberProcessed>0</numberProcessed>
        <totalNumber>4</totalNumber>
    </requestStatus>
</eesi:agentResponse>

Case only EGI keywords are submitted (granule IDs and "no data found" messages trimmed for brevity)

 https://n5eil02u.ecs.nsidc.org/egi/request?short_name=ATL06&version=006&page_size=2000&page_num=1&request_mode=async&include_meta=Y&client_string=icepyx&time=2019-02-20T00:00:00,2019-02-28T23:59:59&bbox=-55.0,68.0,-48.0,71.0
Order request response XML content:  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<eesi:agentResponse xmlns="" xmlns:iesi="http://eosdis.nasa.gov/esi/rsp/i" xmlns:ssw="http://newsroom.gsfc.nasa.gov/esi/rsp/ssw" xmlns:eesi="http://eosdis.nasa.gov/esi/rsp/e" xmlns:esi="http://eosdis.nasa.gov/esi/rsp" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <order>
        <orderId>5000005341022</orderId>
        <Instructions>You may receive an email about your order if you specified an EMAIL address. &lt;br/&gt;&lt;br/&gt;The instructions used to process this order are:  Include metadata and processing history=Y. Bounding Box(es)=-55.0,68.0,-48.0,71.0. Granule id(s)=(IDs removed for brevity). Temporal search start=2019-02-20T00:00:00 end=2019-02-28T23:59:59. Processing tool=ICESAT2.&lt;br/&gt;&lt;br/&gt;Note from Client: icepyx</Instructions>
    </order>
    <contactInformation>
        <contactName>NSIDC User Services</contactName>
        <contactEmail>nsidc@nsidc.org</contactEmail>
    </contactInformation>
    <processInfo>
        <processDuration>P0Y0M0DT0H0M3.716S</processDuration>
        <subagentId>ICESAT2</subagentId>
        <info>Granule 265582908 contained no data within the spatial and/or temporal subset constraints to be processed</info>
*(additional lines removed for presentation)*
        <info>Granule 265583660 contained no data within the spatial and/or temporal subset constraints to be processed</info>
    </processInfo>
    <requestStatus>
        <status>processing</status>
        <numberProcessed>12</numberProcessed>
        <totalNumber>1382</totalNumber>
    </requestStatus>
</eesi:agentResponse>
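The difference between the two responses can also be read programmatically. The following is a rough sketch using only the standard library; `summarize_order` is a hypothetical helper, and `SAMPLE` is a trimmed copy of the EGI-only response above (XML declaration and unused namespace declarations omitted). Because the response declares `xmlns=""`, the unprefixed child elements carry no namespace and can be looked up by their plain tag names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the EGI-only response shown above.
SAMPLE = """<eesi:agentResponse xmlns="" xmlns:eesi="http://eosdis.nasa.gov/esi/rsp/e">
    <order>
        <orderId>5000005341022</orderId>
    </order>
    <processInfo>
        <subagentId>ICESAT2</subagentId>
        <info>Granule 265582908 contained no data within the spatial and/or temporal subset constraints to be processed</info>
    </processInfo>
    <requestStatus>
        <status>processing</status>
        <numberProcessed>12</numberProcessed>
        <totalNumber>1382</totalNumber>
    </requestStatus>
</eesi:agentResponse>"""


def summarize_order(xml_text):
    """Extract the order ID, progress counters, and subsetter messages from a response."""
    root = ET.fromstring(xml_text)
    return {
        "order_id": root.findtext("order/orderId"),
        "status": root.findtext("requestStatus/status"),
        "processed": int(root.findtext("requestStatus/numberProcessed")),
        "total": int(root.findtext("requestStatus/totalNumber")),
        # One <info> element per granule the subsetter found no data in.
        "messages": [el.text for el in root.findall("processInfo/info")],
    }


summary = summarize_order(SAMPLE)
print(summary["order_id"], summary["processed"], "of", summary["total"])
```

Comparing `total` between the two cases (4 versus 1382) is the quickest way to see how much work the CMR keywords save the subsetter.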

@asteiker @betolink @andypbarrett @wallinb @mikala-nsidc Can someone confirm that this is expected/intended behavior and still the recommended best approach (and I'm just forgetting that's why we set it up this way)? It seems like it would have important implications both for granules whose metadata doesn't match the actual data and for order processing speed, since dropping the CMR keywords takes an order from 4 to 1382 granules to process...

asteiker commented 4 months ago

@JessicaS11 The reasoning behind ensuring that both the CMR + EGI keywords are applied is exactly because of that extraneous processing time. Without any CMR parameters, it's going to send every single granule (within the page size) to the subsetter and you'll get all of those "no data found" errors. You are right, however, that this assumes the CMR metadata is accurate, though I believe there have been improvements to the SIPS-generated granule metadata in the latest versioning event to mitigate this problem. I think this should still be the recommended approach to avoid all of those extraneous granules getting sent to the subsetter.
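As an illustration of the recommended approach, the sketch below assembles a request URL that carries both keyword sets, so that CMR can narrow the granule list before the subsetter runs. It is not icepyx's actual implementation: `build_order_url` is a hypothetical helper, and the parameter names, fixed values, and timestamp formats (a trailing `Z` for `temporal`, none for `time`) are simply copied from the first example URL in this issue.

```python
from urllib.parse import urlencode

BASE_URL = "https://n5eil02u.ecs.nsidc.org/egi/request"


def build_order_url(short_name, version, bbox, start, end):
    """Build an order URL containing both CMR search and EGI subsetting keywords."""
    bbox_str = ",".join(str(v) for v in bbox)
    params = {
        "short_name": short_name,
        "version": version,
        # CMR keywords: narrow the granule search before subsetting.
        "temporal": f"{start}Z,{end}Z",
        "bounding_box": bbox_str,
        "page_size": 2000,
        "page_num": 1,
        "request_mode": "async",
        "include_meta": "Y",
        "client_string": "icepyx",
        # EGI keywords: tell the subsetter what to extract from each granule.
        "time": f"{start},{end}",
        "bbox": bbox_str,
    }
    # Keep commas and colons unescaped so the URL matches the form shown in this issue.
    return BASE_URL + "?" + urlencode(params, safe=",:")


order_url = build_order_url(
    "ATL06", "006", (-55.0, 68.0, -48.0, 71.0),
    "2019-02-20T00:00:00", "2019-02-28T23:59:59",
)
print(order_url)
```

With these inputs the result reproduces the "both CMR and EGI keywords" request URL quoted earlier in the thread.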

JessicaS11 commented 4 months ago

Thanks @asteiker! That makes sense - I won't remove the duplicate keywords then.

mikala-nsidc commented 4 months ago

We do have improved metadata solutions (for the granule representations in CMR) in place for v006 of ATL03, ATL06, and ATL08, which are the current recommended versions for those data products. And they are actively working on an improvement for ATL13 granule representations, which will be very welcome. M

(Sent by email in reply to Amy Steiker's comment of Tuesday, February 27, 2024 at 11:43 AM, quoted above.)