podaac / data-subscriber

Subscribe and bulk download collections of data at PO.DAAC
Apache License 2.0

provide support for granule wildcard patterns in data downloader #138

Closed jjmcnelis closed 1 year ago

jjmcnelis commented 1 year ago

Shailen expressed a need for this capability for SWOTCalVal. Indirectly, this seems like the most straightforward way to support selective downloading without a dedicated UMM field to query on.

I added two lines to the subscriber/podaac_data_downloader.py script so that CMR wildcard functionality is supported through the existing options. The change adds the wildcard pattern option to the request parameters whenever the user gives a GranuleUR value containing '*' or '?':

        #jmcnelis, 2023/06/14 - provide for wildcards in GranuleUR-based search
        # Test each wildcard character separately; `'*' or '?' in x` is always truthy.
        if '*' in cmr_granule or '?' in cmr_granule:
            params.append(('options[GranuleUR][pattern]', 'true'))
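
A minimal sketch of how that check might sit in a parameter-building helper. The function name `build_granule_params` and the `ShortName` / `GranuleUR` key names are assumptions for illustration, not taken from the script; only the `options[GranuleUR][pattern]` entry comes from the snippet above. Each wildcard character gets its own membership test, because in Python `'*' or '?' in value` evaluates the non-empty string `'*'` first and is therefore always truthy.

```python
from urllib.parse import urlencode

WILDCARD_CHARS = ("*", "?")


def build_granule_params(short_name, granule_ur):
    """Return CMR-style query parameters as a list of (key, value) tuples.

    Hypothetical helper: the 'ShortName' and 'GranuleUR' keys are assumed
    here for illustration.
    """
    params = [("ShortName", short_name), ("GranuleUR", granule_ur)]
    # Only request pattern matching when the value actually contains a wildcard.
    if any(ch in granule_ur for ch in WILDCARD_CHARS):
        params.append(("options[GranuleUR][pattern]", "true"))
    return params


def to_query_string(params):
    """Percent-encode the parameters the way an HTTP client would."""
    return urlencode(params)
```

With this shape, `build_granule_params("SWOTCalVal_GNSS_L2_1.0", "SWOTCalVal_??_GNSS_L2_*")` includes the pattern option, while an exact granule name leaves it out.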

This supports Shailen's use case, where he wants to selectively download granules by campaign (SWOTCalVal). Here are the invocations for two example cases:

$ python subscriber/podaac_data_downloader.py -c SWOTCalVal_GNSS_L2_1.0 -gr 'SWOTCalVal_??_GNSS_L2_*' -d ./data/
[2023-06-14 14:12:12,727] {podaac_data_downloader.py:270} INFO - Found 2 total files to download
[2023-06-14 14:12:19,628] {podaac_data_downloader.py:313} INFO - 2023-06-14 14:12:19.628547 SUCCESS: https://archive.swot.podaac.earthdata.nasa.gov/podaac-swot-ops-cumulus-protected/SWOTCalVal_GNSS_L2_1.0/SWOTCalVal_T2_GNSS_L2_Rec11_20230201T221500_20230201T232230_20230227T220903.nc
[2023-06-14 14:12:21,952] {podaac_data_downloader.py:313} INFO - 2023-06-14 14:12:21.952641 SUCCESS: https://archive.swot.podaac.earthdata.nasa.gov/podaac-swot-ops-cumulus-protected/SWOTCalVal_GNSS_L2_1.0/SWOTCalVal_WM_GNSS_L2_Rec2_20220729T222100_20220730T023300_20230227T211845.nc
[2023-06-14 14:12:21,952] {podaac_data_downloader.py:324} INFO - Downloaded Files: 2
[2023-06-14 14:12:21,952] {podaac_data_downloader.py:325} INFO - Failed Files:     0
[2023-06-14 14:12:21,952] {podaac_data_downloader.py:326} INFO - Skipped Files:    0
[2023-06-14 14:12:22,329] {podaac_data_downloader.py:334} INFO - END
$ python subscriber/podaac_data_downloader.py -c SWOTCalVal_GNSS_L2_1.0 -gr 'SWOTCalVal_WM_GNSS_L2_*' -d ./data/
[2023-06-14 14:12:29,910] {podaac_data_downloader.py:270} INFO - Found 1 total files to download
[2023-06-14 14:12:35,532] {podaac_data_downloader.py:313} INFO - 2023-06-14 14:12:35.532384 SUCCESS: https://archive.swot.podaac.earthdata.nasa.gov/podaac-swot-ops-cumulus-protected/SWOTCalVal_GNSS_L2_1.0/SWOTCalVal_WM_GNSS_L2_Rec2_20220729T222100_20220730T023300_20230227T211845.nc
[2023-06-14 14:12:35,532] {podaac_data_downloader.py:324} INFO - Downloaded Files: 1
[2023-06-14 14:12:35,532] {podaac_data_downloader.py:325} INFO - Failed Files:     0
[2023-06-14 14:12:35,532] {podaac_data_downloader.py:326} INFO - Skipped Files:    0
[2023-06-14 14:12:35,845] {podaac_data_downloader.py:334} INFO - END

This needs further testing by someone besides me.
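
For anyone picking up the testing, a local approximation of the wildcard semantics can help predict which granules a pattern should return before running the downloader. The sketch below assumes the CMR convention that `*` matches zero or more characters and `?` matches exactly one; the helper names are hypothetical, the granule names are the file names from the logs above with the `.nc` extension dropped (an assumption about how the GranuleUR is formed), and CMR-side details such as case handling are not modeled.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a '*' / '?' wildcard pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more of any character
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


def matches(pattern, granule_ur):
    """True if the whole granule name matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(granule_ur) is not None


# Granule names assumed from the download URLs in the logs above.
GRANULES = [
    "SWOTCalVal_T2_GNSS_L2_Rec11_20230201T221500_20230201T232230_20230227T220903",
    "SWOTCalVal_WM_GNSS_L2_Rec2_20220729T222100_20220730T023300_20230227T211845",
]


def select(pattern, granules=GRANULES):
    """Return the granule names that the pattern would be expected to match."""
    return [g for g in granules if matches(pattern, g)]
```

Under these assumptions, `select('SWOTCalVal_??_GNSS_L2_*')` returns both names and `select('SWOTCalVal_WM_GNSS_L2_*')` returns only the `WM` one, which lines up with the file counts in the two runs above.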

jjmcnelis commented 1 year ago

Here's some evidence that these updates still have the expected outcome after fixes caught by @skorper:

(base) jmcnelis@MT-209219:main  [ ~/subscriber-feature-enhancements/data-subscriber ] 
 02:12:35 $ python subscriber/podaac_data_downloader.py -c SWOTCalVal_GNSS_L2_1.0 -gr 'SWOTCalVal_??_GNSS_L2_*' -d ./data/
[2023-06-14 14:50:16,780] {podaac_data_downloader.py:270} INFO - Found 2 total files to download
[2023-06-14 14:50:23,706] {podaac_data_downloader.py:313} INFO - 2023-06-14 14:50:23.706562 SUCCESS: https://archive.swot.podaac.earthdata.nasa.gov/podaac-swot-ops-cumulus-protected/SWOTCalVal_GNSS_L2_1.0/SWOTCalVal_T2_GNSS_L2_Rec11_20230201T221500_20230201T232230_20230227T220903.nc
[2023-06-14 14:50:25,960] {podaac_data_downloader.py:313} INFO - 2023-06-14 14:50:25.960080 SUCCESS: https://archive.swot.podaac.earthdata.nasa.gov/podaac-swot-ops-cumulus-protected/SWOTCalVal_GNSS_L2_1.0/SWOTCalVal_WM_GNSS_L2_Rec2_20220729T222100_20220730T023300_20230227T211845.nc
[2023-06-14 14:50:25,960] {podaac_data_downloader.py:324} INFO - Downloaded Files: 2
[2023-06-14 14:50:25,960] {podaac_data_downloader.py:325} INFO - Failed Files:     0
[2023-06-14 14:50:25,960] {podaac_data_downloader.py:326} INFO - Skipped Files:    0
[2023-06-14 14:50:26,310] {podaac_data_downloader.py:334} INFO - END

(base) jmcnelis@MT-209219:main  [ ~/subscriber-feature-enhancements/data-subscriber ] 
 02:50:26 $ python subscriber/podaac_data_downloader.py -c SWOTCalVal_GNSS_L2_1.0 -gr 'SWOTCalVal_WM_GNSS_L2_*' -d ./data/
[2023-06-14 14:50:37,917] {podaac_data_downloader.py:270} INFO - Found 1 total files to download
[2023-06-14 14:50:43,467] {podaac_data_downloader.py:313} INFO - 2023-06-14 14:50:43.467900 SUCCESS: https://archive.swot.podaac.earthdata.nasa.gov/podaac-swot-ops-cumulus-protected/SWOTCalVal_GNSS_L2_1.0/SWOTCalVal_WM_GNSS_L2_Rec2_20220729T222100_20220730T023300_20230227T211845.nc
[2023-06-14 14:50:43,468] {podaac_data_downloader.py:324} INFO - Downloaded Files: 1
[2023-06-14 14:50:43,468] {podaac_data_downloader.py:325} INFO - Failed Files:     0
[2023-06-14 14:50:43,468] {podaac_data_downloader.py:326} INFO - Skipped Files:    0
[2023-06-14 14:50:43,806] {podaac_data_downloader.py:334} INFO - END

skorper commented 1 year ago

@jjmcnelis A few more things..

jjmcnelis commented 1 year ago

> @jjmcnelis A few more things..
>
> * Can you please change this PR to point to `develop` instead of `main`? New features go into `develop`, then are eventually merged to `main` when we release.
>
> * Can you add a line in the changelog? Create a new [unreleased] section.
>
> * Can you add something in the downloader readme about this new feature?

Thanks for your patience, @skorper. I'm out of my element..

I made edits to CHANGELOG.md, to Downloader.md, and to the help text for the -gr option inside podaac_data_downloader.py to expand on its use with wildcard patterns. Downloader.md links to the CMR Search API docs describing this wildcard search feature, which works exactly the same way through our tool as it does for the REST API parameters. Many more parameters than Granule UR support it, but this one is the most useful to expose to users through the downloader tool, IMO. Let me know if these updates don't meet our standards and I'll take another shot at it right away. Thanks again.

skorper commented 1 year ago

Thank you @jjmcnelis ! The last thing would be to resolve merge conflicts, then I will approve 🙂