spacetelescope / webbpsf

James Webb Space Telescope PSF simulation tool
https://webbpsf.readthedocs.io
BSD 3-Clause "New" or "Revised" License
115 stars 61 forks source link

add dedicated cache workflow that can be called by other projects #877

Closed zacharyburnett closed 1 month ago

zacharyburnett commented 2 months ago

since this data is also used by the Romancal and Romanisim CI workflows, it makes sense to make a reusable workflow here that can be updated centrally (in case the URL changes or something), instead of duplicating the code there

~the workflow cache.yml downloads the latest data file from the Box link and saves it to a cache named webbpsf-data-1.3.0~ see https://github.com/spacetelescope/webbpsf/pull/877#issuecomment-2220851440 for updated explanation:

to summarize, I've split the workflow functionality currently set up in romancal and romanisim into the download_data.yml and retrieve_cache.yml workflows:

  • download_data.yml downloads the dataset and caches it to GitHub cache (with the key webbpsf-data-mini-1.3.0, webbpsf-data-full-1.2.6, etc.), making a new cache when the downloaded data has a new version. This should only be run on a schedule or manually when the data has a new version. Unfortunately, since caches are per-repository other repositories will need to make an additional workflow that solely calls this workflow, and then trigger it on a schedule or manually, in order to set up their own cache of the data.
  • retrieve_cache.yml uses the gh CLI to retrieve the key of the latest data cache made in the client repository, which can then be downloaded (restored) in the client workflow. This is the one that should be called by CI workflows, either here or in other repostories

I split these to control how much the data is actually downloaded, since caches are much faster than Box. This isn't as much of an issue now that the WebbPSF dataset isn't really all that big, but I figure we could also copy / paste this workflow to other datasets if they are used in the future

mperrin commented 2 months ago

FYI @BradleySappington you might also take a look at this too, please.

zacharyburnett commented 2 months ago

API design question: Would download_data.yml or install_data.yml perhaps be a more obvious name versus cache.yml? Your choice; either's fine with me, so I'm just raising the question.

You're right, that's probably better, especially if other repos call it (it would be something like uses: spacetelescope/webbpsf/.github/workflows/download_data.yml)

zacharyburnett commented 1 month ago

looks good to me Zach pending change Marshall pointed out, thanks for setting this up!

no problem! I changed the name to download_data.yml

zacharyburnett commented 1 month ago

ok, I've changed the workflow pointer back to use the develop branch (it won't work now until it's merged)

zacharyburnett commented 1 month ago

to summarize, I've split the workflow functionality currently set up in romancal and romanisim into the download_data.yml and retrieve_cache.yml workflows:

I split these to control how much the data is actually downloaded, since caches are much faster than Box. This isn't as much of an issue now that the WebbPSF dataset isn't really all that big, but I figure we could also copy / paste this workflow to other datasets if they are used in the future

mperrin commented 1 month ago

I see there is now both a download_data and a retrieve_cache workflow. Can you explain please what the distinction is and what the intended use case is for each? It sounds like you have some specifics in mind for use in the various Roman tools but it’s not quite clear to me how the two scripts relate, etc.

mperrin commented 1 month ago

Aha never mind, I wrote my prior comment exactly at the same time as you were already answering the question :-)

mperrin commented 1 month ago

This PR looks good to me, but I'm trying to figure out why the CI tests haven't run. @BradleySappington maybe you might have some insights into that?

zacharyburnett commented 1 month ago

This PR looks good to me, but I'm trying to figure out why the CI tests haven't run. @BradleySappington maybe you might have some insights into that?

they won't be able to run again until this PR is merged, since they point to the develop branch and this workflow isn't yet on that branch: Screenshot 2024-07-16 at 10 16 48 AM

We did run the workflow earlier by making a test commit to point the workflow to this specific branch, and then removed that commit

zacharyburnett commented 1 month ago

Very minor suggestion, which we can leave for a later future PR: it would be nice to have some comments in the YML files, or if that's not possible a readme/notes file in the workflows directory that explains what each script is for, so others later can understand how how to use these without having to hunt down the discussion in this PR. But let's not let that hold up merging this in so it can run.

how's this: 3796c0aaae9c0e7c8f33d7de08fc793109558663

mperrin commented 1 month ago

That's superb @zacharyburnett. I was trying to save you work today but you're too fast. :-)

@BradleySappington just to check, I assume you concur this is good to merge?

BradleySappington commented 1 month ago

merging with bypassed protections due to comment above

This PR looks good to me, but I'm trying to figure out why the CI tests haven't run. @BradleySappington maybe you might have some insights into that?

they won't be able to run again until this PR is merged, since they point to the develop branch and this workflow isn't yet on that branch: Screenshot 2024-07-16 at 10 16 48 AM

We did run the workflow earlier by making a test commit to point the workflow to this specific branch, and then removed that commit

zacharyburnett commented 1 month ago

I apologize; the download_data.yml workflow isn't showing a Run Workflow button so I can't run it manually, even though it has a workflow_dispatch. I'll try and diagnose this issue, but in the meantime the CI won't be able to retrieve a cache because there is no cache...

zacharyburnett commented 1 month ago

Oh wait, I won't see a Run Workflow button if I don't have write access to the repository 🤦‍♂️

Could someone with write access please try running the workflow here: https://github.com/spacetelescope/webbpsf/actions/workflows/download_data.yml

mperrin commented 1 month ago

Trying that now.

zacharyburnett commented 1 month ago

Trying that now.

Thanks! Sorry for the syntax error, I neglected to realize that we never tested download_data.yml with the new inline code 😅

I made a PR here: https://github.com/spacetelescope/webbpsf/pull/882