Open AstroBarker opened 11 months ago
I strongly prefer not switching to the release based gold model used in Parthenon. It requires manually syncing PRs with gold releases which I find awkward and laborious to set up for the developer, and to review for the reviewer. I believe golds should be in plaintext and included in PRs as they are now. The compression option we have in Phoebus where only a sparse sampling of the gold file can be included in the repository for large files was supposed to make this workable long-term. Is there a looming problem with what we have now in Phoebus?
Best
Ben
On Sat, Dec 2, 2023, 9:33 AM Brandon Barker @.***> wrote:
As the title says. I am starting to spin up implementing the Parthenon regression testing framework https://parthenon-hpc-lab.github.io/parthenon/develop/src/tests.html#regression-tests into Phoebus. It's basically done, but we are going to need a long term solution for gold files for accepted solutions. Parthenon bakes them into github releases -- are there size limits on that? i.e., would we run into issues far down the line? Thoughts?
— Reply to this email directly, view it on GitHub https://github.com/lanl/phoebus/issues/187, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ27SFZMVJIPMLEQ5TLHI3YHNJ67AVCNFSM6AAAAABAEDV5Y6VHI2DSMVQWIX3LMV43ASLTON2WKOZSGAZDEMJRGQ3TMNY . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Ah I see. I missed the process for creating the gold files -- I like this far more than the release model. Should've looked closer. As for looming problems, none. What are thoughts on wrapping the existing infrastructure into the Parthenon tooling for running the tests, so that they e.g., hook into the cmake testing?
Thanks for the clarifications!
I am concerned about the long-term viability of plaintext gold files but they've worked for us up until now, so I'm fine keeping that machinery in place. I do like the idea of hooking into Parthenon's testing machinery other than that.
For some context, I work on a software project with ~5-10 develoeprs that's had ~100 plaintext goldfiles in the repository for ~5 years that get upgolded regularly and it's never been an issue.
I have a preference (rather than a strong preference) for keeping the test machinery as-is because it works currently and is simple and extensible. Maybe I'm not a power user but I also never found much value in Catch2, I never used it as something other than a wrapper around bool
(test passed or failed), and it has had issues in the past with appropriately handling printf
/std::cout
output, although maybe these are fixed now.
I also don't see any need to give cmake an opinion about how we run tests though I could be missing a useful feature.
One potential issue with the parthenon testing machinery -- does it support plaintext gold files?
The Parthenon testing machinery does support plaintext gold files as it just reads whatever you tell it to in the python script.
IMO catch and our regression tests serve different needs, and we’re already using catch for our unit tests.
Parthenon’s regression test suite is actually very similar to what we wrote forever ago for firelink/MOCMC. (And In fact the design is informed by that).
That said, I don’t feel strongly either way, except that it would be nice to be running Phoebus on more hardware than GitHub provides.
Get Outlook for iOShttps://aka.ms/o0ukef
From: Ben Ryan @.> Sent: Monday, December 4, 2023 8:23:16 AM To: lanl/phoebus @.> Cc: Subscribed @.***> Subject: [EXTERNAL] Re: [lanl/phoebus] Long term solution for gold files (Issue #187)
For some context, I work on a software project with ~5-10 develoeprs that's had ~1000 plaintext goldfiles in the repository for ~10 years that get upgolded regularly and it's never been an issue.
I have a preference (rather than a strong preference) for keeping the test machinery as-is because it works currently and is simple and extensible. Maybe I'm not a power user but I also never found much value in Catch2, I never used it as something other than a wrapper around bool (test passed or failed), and it has had issues in the past with appropriately handling printf/std::cout output, although maybe these are fixed now.
I also don't see any need to give cmake an opinion about how we run tests though I could be missing a useful feature.
One potential issue with the parthenon testing machinery -- does it support plaintext gold files?
— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/lanl/phoebus/issues/187*issuecomment-1838873748__;Iw!!Bt8fGhp8LhKGRg!HOXBt-62q9g9qjfovL3Xow5xqV8ViOjgPgqFBrcTlGms5KNry485AKthDuWzm1rva0mVL7AhsDkTMRe2QzYOW2bb$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AOO5OJ4FM5AVOBHPXT5URFLYHXTGJAVCNFSM6AAAAABAEDV5Y6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMZYHA3TGNZUHA__;!!Bt8fGhp8LhKGRg!HOXBt-62q9g9qjfovL3Xow5xqV8ViOjgPgqFBrcTlGms5KNry485AKthDuWzm1rva0mVL7AhsDkTMRe2QyTbBrsZ$. You are receiving this because you are subscribed to this thread.Message ID: @.***>
it would be nice to be running Phoebus on more hardware than GitHub provides.
Yeah I strongly agree with this. If parthenon's regression test gets us to EDIT: GPU CI then it would be quite useful for development. Thanks for pointing that out.
Not sure Parthenon's regression suite will get us there... someone just needs to set up some CI on the re-git mirror again. Which probably has to be one of LANL staff...
If you want a place to start on Darwin, I could also use more eyes on this: https://github.com/AFD-Illinois/kharma/blob/kharma-next/scripts/ci/darwin.yml
It's pretty basic, no containers or anything, should work with Jacamar. I use three different downstream pipelines to run on different hardware, and control what runs where with tags.
As the title says. I am starting to spin up implementing the Parthenon regression testing framework into Phoebus. It's basically done, but we are going to need a long term solution for gold files for accepted solutions. Parthenon bakes them into github releases -- are there size limits on that? i.e., would we run into issues far down the line? Thoughts?