Closed ddj116 closed 6 months ago
OK so the core tests are fixed, but I'm still getting that "artifact upload" error which I don't understand:
Here's how the artifacts are defined for the job:
- uses: actions/upload-artifact@master
if: ${{ always() }}
with:
name: doctests
path: |
share/trick/trickops/tests/*_doctest_log.txt
/tmp/log.*
I found this thread which seems to imply that all artifacts are collected into a single archive, even across jobs...? Maybe that's only true because of the "build matrix" y'all have configured...? If that's the case then it would make sense what's happening, whichever TrickOps job finishes first (Rocky 8 vs. Ubuntu) "wins" and the last job gets the error. Both jobs create log files of the same name in completely different workspaces.
@hchen99 @sharmeye what should we do here? Some folks in that thread have suggested rolling back to v3
which looks like ya'll are already using in your other tests, so maybe you've already run into this...
@ddj116 may want to try to change actions/upload-artifact@master
to actions/upload-artifact@v3.0.0
in trickops.yml as you suggested. all other test yml files have actions/upload-artifact@v3.0.0
OK reverted upload-artifacts mechanism to v3.0 and getting all green checks. I know @hchen99 mentioned that TrickOps unit tests were removed as part of #1400, but I don't quite understand that decision since the unit tests only take 4 minutes to complete and the longest other job is 14 minutes. This PR restores it to being tested in every PR which I would prefer, I'm happy to discuss the details with y'all if you have questions before we merge.
Looks to me that TrickOps unit tests were changed from pull_request triggered to scheduled (unit tests were not removed :) to cut down time that Trick devs are spending on waiting for tests. Maybe TrickOps unit tests probably seldom fail although it's doesn't take that long but could be sufficient just run once a while and also the workflow can be manually triggered at anytime...and others on pull_request tests are much needed ... just trying to reason where the decision came from and personally find it was reasonable :) Meanwhile, we understand your point and need. We'll discuss and will let you know if we come to a different idea, otherwise, we'll get this PR merged.
@ddj116 @jmpenn commented out two of those sims, SIM_amoeba and SIM_test_varserv, in the yaml file. Was there a reason you wanted them uncommented?
@jmpenn commented out two of those sims, SIM_amoeba and SIM_test_varserv, in the yaml file. Was there a reason you wanted them uncommented?
Yes - The unit test suite is built around the content in share/trick/trickops/tests/*.yml
. share/trick/trickops/tests/trick_sims.yml
(the file @jmpenn commented those sims out in) is just one of 7 yaml
files and this one represents a list of SIM
s that ship with trick. The other 7 yaml
files are lists of sims containing errors of some kind.
When the TrickOps unit tests run, they parse these yaml
files and assert certain criteria like "this yaml file when parsed produces 56 build jobs and 36 run jobs". When those sims were removed from the file, the tests started failing because the number of parsed sims went down which resulted in bad success criteria for a bunch of assertions.
To be clear, @jmpenn's removal of sims in share/trick/trickops/tests/trick_sims.yml
did not prevent SIM_amoeba
or SIM_test_varserv
from being built and run because TrickOps unit tests never build or run any sim, they only do error checking on the framework which largely consists of parsing the yaml file and checking to make sure it received the content the tests expect. @jmpenn made these two commits in a row:
commit a22829c1ed887c212950934c6ef3842b57176e1c
Author: jmpenn <john.m.penn@nasa.gov>
Date: Wed Sep 27 14:31:16 2023 -0500
Update trick_sims.yml
share/trick/trickops/tests/trick_sims.yml | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
commit f120264bd81834da7f1306a6a46b00807c850358
Author: jmpenn <john.m.penn@nasa.gov>
Date: Wed Sep 27 12:54:25 2023 -0500
Update test_sims.yml
We can't have SIM_amoeba and SIM_varserv in our regular CI test suite if they are going to periodically fail due to environmental variations over which we have no control.
test_sims.yml | 51 ++++++++++++++++++++++++++++-----------------------
1 file changed, 28 insertions(+), 23 deletions(-)
f12026 actually disabled the sims in Trick's core make test
architecture. a22829 didn't disable anything, it just broke the unit tests that depend on share/trick/trickops/tests/trick_sims.yml
. My guess is back in Sept 2023 he simply git grep
d for the offending SIMs, and commented out any yaml
file it was listed in.
@ddj116 Oh, that's good to know. Thanks for the info. share/trick/trickops/tests/trick_sims.yml
simply just lists all trick sims for your unit tests. Your unit test checks hom many sim builds/runs... so it'd see 54 but expecting 56 if those 2 removed from this trick_sims.yml. However, wondering why in ut_TrickWorkflow.py, get_jobs('build') would return 56 instead of 54 as trick test doesn't build those 2 sims that are commented out in test_sims.yml?
wondering why in ut_TrickWorkflow.py, get_jobs('build') would return 56 instead of 54 as trick test doesn't build those 2 sims that are commented out in test_sims.yml?
Here's a snippet we can talk to from ut_TrickWorkflow.py
:
def test_get_jobs_nominal(self):
# Test all the kinds permutations
builds = self.instance.get_jobs('build')
self.assertEqual(len(builds), 56)
builds = self.instance.get_jobs('builds')
self.assertEqual(len(builds), 56)
runs = self.instance.get_jobs('run')
self.assertEqual(len(runs), 36)
runs = self.instance.get_jobs('runs')
self.assertEqual(len(runs), 36)
vg = self.instance.get_jobs('valgrind')
self.assertEqual(len(vg), 1)
vg = self.instance.get_jobs('valgrinds')
self.assertEqual(len(vg), 1)
a = self.instance.get_jobs('analysis')
self.assertEqual(len(a), 8)
a = self.instance.get_jobs('analyses')
self.assertEqual(len(a), 8)
a = self.instance.get_jobs('analyze')
self.assertEqual(len(a), 8)
This section asserts the criteria of the tests - it's expecting 56 "sim build" jobs to come out of parsing share/trick/trickops/tests/trick_sims.yaml
. The success criteria 56
is hardcoded in the test because that's what share/trick/trickops/tests/trick_sims.yml
produces (at least, before @jmpenn's changes). With two sims commented out, get_jobs('build')
will return 54 so the test fails. It's not as easy as just updating 56 to 54, there are other attributes tested all over the file which will change anytime a sim is added or removed to share/trick/trickops/tests/trick_sims.yaml
. That's why we store our own copy under the TrickOps test area, so that as y'all add/remove sims to trick/test_sims.yml
the TrickOps unit tests are unaffected. Let me know if you still have questions or issues with the approach.
Oh, ok, ut_TrickWorkflow.py
->get_jobs('build')
is based on path:
of each SIM section in share/trick/trickops/tests/trick_sims.yml
and is not based on what was actually built by trick make test
so you can have your assert properly without the possibility of being impacted by changed tests on trick side. It makes sense now. Just I assumed the latter was the case when I saw get_jobs('build')
.
The other thing was that you mentioned that TrickOps unit tests never build or run any sim and I was bit confused by that when I saw trick-CP
as build command in both TrickWorkflow.py
and ut_TrickWorkflow.py
. But then realized you just use build command for your asserts not actually run the build command.
trick_sims.yml was "accidentally changed" to remove unstable sims in Sept 2023, breaking the TrickOps unit tests. This reverts that change.