olcf / olcf-test-harness

OLCF Test Harness
https://olcf.github.io/olcf-test-harness/

influx_log gives vague and unhelpful output #153

Closed hagertnl closed 2 months ago

hagertnl commented 11 months ago

Running under --mode influx_log gives very vague output, like the following:

Using machine config: frontier.ini
Using machine config: frontier.ini
/lustre/orion/stf016/proj-shared/hagertnl/applications
Overriding tasks in inputfile since CLI mode was provided
runmodecmd =  ['influx_log']
self.__harness_task:  [['influx_log', None, None]]
reading harness config /sw/acceptance/olcf-test-harness/configs/frontier.ini
RGT_PATH_TO_SSPACE is already set. Skipping.
RGT_SYSTEM_LOG_TAG is already set. Skipping.
Starting tasks for Application.Test: coral2-lammps.test_0001node_17mil_reax: [['influx_log', None, None]]
Skipped 0, launched 1.

This is pretty unhelpful. More messages about what's going on by default would be appreciated.

hagertnl commented 10 months ago

Removing this from the v3 target fixes: this bug requires refactoring so that methods return proper error messages. For example, in apptest.py, the return value of logging_status_file.post_event_to_influx is not stored, so we cannot verify whether the event was posted successfully.

I think the solution to this bug will be to report metrics such as the number of events/metrics that were logged successfully, failed to log, were skipped for an incorrect machine name, or were skipped because they were already logged. We need to fix the return-value issue first, before we can add proper metrics.
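
To make the idea concrete, here is a rough sketch of what storing the return value and tallying outcomes could look like. It is not the harness's code: `PostResult`, `post_event`, `log_events`, and `format_summary` are hypothetical names, the event dictionaries are made up, and the real `post_event_to_influx` may take different arguments and return something else entirely.

```python
from collections import Counter
from enum import Enum


class PostResult(Enum):
    """Hypothetical outcome codes; not the harness's actual return values."""
    LOGGED = "logged"
    FAILED = "failed"
    WRONG_MACHINE = "skipped (incorrect machine name)"
    ALREADY_LOGGED = "skipped (already logged)"


def post_event(event, machine, already_logged):
    """Stand-in for a posting method that reports what happened.

    A real implementation would send the event to InfluxDB; here the
    send "succeeds" unless the event is flagged with simulate_failure.
    """
    if event["machine"] != machine:
        return PostResult.WRONG_MACHINE
    if event["id"] in already_logged:
        return PostResult.ALREADY_LOGGED
    if event.get("simulate_failure"):
        return PostResult.FAILED
    already_logged.add(event["id"])
    return PostResult.LOGGED


def log_events(events, machine):
    """Post every event, keep each return value, and tally the outcomes."""
    already_logged = set()
    tally = Counter()
    for event in events:
        # Store the result instead of discarding it.
        result = post_event(event, machine, already_logged)
        tally[result] += 1
    return tally


def format_summary(tally):
    """Render one line covering every outcome, including zero counts."""
    return ", ".join(f"{result.value}: {tally[result]}" for result in PostResult)
```

With something like this in place, the end of an influx_log run could print `format_summary(log_events(events, machine))` instead of only "Skipped 0, launched 1."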

hagertnl commented 3 months ago

Fixed in #177, as part of update_databases.py.