hpc / pavilion2

Pavilion is a Python 3 (3.5+) based framework for running and analyzing tests targeting HPC systems.
https://pavilion2.readthedocs.io/

older series are not shown #722

Open curtisecombsjr opened 9 months ago

curtisecombsjr commented 9 months ago

Hello! We have several hundred series in ~/.pavilion/working_dir/series, but Pavilion seems to only "know about" the last two, and it outputs the wrong tests for older series when queried with pav status s<sid>.


~/.pavilion/working_dir/series> ls

... (truncated) ...

0000105  0000218  0000331  0000444  0000557  0000670  0000784
0000106  0000219  0000332  0000445  0000558  0000671  0000785
0000107  0000220  0000333  0000446  0000559  0000672  0000786
0000108  0000221  0000334  0000447  0000560  0000673  0000787
0000109  0000222  0000335  0000448  0000561  0000674  0000788
0000110  0000223  0000336  0000449  0000562  0000675  next_id
0000111  0000224  0000337  0000450  0000563  0000676  series_info_transform.db
0000112  0000225  0000338  0000451  0000564  0000677
0000113  0000226  0000339  0000452  0000565  0000678
~/.pavilion/working_dir/series> ls -lad 0* | wc -l
787

~/.pavilion/working_dir/series> pav list series
s788 s787

~/.pavilion/working_dir/series> ls 0000675/
0003009  0003010  0003011  0003012  config  dependency  series.out  series.pgid

~/.pavilion/working_dir/series> pav status s675
 Test statuses
---------+-----------------------+----------+-------------+--------------------
 Test id | Name                  | State    | Time        | Note
---------+-----------------------+----------+-------------+--------------------
 3460    | streams-pbs-stg2.base | COMPLETE | 21 19:54:06 | The test completed
         |                       |          |             | with result: PASS
 3461    | hpl-pbs-stg2.base     | COMPLETE | 21 20:38:59 | The test completed
         |                       |          |             | with result: PASS
 3462    | dgemm-pbs-stg2.base   | COMPLETE | 21 21:27:09 | The test completed
         |                       |          |             | with result: PASS
 3463    | hpcg-pbs-stg2.base    | COMPLETE | 21 22:11:01 | The test completed
         |                       |          |             | with result: PASS
 3464    | p2p-pbs-stg2.base     | COMPLETE | 21 22:17:11 | The test completed
         |                       |          |             | with result: PASS
 3465    | gfscpu-pbs-stg2.base  | COMPLETE | 21 22:36:42 | The test completed
         |                       |          |             | with result: PASS
~/.pavilion/working_dir/series>

I am using "pav status" on an older series, 675. The current series is 788; however, the test IDs for s788 are shown for s675.

(Note that the Pavilion test IDs listed in the 0000675 directory are 3009-3012, NOT 3460-3465.)
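
For anyone who wants to confirm this kind of mismatch without going through pav, here is a minimal sketch that reads the test IDs straight from a series directory. It assumes, based only on the listings above, that series directories are named with a zero-padded 7-digit series number and that the numeric entries inside them are the test IDs. The helper names `series_dir` and `disk_test_ids` are made up for illustration and are not part of Pavilion.

```python
from pathlib import Path


def series_dir(working_dir, sid):
    """Return the directory for series `sid`, e.g. 675 -> <working_dir>/series/0000675.

    The 7-digit zero padding is an assumption taken from the `ls` output above.
    """
    return Path(working_dir) / "series" / "{:07d}".format(sid)


def disk_test_ids(working_dir, sid):
    """Return the numeric entry names in a series directory as sorted ints.

    Non-numeric entries (config, dependency, series.out, ...) are skipped.
    """
    entries = series_dir(working_dir, sid).iterdir()
    return sorted(int(e.name) for e in entries if e.name.isdigit())
```

With the directory contents shown above, `disk_test_ids(Path.home() / ".pavilion" / "working_dir", 675)` should give `[3009, 3010, 3011, 3012]`, which can then be compared by hand against the IDs that pav status s675 prints.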

This only started happening today. Thank you so much!

curtisecombsjr commented 9 months ago

Strangely, I enabled "log_level: debug" to see if I could find an error, but that seems to have broken it even more. Now pav.log does not show any output and the series list is completely empty...


~> pav list series
No matching items found.
~>

Also, we are running 2.3, if that helps.

:~> pav --version
Pavilion 2.3
:~>

Thanks

Paul-Ferrell commented 8 months ago

Sorry for the delay - we've been out the last few weeks for holiday break.

Double check that the working directory path Pavilion is using is what you think it is. pav show config will show you the Pavilion config settings as Pavilion sees them. Double check the path as given there.

Paul-Ferrell commented 8 months ago

Also, pav show config_dir will list all the config directories.

curtisecombsjr commented 8 months ago

Paul,

Thanks so much. Everything in the output here seems to be correct. Let me attach it to this comment, so that you can see for yourself, but as far as I can tell, the directories line up and match. New mystery: pav list series now shows no series. Very strange. pav.txt

Paul-Ferrell commented 8 months ago

Are you using the latest master?

Could you attach your pavilion.yaml?



curtisecombsjr commented 8 months ago

I tried to switch to the latest clone that I could, but unfortunately our nodes aren't fully updated and I got errors, so I had to switch back. That being said, I do not think that this version is that old; it might be a year-ish. Is there a way that I can check? It would be very difficult for us to upgrade at this moment either way, sadly. A node OS update won't happen for a while, and even then we are not updating to the latest SP (SUSE). Here's my YAML: pav_config.txt

Paul-Ferrell commented 8 months ago

No problem. If you're using a git checkout of Pavilion, just send me the git hash. I can check out the version you're using and work from there.



curtisecombsjr commented 8 months ago

Could this be helpful? It's been moved around and I wasn't the original person that installed it. Kinda just inherited it:

hpc-adm@clogin09:/apps/dev/pavilion/src/pavilion2> git log | cat | head -20
commit 6d6e359bc551e3981b14a5516a4028b96e2f3042
Author: Francine Lapid <55203623+francinelapid@users.noreply.github.com>
Date:   Tue Apr 6 11:17:44 2021 -0600

    fixed command-line overrides (#397)

    Co-authored-by: Paul Ferrell <51765748+Paul-Ferrell@users.noreply.github.com>

commit 07e9ebe47b5504132832c7564c97b813508337ca
Author: Francine Lapid <55203623+francinelapid@users.noreply.github.com>
Date:   Tue Apr 6 11:05:24 2021 -0600

    cancels entire series if there's a scheduler error (#399)

    * cancels entire series if there's a scheduler error

    * includes test id in output error

    * replaced by_sigterm parameter with message

hpc-adm@clogin09:/apps/dev/pavilion/src/pavilion2> git rev-parse --short HEAD
6d6e359b
hpc-adm@clogin09:/apps/dev/pavilion/src/pavilion2>

Paul-Ferrell commented 8 months ago

Yep, that will do it. Give me a bit - I've got other fires to handle, but I'll get to this today.

That version is quite old, and I think I vaguely remember this bug.



curtisecombsjr commented 8 months ago

Awesome, thank you! And take your time. This is not seriously affecting us at the moment. We run pavilion tests for our burn-ins and it's working fine for those, it would just be a problem if we needed to look at the past results (which does happen, but not that often). Thanks again!