perfsonar / toolkit

perfSONAR Toolkit distribution environment scripts and GUI
Apache License 2.0

Deleted tests continue to run #399

Open mfeit-internet2 opened 4 years ago

mfeit-internet2 commented 4 years ago

@lbsou commented on Wed Jun 26 2019

I did a bit of testing with two perfSONAR servers (same LAN) over the last 3 days and came across a problem.

Installed from pS-Toolkit-4.1.6-CentOS7-FullInstall-x86_64-2019Feb20.iso on barebone servers. I disabled all firewalls and IPv6. All services are in the running state.

If I add a test to do a throughput task, it is added to the pending tasks, I can see it in the output of pscheduler monitor and pscheduler schedule, and I get the throughput result. If I remove the test (all of them, for the purpose of this test), the task won't disappear and will continue to be scheduled! Even after a full reboot!

pscheduler troubleshoot 10.1.1.1

Performing basic troubleshooting of localhost and 10.1.1.1

localhost:

Checking path MTU... 65535 (Local)
Checking for pScheduler... OK.
Checking clock... OK.
Idle test.... 13 seconds.... Checking archiving... OK.

10.249.143.30:

Checking path MTU... 1500+
Checking for pScheduler... OK.
Checking clock... Unsynchronized (Both UI tell me NTP synced OK, so WHY?)
Idle test.... 13 seconds.... Checking archiving... OK.

localhost and 10.1.1.1:

Checking path MTU... 1500+
Checking timekeeping... OK.
Simple stream test.... 13 seconds.... OK.

pScheduler on both hosts appears to be functioning normally.

Server #1: [image]

Server #2: [image]


@mfeit-internet2 commented on Mon Mar 23 2020

If I add a test to do a throughput task, it is added to the pending tasks, I can see it in the output of pscheduler monitor and pscheduler schedule, and I get the throughput result. If I remove the test (all of them, for the purpose of this test), the task won't disappear and will continue to be scheduled!

This is more likely something wrong in the front end of the toolkit than pScheduler. pSConfig makes heavy use of the code that cancels tasks and doesn't have problems with it. I'm going to boot this over to the toolkit and see if the maintainers of the front end have anything to say about this.

Some additional comments:

Even after a full reboot!

Everything pScheduler knows is stored in a database. If it's been scheduled, it persists and will be run unless cancelled, even across reboots. This differs from BWCTL, which operated in memory.

Checking clock... Unsynchronized (Both UI tell me NTP synced OK, so WHY?)

pScheduler calls ntp_adjtime(3) to query the clock status and reports what it returns. I'm pretty sure that's also the ultimate source of the data in the GUI. Is there a chance your system lost sync momentarily? (You can run pscheduler clock to peek at the clock as pScheduler sees it.)
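To check the kernel's view of the clock independently of both pScheduler and the GUI, ntp_adjtime(3) can be called directly. This is a rough sketch that assumes Linux with glibc on a 64-bit platform (the `Timex` layout below follows the x86_64 `struct timex`). The rule in `is_synchronized` is an illustrative interpretation; how pScheduler itself interprets the result is not shown in this thread.

```python
import ctypes
import ctypes.util
import os

# Return values of ntp_adjtime(3); TIME_ERROR means the clock is not synchronized.
TIME_STATES = {0: "TIME_OK", 1: "TIME_INS", 2: "TIME_DEL",
               3: "TIME_OOP", 4: "TIME_WAIT", 5: "TIME_ERROR"}
TIME_ERROR = 5
STA_UNSYNC = 0x0040  # status bit: clock unsynchronized


class Timeval(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


class Timex(ctypes.Structure):
    # Layout of struct timex on 64-bit Linux (assumption: glibc, x86_64).
    _fields_ = [
        ("modes", ctypes.c_uint), ("offset", ctypes.c_long),
        ("freq", ctypes.c_long), ("maxerror", ctypes.c_long),
        ("esterror", ctypes.c_long), ("status", ctypes.c_int),
        ("constant", ctypes.c_long), ("precision", ctypes.c_long),
        ("tolerance", ctypes.c_long), ("time", Timeval),
        ("tick", ctypes.c_long), ("ppsfreq", ctypes.c_long),
        ("jitter", ctypes.c_long), ("shift", ctypes.c_int),
        ("stabil", ctypes.c_long), ("jitcnt", ctypes.c_long),
        ("calcnt", ctypes.c_long), ("errcnt", ctypes.c_long),
        ("stbcnt", ctypes.c_long), ("tai", ctypes.c_int),
        ("_reserved", ctypes.c_int * 11),
    ]


def is_synchronized(state: int, status: int) -> bool:
    """Illustrative rule: synced unless TIME_ERROR or STA_UNSYNC is reported."""
    return state != TIME_ERROR and not (status & STA_UNSYNC)


def query_clock():
    """Call ntp_adjtime() read-only and return (state, status flags)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    buf = Timex()  # modes == 0 requests a read-only query
    state = libc.ntp_adjtime(ctypes.byref(buf))
    if state < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return state, buf.status


if __name__ == "__main__":
    state, status = query_clock()
    print(TIME_STATES.get(state, state), hex(status),
          "synchronized" if is_synchronized(state, status) else "unsynchronized")
```

Running this a few times alongside `pscheduler clock` may show whether the kernel reports a momentary loss of sync that the GUI does not catch.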