grycap / clues

CLUES: an energy management system for HPC Clusters and Cloud infrastructures.
http://www.grycap.upv.es/clues
GNU General Public License v3.0

send error (connection refused) #115

Open bio-computational-lab opened 2 years ago

bio-computational-lab commented 2 years ago

Hello CLUES crew,

I have installed and configured the clues service as explained in the instructions. I am running SGE (version 8.1.9) and ipmitool (version 1.8.18), both of which run fine on CentOS 7. When I start the service with "systemctl start cluesd.service", everything looks normal and "systemctl status cluesd.service" shows the service is active, but when I look into /var/log/clues2/clues2.log, I see the following errors:

=============
[CLUES]; INFO;2021-12-20 15:04:05,299;1640009045.300;not monitoring jobs due to configuration (var PERIOD_MONITORING_JOBS)
root;ERROR;2021-12-20 15:04:05,306; Error in command "/opt/gridengine/bin/lx-amd64/qconf -shgrpl"
root;ERROR;2021-12-20 15:04:05,306; Return code was: 1
root;ERROR;2021-12-20 15:04:05,306; Error output was: error: commlib error: got select error (Connection refused) unable to send message to qmaster using port 6444 on host "carbon.local": got send error

[PLUGIN-SGE];ERROR;2021-12-20 15:04:05,306;1640009045.306;could not get information about the hosts:
root;ERROR;2021-12-20 15:04:05,312; Error in command "/opt/gridengine/bin/lx-amd64/qhost -xml -q"
root;ERROR;2021-12-20 15:04:05,312; Return code was: 1
root;ERROR;2021-12-20 15:04:05,312; Error output was: error: commlib error: got select error (Connection refused) error: unable to send message to qmaster using port 6444 on host "carbon.local": got send error

Do you have any idea what is wrong or how I can solve this issue?

Thanks in advance,
Mahdi

micafer commented 2 years ago

Hi @bio-computational-lab,

If you execute the shown commands manually, do they work?

/opt/gridengine/bin/lx-amd64/qconf -shgrpl

/opt/gridengine/bin/lx-amd64/qhost -xml -q
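A minimal sketch of running both commands by hand and printing the same three fields that appear in the log (command, return code, error output). The command paths are copied from the log; `run_and_report` and the fallback code 127 are hypothetical choices for this illustration and are not part of CLUES.

```python
import subprocess

# Paths copied from the log in this issue; adjust if Grid Engine lives elsewhere.
COMMANDS = [
    ["/opt/gridengine/bin/lx-amd64/qconf", "-shgrpl"],
    ["/opt/gridengine/bin/lx-amd64/qhost", "-xml", "-q"],
]


def run_and_report(argv, timeout=30):
    """Run one command and return (return_code, stdout, stderr) without raising."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text
            timeout=timeout,
        )
        return proc.returncode, proc.stdout, proc.stderr
    except OSError as exc:
        # The binary is missing or not executable; 127 is an arbitrary
        # (hypothetical) code borrowed from the shell convention.
        return 127, "", str(exc)
    except subprocess.TimeoutExpired:
        return 124, "", "timed out after %s s" % timeout


if __name__ == "__main__":
    for argv in COMMANDS:
        code, out, err = run_and_report(argv)
        print('Command "%s"' % " ".join(argv))
        print("Return code was: %d" % code)
        print("Error output was: %s" % err.strip())
```

Running this from the same account and, ideally, the same environment as the daemon makes it easier to compare the result with the entries in clues2.log.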

bio-computational-lab commented 2 years ago

Hi Miguel,

Yes, both work fine. The first prints the configured queues, e.g. @EPYC, @CP2K, etc. The second prints the configured hosts (servers) in each queue in XML format, e.g.:

BIP 0 64 0 lx-amd64 64 2 64 64 0.00 251.3G 835.8M 4.0G 0.0

dealfonso commented 2 years ago

Hi,

Have you tried to run them as the user that runs the clues daemon?
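Even for the same user, an interactive shell and a daemon can see different environments. The sketch below assumes (this is not confirmed in this thread) that the Grid Engine client commands depend on the usual variables `SGE_ROOT`, `SGE_CELL`, `SGE_QMASTER_PORT` and `SGE_EXECD_PORT`, and that the daemon may be started without them. The helper names are hypothetical.

```python
import os

# Standard Grid Engine environment variables, usually set by its settings script.
SGE_VARS = ("SGE_ROOT", "SGE_CELL", "SGE_QMASTER_PORT", "SGE_EXECD_PORT")


def parse_proc_environ(raw):
    """Parse the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for item in raw.split(b"\0"):
        if b"=" in item:
            key, _, value = item.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def sge_env_report(environ):
    """Map each SGE variable to its value, or None when it is unset."""
    return {name: environ.get(name) for name in SGE_VARS}


def missing_sge_vars(environ):
    """List the SGE variables that are absent from the given environment."""
    return [name for name in SGE_VARS if environ.get(name) is None]


if __name__ == "__main__":
    # Report on the environment of the current process (e.g. a root shell).
    for name, value in sge_env_report(os.environ).items():
        print("%s=%s" % (name, value if value is not None else "<unset>"))
```

To inspect the running daemon instead of the shell, the bytes read from `/proc/<pid>/environ` (with the daemon's PID filled in) can be passed through `parse_proc_environ` and then to `missing_sge_vars`. A difference between the two reports would be a lead worth following, not a confirmed cause.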

bio-computational-lab commented 2 years ago

I ran them as root. I installed clues as root and have not changed the ownership of the clues files. I have attached the log file, which might help: clues2.log
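Because the error output names port 6444 on host "carbon.local", one further check is whether that TCP port accepts connections from this machine at the moment the service starts. A minimal sketch, with the host and port taken from the log and a hypothetical helper name `can_connect`:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return (True, "") if a TCP connection succeeds, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures.
        return False, str(exc)


if __name__ == "__main__":
    # Host and port as reported in the error output of this issue.
    ok, reason = can_connect("carbon.local", 6444)
    if ok:
        print("qmaster port is reachable")
    else:
        print("qmaster port is not reachable: %s" % reason)
```

If this succeeds from a shell while the daemon still logs "Connection refused", the difference is more likely in when or how the daemon runs the commands than in the network itself.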