gridcf / gct

Grid Community Toolkit
Apache License 2.0

GRAM Job failed because the job manager failed to open stdout #172

Closed longlong10086 closed 2 years ago

longlong10086 commented 3 years ago

I deployed a cluster with gct-6.2.

On both the master node and the slave nodes, I get the error "GRAM Job failed because the job manager failed to open stdout (error code 73) Details:" when executing the command "globus-job-run slave-node/jobmanager-fork-poll -np 1 /bin/hostname".


If I add the parameter "-stdout /tmp/1.log", as in "globus-job-run slave-node/jobmanager-fork-poll -np 1 -stdout /tmp/1.log /bin/hostname", I get the right result in /tmp/1.log, but the error above still appears in the terminal.

How can I fix it?

msalle commented 3 years ago

I noticed that in the email you sent to the security list, there was an error msg="Error making datagram connection to Job Manager" reason="No such file or directory" path="/vol6/home/globus/softs/gct-6.2/var/lib/globus/gram_job_state/globus/test/fork.28be48cb.sock", along with line 155 in globus-job-manager-script.pl. I'm not an expert on GRAM, but perhaps the directory where that socket is expected doesn't exist or isn't writable for the job manager?
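The check suggested above can be sketched in a few lines of Python. This is only an illustration, not part of GCT: the function name check_state_dir is made up here, and the result is only meaningful when the script runs as the same account as the job manager, because os.access() tests the permissions of the calling user.

```python
import os
import stat


def check_state_dir(sock_path):
    """Report whether the directory that should hold a socket file is usable.

    Hypothetical helper for illustration; not part of GCT or GRAM.
    """
    directory = os.path.dirname(sock_path)
    report = {
        "directory": directory,
        "dir_exists": os.path.isdir(directory),
        # Creating a socket file needs write and search (execute) permission.
        "dir_writable": os.access(directory, os.W_OK | os.X_OK),
        "socket_exists": False,
    }
    try:
        # True only if the path exists and is a socket, not a regular file.
        report["socket_exists"] = stat.S_ISSOCK(os.stat(sock_path).st_mode)
    except OSError:
        pass  # missing path or inaccessible parent: leave as False
    return report
```

Calling check_state_dir() with the path from the log message returns a small dictionary. If dir_exists or dir_writable is False, that would be consistent with the "No such file or directory" reason quoted above.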

longlong10086 commented 3 years ago

Thanks for your reply.

Sometimes I can find the file "/vol6/home/globus/softs/gct-6.2/var/lib/globus/gram_job_state/globus/test/fork.28be48cb.sock" and sometimes I can't, and the log has now changed:

event=gram.send_job.end level=WARN status=-3 errno=111 msg="Error making datagram connection to Job Manager" reason="Connection refused" path="/vol6/home/globus/softs/gct-6.2/var/lib/globus/gram_job_state/globus/nscctj-1/fork.28be48cb.sock"
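The two reasons seen so far map to different states of the socket file. On Linux, connecting to a Unix datagram socket path fails with ENOENT (errno 2, "No such file or directory") when the file is absent, and with ECONNREFUSED (errno 111, "Connection refused") when the file is still there but no process has it bound. The following is a minimal sketch for telling these apart. The function name probe_datagram_socket is hypothetical, and whether the job manager really exited and left the file behind in this case is an assumption, not something the log confirms.

```python
import errno
import socket


def probe_datagram_socket(path):
    """Classify a Unix datagram socket path.

    Returns 'listening', 'missing', 'stale', or 'error: <name>'.
    Hypothetical helper for illustration; not part of GCT or GRAM.
    """
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        client.connect(path)  # connect only; no datagram is sent
        return "listening"
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return "missing"  # no socket file at this path
        if exc.errno == errno.ECONNREFUSED:
            return "stale"  # file exists, but no process is bound to it
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        client.close()
```

A result of "stale" for the fork.28be48cb.sock path would match the errno=111 line above, while "missing" would match the earlier "No such file or directory" message.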

maarten-litmaath commented 3 years ago

Hi all, the matter in question might be understood from this Wiki page:

  https://wiki.egi.eu/wiki/Tools/Manuals/TS77

As that documentation was for older versions of Globus, it may not explain such issues with recent versions, though.