radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Issues running radical.pilot==1.52.1 on Amarel HPC #3173

Closed AlexReedy closed 3 months ago

AlexReedy commented 5 months ago

Hey all, I have been attempting to run FACTS on Amarel today and have been encountering some issues. It seems that a network ping fails when the radical pilot agent is launched:

--------------------------------------------------------------------------------------------------------------------------------
# Launching radical-pilot-agent 
ntphost: 46.101.140.169
ping: socket: Operation not permitted
missing 'src' -- prepare env from current env
agent 29610 is gone
agent 29610 is final
agent 29610 is final (1)
---------------------------------------------------------------------------------------------------------------------------------

I am unsure whether this is an issue with my environment, RCT, or some security settings introduced by the most recent Amarel update. I can provide my sandbox files upon request but do not wish to post them here.

Best, Alex

andre-merzky commented 5 months ago

Hi @AlexReedy - thanks for the report! The ping error is not fatal, so it is unlikely to be the issue. Sharing the pilot sandbox would be useful - either here or via Slack. Thanks!

AlexReedy commented 5 months ago

Hey @andre-merzky, thanks! I'll send the sandboxes over via Slack.

andre-merzky commented 4 months ago

@AlexReedy : did I understand your last message correctly that this issue was resolved?

AlexReedy commented 4 months ago

Hey @andre-merzky, unfortunately it has not been resolved. I sent the details over on Slack, but I still get an AssertionError from the pilot saying that the process is requesting more CPUs than are available.

andre-merzky commented 4 months ago

@AlexReedy : I had to open an issue with Amarel support and am now waiting for their response.

AlexReedy commented 4 months ago

Hey @andre-merzky, it looks like the run issue was resolved in the 1.6 versions of the RCT stack!

andre-merzky commented 4 months ago

Oh, so things are back in working order for you?

mturilli commented 4 months ago

@AlexReedy can you confirm this is solved and we can close the ticket?

andre-merzky commented 3 months ago

Can be closed according to https://github.com/radical-collaboration/facts/issues/334#issuecomment-2142512524