Open ValHayot opened 4 years ago
Hello @ValHayot, I think the pilot ended correctly. You are right: the part of the pilot that executes units, the pilot's agent, does not use the same environment as the one you are running from. It creates its own on the fly.
In your home directory, there should be a folder named radical.pilot.sandbox. In there you will see a folder named ve.*, which is the virtual environment the agent is using. The rest of the folders are the pilot sessions you might have executed.
The agent creates its own environment because it assumes that it is running on a different resource from the one you are launching your execution from.
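To see which of the sandbox folders is the agent's virtual environment and which are sessions, a small script along these lines may help. It is only a sketch: it assumes the sandbox lives at ~/radical.pilot.sandbox as described above, and the helper name describe_sandbox is made up for illustration, not part of RP.

```python
from pathlib import Path


def describe_sandbox(sandbox: Path) -> dict:
    """Split the sandbox's sub-folders into agent virtualenvs (ve.*) and the rest."""
    folders = sorted(p.name for p in sandbox.iterdir() if p.is_dir())
    return {
        # Folders named ve.* hold the virtual environment created by the agent.
        "virtualenvs": [name for name in folders if name.startswith("ve.")],
        # Everything else is expected to be a pilot session folder.
        "sessions": [name for name in folders if not name.startswith("ve.")],
    }


if __name__ == "__main__":
    # Assumed location, taken from the description above.
    sandbox = Path.home() / "radical.pilot.sandbox"
    if sandbox.is_dir():
        info = describe_sandbox(sandbox)
        print("agent virtualenvs:", info["virtualenvs"])
        print("session folders:  ", info["sessions"])
    else:
        print("no sandbox found at", sandbox)
```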
Inside rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002/pilot.0000 there are several log files with extensions .log, .out, and .err. Would you mind uploading a zip file with them here?
Furthermore, in the pilot.0000 folder, you will see a unit.000000 folder. That is where your example application executed. If I remember correctly, some errors in the log files are not necessarily fatal. Can you check in the unit folder and let us know if the application executed correctly?
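If it is easier than zipping by hand, something like the following could bundle the log files and list what the unit left behind. This is a rough sketch, not an RP tool: the function names and the archive name are placeholders, and it only looks at files directly inside the pilot folder.

```python
import sys
import tarfile
from pathlib import Path

# Extensions of the log files mentioned above.
LOG_SUFFIXES = {".log", ".out", ".err"}


def collect_logs(pilot_dir: Path, archive: Path) -> list:
    """Pack the top-level .log/.out/.err files of pilot_dir into a .tar.gz archive."""
    logs = sorted(
        p for p in pilot_dir.iterdir() if p.is_file() and p.suffix in LOG_SUFFIXES
    )
    with tarfile.open(archive, "w:gz") as tar:
        for path in logs:
            # Store only the file name so the archive unpacks flat.
            tar.add(path, arcname=path.name)
    return [p.name for p in logs]


def list_unit_files(pilot_dir: Path, unit: str = "unit.000000") -> list:
    """Return the names of the files in the unit folder (empty if it is missing)."""
    unit_dir = pilot_dir / unit
    if not unit_dir.is_dir():
        return []
    return sorted(p.name for p in unit_dir.iterdir())


if __name__ == "__main__":
    # Usage: python collect_logs.py <path to pilot.0000>
    if len(sys.argv) == 2 and Path(sys.argv[1]).is_dir():
        pilot = Path(sys.argv[1])
        print("archived:", collect_logs(pilot, Path("pilot_logs.tar.gz")))
        print("unit files:", list_unit_files(pilot))
```

The unit listing is only a starting point; whether the application ran correctly still has to be judged from the contents of those files.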
Hey @iparask, sorry for the delay in replying! Here are the requested logs: pilot_logs.tar.gz
It seems like the application executed correctly (no fatal errors) and the correct output was produced. It just surprised me that the pilot timed out due to an ImportError, especially considering that it's not the "normal" behaviour, as per https://radical-cybertools.github.io/radical-pilot/quick_start.html output and the fact that the only thing installed in my venv is radical pilot and its dependencies (there shouldn't be any unrelated libraries).
Hi @ValHayot: the timeout message is misleading I think, it seems to happen also on successful and timely pilot termination. I opened a ticket in RP to get this fixed - for now, please ignore that message.
Thanks, Andre.
Note that this is fixed in the RP devel branch. This will be in the next release (this week).
Ok, good news! Following the issue reported here https://github.com/radical-cybertools/radical.pilot/issues/2035#issue-551034155, I figured out that it was unable to gather the results because OpenMPI was not installed. However, I'm still having issues running the example application.
My current issue is that RP times out waiting for the pilot:
grep of the logs:
However, libcloud is installed in my virtual environment:
and
I'm thinking the pilots are maybe not using the virtual environment that I've created. I'm trying to check if there's perhaps an environment variable that I forgot to set, such that RP pilots can use my virtual env.
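One way I could narrow this down is to print which interpreter and environment a given process actually uses, and whether it can find libcloud. The snippet below is just a sketch using the standard library; environment_report is a name I made up, and running it from inside a unit to inspect the agent's side is an assumption on my part, not something RP documents.

```python
import importlib.util
import sys


def environment_report(module_name: str) -> dict:
    """Describe the running interpreter and whether a top-level module is importable."""
    spec = importlib.util.find_spec(module_name)
    # venv (and recent virtualenv) set base_prefix; old virtualenv sets real_prefix.
    in_venv = hasattr(sys, "real_prefix") or (
        getattr(sys, "base_prefix", sys.prefix) != sys.prefix
    )
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_venv,
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in environment_report("libcloud").items():
        print(f"{key}: {value}")
```

Comparing the output from my own shell with the output from the pilot's side should show whether the two are using the same virtual environment.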