ickc closed this issue 5 months ago
Do you have more information on your test jobs? I can't reproduce this. When I submit a simple non-interactive "echo $HOME" script to both universes, it prints the location of my actual home directory (not the HTCondor scratch directory, as it does for interactive jobs).
@rwf14f, I'm guessing you have getenv = true in your ClassAd, which passes the environment of the submission node to the job?
@rwf14f, did you get a chance to look into this? Thanks.
@rwf14f, has there been a recent change here? I'm a bit confused by what I get today:
On wn3805341.tier2.hep.manchester.ac.uk, it seems I can access /home/$USER and can even have persistent storage there (i.e. files persist across jobs). It seems some sort of mdraid is set up where md2 maps to /home.
No change there, this has always been the case. The pool user accounts for the grid jobs are created under /scratch, but yours are in /home because we currently use the same account creation mechanism as for our local admin accounts. Files in there persist across jobs, but space is limited (/home has less than 100G). This is not a shared file system though: it's local to the machine, so files you put there are available only on that machine and not on any of the others. This will go away when we set up a different authorisation/authentication mechanism from the one we currently use, so don't rely on it.
Thanks. That's a bit surprising. I think the main problem is unpredictability. Either of these would be predictable behavior:
But the current situation is
Now, once a job is submitted (say, with no particular constraint on hostname), what can be expected from HOME is effectively undefined, which means extra logic may be needed wherever HOME is used.
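To make the "extra logic" concrete, here is a minimal sketch of what a job script might do to find out which situation it landed in. The function name classify_home and the three labels are made up for illustration; only HOME and _CONDOR_SCRATCH_DIR come from this thread, and a real job may well need more cases than these.

```shell
#!/bin/sh
# Sketch only: classify_home and its labels are hypothetical names.
# Reports whether HOME is missing, points at the HTCondor scratch
# directory, or points somewhere else (e.g. a node-local /home).
classify_home() {
    if [ -z "${HOME:-}" ]; then
        # HOME is unset or empty
        echo "unset"
    elif [ -n "${_CONDOR_SCRATCH_DIR:-}" ] && [ "$HOME" = "$_CONDOR_SCRATCH_DIR" ]; then
        # HOME is the per-job scratch directory
        echo "scratch"
    else
        # HOME is some other directory; persistence is unknown
        echo "other"
    fi
}

echo "HOME is: $(classify_home)"
```

A job could branch on the result, for example refusing to cache anything under HOME unless the answer is "scratch".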
This behavior (HOME being persistent per node) is not what I observed in the past either, which may depend on when those compute nodes were updated, or on which nodes they are. What I observed, and still observe on at least a subset of the nodes (possibly related to the interactive flag), is that HOME is the same as the scratch directory, which is guaranteed (I guess?) to be non-persistent across jobs.
Currently, an interactive node has HOME defined, pointing to the scratch directory. But a non-interactive node, such as one running jobs sent to the vanilla universe or parallel universe, has no HOME defined. Defining one manually seems to be overridden inside subprocesses (i.e. maybe some system-level rc scripts undefine it?).
This breaks some scripts that assume the presence of HOME, e.g. mamba.
This then makes it difficult to submit jobs that work as CI (Continuous Integration) or some other routine compilation work (recall that the login node cannot be used to compile things, as it does not have access to modules, for example, and has no gcc compiler). Having an equivalent of
export HOME=$_CONDOR_SCRATCH_DIR
is good enough for our purpose. It should not cause (any more) confusion, as the interactive node is already like this.
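Until something like that is set on the site side, a user-level stopgap could look like the sketch below. The wrapper name with_home is hypothetical, and it assumes _CONDOR_SCRATCH_DIR is set inside the job. Given that a manually defined HOME appeared to be overridden inside subprocesses on some nodes, this may not be enough everywhere.

```shell
#!/bin/sh
# Sketch only: with_home is a hypothetical helper, not part of HTCondor.
# If HOME is unset or empty and the scratch directory is known,
# point HOME at the scratch directory, then run the given command.
with_home() {
    if [ -z "${HOME:-}" ] && [ -n "${_CONDOR_SCRATCH_DIR:-}" ]; then
        HOME=$_CONDOR_SCRATCH_DIR
        export HOME
    fi
    # Run the wrapped command with its arguments unchanged
    "$@"
}
```

Inside a job script this would be used as, say, with_home sh -c 'echo "$HOME"'. An existing HOME is left alone, so the wrapper behaves the same on nodes where /home/$USER is available.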