riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License
96 stars 39 forks source link

Make `job_file_dir` a parameter of `HTCondorWorkflow` #180

Closed EthanMarx closed 1 month ago

EthanMarx commented 1 month ago

Description

Currently, the job_file_dir is set from the law config. I have a use case where multiple HTCondorWorkflows may be running simultaneously, in which case I noticed that job files were being overwritten by one another.

The workaround I found was to hack in to the htcondor_create_job_file_factory hook:

class MyHTCWorkflow(HTCondorWorkflow):

def htcondor_create_job_file_factory(self, **kwargs):
        # set the job file dir to proper location
        kwargs["dir"] = self.job_file_dir
        return super().htcondor_create_job_file_factory(**kwargs) 

So that I can set the job_file_dir in a task dependent way.

This solution works fine, but this seems like a common enough use case that it should be more obvious how / where this can be set. I propose making this a parameter of the HTCondorWorkflow that can default to reading from the config if left unset.

Can put together a PR if this is something that is desirable.

riga commented 1 month ago

Hi @EthanMarx ,

thanks for opening the issue!

What do your job file directories look like? Usually, there is a temporary directory that is created per job submission inside that job_file_dir. It's controlled by the job:: job_file_dir_mkdtemp setting which should default to True (see https://law.readthedocs.io/en/latest/config.html#job). Is that the case for you?

EthanMarx commented 1 month ago

Ah! I've been setting that to False, but only b/c I was misunderstanding what it does. Thanks for pointing that out - should resolve my issue!

EthanMarx commented 1 month ago

Can I ask about the naming convention for the sub directories?

riga commented 1 month ago

Can I ask about the naming convention for the sub directories?

It's using the default of tempfile.mkdtemp.

EthanMarx commented 1 month ago

Thanks. Would it make sense to name these directories based off something like the task name?

riga commented 1 month ago

Yes, that would be good! I will change the option to also accept strings that will support a couple template variables like tmp_XXXX_{task_id}.

I should have time to work on this tomorrow.