icecube / flarestack

Unbinned likelihood analysis code for astroparticle physics datasets
https://flarestack.readthedocs.io/en/latest/?badge=latest
MIT License

Fail to retrieve jobids when calling wait_for_cluster() method #283

Closed sathanas31 closed 1 year ago

sathanas31 commented 1 year ago

Describe the bug I'm calling the analyse() method in a loop and assigning its return value to a job_id variable, which I append to a list; outside the loop, that list is fed to the wait_for_cluster() method. The job_id variable is always None, although the ID is set correctly on self.job_id in the submit_cluster module, which leads to the wait_for_job() log message "No Job ID!".

The working alternative is not to assign the return value of analyse() when calling it inside the loop, and instead to append the submitter objects themselves to a list. Looping over that list and calling wait_for_job() on each submitter works just fine.

I'm aware that analyse() doesn't return the job ID but just submits the jobs, so this is more of a question: how can wait_for_cluster() be used correctly if the job IDs cannot be retrieved from analyse()?

To Reproduce Code snippet:

job_ids = []  # collect one cluster job ID per declination
for sindec in sindecs:
    # Create single-source catalogues for each dec and return their paths:
    cat_path = ps_catalogue_name(sindec)
    # subdir for each dec output:
    subname = name + "/sindec_" + "{0:.2f}".format(sindec) + "/"
    # convert sensitivity flux for each dec to a scale number:
    scale = flux_to_k(reference_sensitivity(sindec))[0] * 3
    # the MinimizationHandler dict:
    mh_dict = {
        "name": subname,
        "mh_name": "fixed_weights",
        "dataset": ps_v003_p02,
        "catalogue": cat_path,
        "inj_dict": inj_kwargs,
        "llh_dict": llh_kwargs,
        "scale": scale,
        "n_trials": 500,
        "n_steps": 15,
    }
    # instantiate Submitter class
    submitter = WIPACSubmitter.get_submitter(
        mh_dict=mh_dict,
        use_cluster=True,
        n_cpu=1,
        trials_per_task=100,
        ram_per_core=8000,
        remove_old_logs=True,
    )
    # submit mh_dict to NPX cluster
    job_id = submitter.analyse()  # = None

    job_ids.append(job_id)

# wait 'til all jobs are done 
submitter.wait_for_cluster(job_ids)

Expected behavior job_id != None

Additional context The working alternative is:

for sindec in sindecs:
    [....]
    submitter.analyse()
    submitters.append(submitter)

for submitter in submitters:
    submitter.wait_for_job()
JannisNe commented 1 year ago

Submitter.analyse() is actually not supposed to return anything, see this. If you want to save the job IDs, you can use submitter.job_id, which is set here during the call to Submitter.analyse() if you use the cluster. So you could do

job_ids.append(submitter.job_id) 

if you wanted.
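To make the pattern concrete, here is a minimal, self-contained sketch of what JannisNe describes: analyse() returns None, but the job ID is stored on the instance, so you collect submitter.job_id instead of the return value. MockSubmitter is a stand-in written for illustration, not flarestack's actual class; the incrementing integer IDs are an assumption.

```python
import itertools

# Hypothetical stand-in for flarestack's Submitter, only to show the pattern.
_counter = itertools.count(1)


class MockSubmitter:
    def __init__(self):
        self.job_id = None

    def analyse(self):
        # Submits the jobs and records the cluster job ID on the instance.
        self.job_id = next(_counter)
        return None  # analyse() itself returns nothing, as in the issue


job_ids = []
for _ in range(3):
    s = MockSubmitter()
    assert s.analyse() is None   # the return value is None...
    job_ids.append(s.job_id)     # ...but the ID lives on the instance

print(job_ids)  # -> [1, 2, 3]
```

With the real flarestack Submitter, the same idea means appending submitter.job_id inside the loop and then passing the collected list to wait_for_cluster().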

I would say that your working alternative is already a better solution to the problem. Is there anything wrong with that alternative?

sathanas31 commented 1 year ago

Nothing wrong with the alternative; I was just wondering how I could use the wait_for_cluster() method instead. I opened it as an issue, but it's more of a question. Thx for the tip :)

JannisNe commented 1 year ago

BTW, you can use Submitter.wait_for_cluster without any arguments. In that case, it will just wait for all jobs that you have running.
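Under that reading, the original snippet could drop the job-ID bookkeeping entirely. A hedged sketch reusing the names from the snippet above (not runnable on its own, since it depends on the flarestack setup from the issue):

```python
for sindec in sindecs:
    # ... build mh_dict and get the submitter as in the original snippet ...
    submitter.analyse()

# With no arguments, wait_for_cluster() waits for all of the
# user's running jobs, per the comment above.
submitter.wait_for_cluster()
```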