HEJ Run inefficiency - Githubissues

scarlehoff / pyHepGrid

Tool for distributed computing management geared towards HEP applications.

GNU General Public License v3.0


HEJ Run inefficiency #42

Open JBlack93 opened 4 years ago

JBlack93 commented 4 years ago

Currently the runcard setup for HEJ is very rigid.

dictCard = {  # first: -r, second: -j
    # we use -r as the name and -j as the runcard
    # 'Wp2j_mw_13TeV-all': 'config_all',
    'Wp2j_HT2_13TeV-all': 'config_all',
}

With this setup, each run name is mapped to exactly one config: in the example above, the run name Wp2j_HT2_13TeV-all is mapped to the config config_all. This is fine, except that in HEJ we often want to run several different config files for analysis and for presenting results. Since the Sherpa part of the job is a significant percentage of the runtime, regenerating the fixed-order (FO) input for every job is hugely inefficient.
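To get a rough feel for the scale of the redundancy, here is a minimal sketch that estimates the fraction of total runtime spent regenerating FO input that already exists. The function name and the timings are hypothetical and purely illustrative; they are not measurements from pyHepGrid, Sherpa or HEJ.

```python
def redundant_fraction(n_configs, sherpa_time, hej_time):
    """Fraction of total runtime spent on Sherpa runs beyond the first one.

    Assumes one job per config, each job running Sherpa and then HEJ,
    and that every Sherpa run produces the same FO input.
    """
    total = n_configs * (sherpa_time + hej_time)
    redundant = (n_configs - 1) * sherpa_time
    return redundant / total


# Illustrative numbers only: 4 configs, Sherpa 3 h per job, HEJ 1 h per config.
print(redundant_fraction(4, 3.0, 1.0))  # 0.5625
```

With these made-up numbers, more than half of the total compute goes into repeating the Sherpa step.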

JBlack93 commented 4 years ago

Proposal for a solution:

Rather than calculating the FO input from scratch each time, run each config on top of a single FO calculation, i.e. 1 job would look like:

Sherpa -> HEJ config_all -> HEJ config_FKL -> HEJ config_subleading -> HEJ config_Matching

instead of 4 jobs with much redundancy:

Sherpa -> HEJ config_all
Sherpa -> HEJ config_FKL
Sherpa -> HEJ config_subleading
Sherpa -> HEJ config_Matching

This could be achieved by pattern matching the config_*.yml files.
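A minimal sketch of what that pattern matching might look like, assuming the config files sit together in one directory. The helper names (find_configs, build_job_steps) and the (program, argument) step tuples are hypothetical and are not part of pyHepGrid; the sketch only shows how one Sherpa step could be followed by one HEJ step per matched config.

```python
import tempfile
from pathlib import Path


def find_configs(config_dir, pattern="config_*.yml"):
    """Return the names (file stems) of all configs matching the pattern, sorted."""
    return sorted(p.stem for p in Path(config_dir).glob(pattern))


def build_job_steps(runname, config_dir):
    """Build one job: a single Sherpa step, then one HEJ step per config."""
    steps = [("Sherpa", runname)]
    steps += [("HEJ", cfg) for cfg in find_configs(config_dir)]
    return steps


# Usage example with throwaway files standing in for real HEJ configs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("config_all.yml", "config_FKL.yml", "unrelated.yml"):
        (Path(tmp) / name).touch()
    steps = build_job_steps("Wp2j_HT2_13TeV-all", tmp)

for program, arg in steps:
    print(program, arg)
```

Sorting the matches keeps the step order reproducible between submissions; files that do not match config_*.yml are ignored.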