Open gkaf89 opened 5 days ago
As far as I understand, the parameters are expanded before reading the configuration file, and the resulting tests are filtered with the contents of the configuration file.
Nope, the configuration is the very first thing that is resolved, before any tests are even loaded. You can access the actual partition/environment combinations at the parameter definition, but currently only through an internal interface. The plan is to expose this and add examples in the documentation. This is how you can achieve your goal:
import reframe as rfm
from reframe.core.runtime import valid_sysenv_comb

def admissible_omp_num_threads(valid_systems, valid_prog_environs):
    # Keep the partition next to its value so the variant can be pinned to it later
    for part, _ in valid_sysenv_comb(valid_systems, valid_prog_environs):
        yield part.extras.get('admissible_omp_num_threads', []), part

class performance_test(rfm.RunOnlyRegressionTest):
    valid_systems = ['...']
    valid_prog_environs = ['...']
    num_omp_threads = parameter(
        admissible_omp_num_threads(valid_systems, valid_prog_environs),
        fmt=lambda x: x[0]
    )

    @run_after('init')
    def restrict_valid_systems(self):
        # Pin this variant to the partition stored in its parameter value
        # (assumes the partition object exposes its 'system:partition' name as `fullname`)
        self.valid_systems = [self.num_omp_threads[1].fullname]
The valid_sysenv_comb function interprets the partition/environment constraints and gives you all the valid combinations for this test. However, since the extras value will likely be different for each of the valid partitions, you need to store this information and, in a post-init hook, restrict each test variant to its corresponding system.
Since this is a recurring pattern, e.g., wanting to parameterise a test over some other system info (such as sockets, number of GPUs), it's something we would like to expose in an easier way.
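To make the pattern above concrete without depending on the internal interface, here is a minimal standalone sketch. FakePartition and flatten_partition_values are hypothetical stand-ins, not ReFrame APIs, and the partition names are only illustrative. The sketch shows one way to produce a (value, partition name) pair per admissible thread count, so that each test variant remembers the partition it came from.

```python
from dataclasses import dataclass, field


@dataclass
class FakePartition:
    """Hypothetical stand-in for a ReFrame system partition."""
    fullname: str
    extras: dict = field(default_factory=dict)


def flatten_partition_values(partitions, key):
    """Yield one (value, partition name) pair per value listed under `key`."""
    for part in partitions:
        for value in part.extras.get(key, []):
            yield value, part.fullname


partitions = [
    FakePartition('aion:batch', {'admissible_omp_num_threads': [1, 2, 4]}),
    FakePartition('iris:batch', {'admissible_omp_num_threads': [1, 7]}),
    FakePartition('misc:login'),  # no extras: contributes no variants
]

pairs = list(flatten_partition_values(partitions, 'admissible_omp_num_threads'))
for value, name in pairs:
    # In a real test, a post-init hook would assign [name] to valid_systems
    print(f'{value} threads -> valid_systems = [{name!r}]')
```

In an actual test the pairs would be passed to parameter(), and the post-init hook would read the second element of the pair to restrict valid_systems.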
Thanks for the pointers!
The need to account for the partition complicates the process, but the valid_sysenv_comb function exports all the necessary information. I am not sure how the process can be simplified. Here is an example of how I used the interface exposed by valid_sysenv_comb.
site_configuration = {
    'general': [
        {
            'use_login_shell': True,
        }
    ],
    'systems': [
        {
            'name': 'aion',
            'descr': 'Aion cluster',
            'hostnames': [r'aion-[0-9]{4}'],
            'modules_system': 'lmod',
            'partitions': [
                {
                    'name': 'batch',
                    'descr': 'Aion batch partition',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'access': ['--partition=batch', '--qos=normal'],
                    'max_jobs': 8,
                    'environs': ['builtin', 'foss2023b'],
                    'extras': {
                        'sockets_per_node': 8,
                        'cores_per_socket': 16,
                        'admissible_setups': {
                            'omp_num_threads': [1, 2, 4, 8, 16],
                            'num_nodes': [1, 2, 4, 8, 16],
                        },
                    },
                },
            ],
        },
        {
            'name': 'iris',
            'descr': 'Iris cluster',
            'hostnames': [r'iris-[0-9]{3}'],
            'modules_system': 'lmod',
            'partitions': [
                {
                    'name': 'batch',
                    'descr': 'Iris batch partition',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'access': ['--partition=batch', '--qos=normal'],
                    'max_jobs': 8,
                    'environs': ['builtin', 'foss2023b'],
                    'extras': {
                        'sockets_per_node': 2,
                        'cores_per_socket': 14,
                        'admissible_setups': {
                            'omp_num_threads': [1, 7, 14],
                            'num_nodes': [1, 2, 4, 8, 16],
                        },
                    },
                },
            ],
        },
    ],
    ...
}
class PartitionExtraProperty:
    # Pairs a property value with the name of the partition it belongs to
    def __init__(self, part, val):
        self.partition = part
        self.value = val

    def __str__(self):
        return f"{self.value}"

def parametrize_system_partition_property(
    valid_systems,
    valid_prog_environs,
    get_system_partition_property
):
    partition_extra_properties = []
    for part in valid_sysenv_comb(valid_systems, valid_prog_environs):
        prop = get_system_partition_property(part)
        partition_extra_properties.append(PartitionExtraProperty(part.name, prop))
    return partition_extra_properties

def expand_partition_property_list(partition_extra_properties_list, reduce_list):
    # Yield one PartitionExtraProperty per value kept by reduce_list
    for partition_extra_property in partition_extra_properties_list:
        partition = partition_extra_property.partition
        value_list = partition_extra_property.value
        reduced_list = reduce_list(value_list)
        for prop in reduced_list:
            yield PartitionExtraProperty(partition, prop)

def get_admissible_omp_num_threads(partition):
    # Default to {} so that a partition without 'admissible_setups' yields no values
    return partition.extras.get('admissible_setups', {}).get('omp_num_threads', [])

def get_admissible_num_nodes(partition):
    return partition.extras.get('admissible_setups', {}).get('num_nodes', [])

class performance_test(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['+openmp +mpi']

    test_case = parameter()
    test_type = parameter()
    num_nodes = parameter()
    cpus_per_task = parameter()

    partition_num_nodes = parametrize_system_partition_property(
        valid_systems,
        valid_prog_environs,
        get_admissible_num_nodes
    )
    partition_cpus_per_task = parametrize_system_partition_property(
        valid_systems,
        valid_prog_environs,
        get_admissible_omp_num_threads
    )

    @run_after('init')
    def restrict_valid_systems(self):
        valid_partitions = {self.num_nodes.partition} & {self.cpus_per_task.partition}
        self.valid_systems = [f'*:{partition}' for partition in valid_partitions]
        self.num_nodes = self.num_nodes.value
        self.cpus_per_task = self.cpus_per_task.value

    ...

@rfm.simple_test
class problem_size_scaling_test(performance_test):
    test_type = parameter(['opt', 'dmc', 'vmc'])
    test_case = parameter(['W1', 'W5', 'W10', 'W15', 'W20', 'W25', 'W30'])
    num_nodes = parameter(
        expand_partition_property_list(
            performance_test.partition_num_nodes,
            lambda x: x
        )
    )
    cpus_per_task = parameter(
        expand_partition_property_list(
            performance_test.partition_cpus_per_task,
            lambda x: [max(x)]
        )
    )

@rfm.simple_test
class ompmpi_ratio_test(performance_test):
    test_type = parameter(['vmc'])
    test_case = parameter(['W1', 'W5', 'W10', 'W15', 'W20', 'W25', 'W30'])
    num_nodes = parameter(
        expand_partition_property_list(
            performance_test.partition_num_nodes,
            lambda x: x
        )
    )
    cpus_per_task = parameter(
        expand_partition_property_list(
            performance_test.partition_cpus_per_task,
            lambda x: x
        )
    )
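Since num_nodes and cpus_per_task are independent parameters, their values are combined as a cross product, so some variants can pair values taken from different partitions. As far as I can tell from the code above, the set intersection in restrict_valid_systems is then empty and such a variant is left with no valid system. A minimal standalone sketch of that logic, using plain tuples and the hypothetical partition names p1 and p2 instead of ReFrame objects:

```python
from itertools import product

# (partition name, value) pairs; 'p1' and 'p2' are hypothetical partitions
num_nodes = [('p1', 1), ('p1', 2), ('p2', 1)]
cpus_per_task = [('p1', 16), ('p2', 14)]


def valid_systems_for(nodes, cpus):
    """Mimic the intersection done in the post-init hook."""
    common = {nodes[0]} & {cpus[0]}
    return [f'*:{partition}' for partition in common]


# Every combination of the two parameters, with its resulting valid_systems
variants = [
    (nodes[1], cpus[1], valid_systems_for(nodes, cpus))
    for nodes, cpus in product(num_nodes, cpus_per_task)
]
# Only variants whose two values come from the same partition remain runnable
runnable = [v for v in variants if v[2]]

for n, c, systems in variants:
    print(f'num_nodes={n} cpus_per_task={c} valid_systems={systems}')
```

When every system has a single partition, as in the configuration above, the intersection is never empty and no variant is dropped this way.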
For the test parameters, I am abusing the system a bit by resetting the value of the parameter in restrict_valid_systems to remove the information about the partition and keep only the value of interest. I noticed that with this method, setting the fmt entry of parameter results in errors; it seems that fmt is called both before and after the @run_after('init') hook, so it would have to handle both formats. I chose to create a class, PartitionExtraProperty, that prints the parameter values through its __str__ function instead of handling multiple types in fmt.
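The two options can be compared in a small standalone sketch. PartitionValue and fmt_both are hypothetical names used only for illustration. The point is that a formatter which must accept both the wrapped and the unwrapped value needs a type check, while a __str__ method on the wrapper gives the same text through plain str() in both cases.

```python
class PartitionValue:
    """Hypothetical wrapper pairing a value with its partition name."""

    def __init__(self, partition, value):
        self.partition = partition
        self.value = value

    def __str__(self):
        return str(self.value)


def fmt_both(x):
    """Formatter that accepts both a wrapped and an unwrapped value."""
    return str(x.value) if isinstance(x, PartitionValue) else str(x)


wrapped = PartitionValue('batch', 16)
unwrapped = wrapped.value  # what the parameter holds after the hook resets it

# Option 1: a formatter with an explicit type check
print(fmt_both(wrapped), fmt_both(unwrapped))
# Option 2: rely on __str__, no custom formatter needed
print(str(wrapped), str(unwrapped))
```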
When defining tests, it can be useful to have access to the contents of the configuration file. Consider for instance the following site configuration.
We want to configure a test for the performance of some software based on the number of OpenMP threads:
As far as I understand, the parameters are expanded before reading the configuration file, and the resulting tests are filtered with the contents of the configuration file. Could we somehow use the contents of the configuration file earlier, for instance by setting a callback in the parameter definition?