Parsing compiler name & version from Spack environment to include in the performance log file

kaanolgu commented 7 months ago

This is an attempt at issue https://github.com/ukri-excalibur/excalibur-tests/issues/251 (thanks to @giordano).

giordano commented 7 months ago

Besides compilers, another specific bit of information that users may be interested in is the MPI implementation used, which can be fished out with spec["mpi"], with the caveat that not all packages depend on MPI, so this would need to check whether "mpi" in spec, so not to error out (e.g. spec["mpi"] if "mpi" in spec else "").
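The guarded lookup suggested above can be sketched in isolation. This is only an illustration: plain dicts stand in for a concretized Spack spec, the helper name `mpi_name` is hypothetical, and the package strings are made up for the demo. The only assumption about the real object is that it supports `in` and `[]`, as the comment above describes.

```python
def mpi_name(spec):
    """Return the MPI provider recorded in `spec`, or "" if there is none."""
    # `spec` only needs to support `in` and `[]`.
    return str(spec["mpi"]) if "mpi" in spec else ""


# Plain dicts stand in for concretized specs (an assumption for this demo);
# the package strings below are invented examples.
with_mpi = {"mpi": "openmpi@4.1.5"}
without_mpi = {"cmake": "cmake@3.27.7"}

print(repr(mpi_name(with_mpi)))
print(repr(mpi_name(without_mpi)))
```

Returning an empty string keeps the perflog column present for every test, whether or not the package depends on MPI.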

kaanolgu commented 6 months ago

For spack install babelstream%gcc@13.1.0 +std build_system=cmake std_submodel=data:

$ spack -e /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/cascadelake/default/STDDATABenchmark/spack_env/cascadelake python -c 'from spack import environment; d = environment.active_environment().spec_lists["specs"].specs[0].variants;print(d)'
+std build_system=cmake std_submodel=data

According to the manual, the dict attribute should do the job:

$ spack -e /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/cascadelake/default/STDDATABenchmark/spack_env/cascadelake python -c 'from spack import environment; d = environment.active_environment().spec_lists["specs"].specs[0].variants;print(d.dict)'
{'std': BoolValuedVariant('std', True), 'build_system': AbstractVariant('build_system', 'cmake'), 'std_submodel': AbstractVariant('std_submodel', 'data')}

Keys:

$ spack -e /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/cascadelake/default/STDDATABenchmark/spack_env/cascadelake python -c 'from spack import environment; d = environment.active_environment().spec_lists["specs"].specs[0].variants;print(d.dict.keys())'
dict_keys(['std', 'build_system', 'std_submodel'])

From here we try:

$ spack -e /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/cascadelake/default/STDDATABenchmark/spack_env/cascadelake python -c 'from spack import environment; d = environment.active_environment().spec_lists["specs"].specs[0].variants;print(d.dict["std_submodel"])'
std_submodel=data

$ spack -e /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/cascadelake/default/STDDATABenchmark/spack_env/cascadelake python -c 'from spack import environment; d = environment.active_environment().spec_lists["specs"].specs[0].variants;print(d.dict["std_submodel"].value)'
('data',)

$ spack -e /lustre/home/br-kolgu/excalibur-tests-upstream/stage/isambard-macs/cascadelake/default/STDDATABenchmark/spack_env/cascadelake python -c 'from spack import environment; d = environment.active_environment().spec_lists["specs"].specs[0].variants;print(d.dict["std_submodel"].value[0])'
data

So we could loop over all the keys and export each one as a separate column in the perflog. But my question during the meeting was about the reframe_config.py file: https://github.com/ukri-excalibur/excalibur-tests/blob/323ae1142337563783d5de5ca04ac5ae2ad9d883/benchmarks/reframe_config.py#L733-L758

How do we know the column names in advance? My idea is to export all the spack_spec variants into a list and iterate over that list in reframe_config.py to print each column. Is that a valid approach?
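The loop over variant keys could look roughly like the sketch below. It does not import Spack: `FakeVariant` is a hypothetical stand-in that only mimics the `.value` attribute seen in the output above, and `flatten_variants` is an illustrative name, not something from utils.py.

```python
from dataclasses import dataclass


@dataclass
class FakeVariant:
    """Hypothetical stand-in for a Spack variant object with a `.value`."""
    value: object


def flatten_variants(variants):
    """Map each variant name to a plain value, unwrapping 1-element tuples."""
    flat = {}
    for name, variant in variants.items():
        value = variant.value
        # Single-valued variants arrive as ('data',); keep multi-valued ones intact.
        if isinstance(value, tuple) and len(value) == 1:
            value = value[0]
        flat[name] = value
    return flat


variants = {
    "std": FakeVariant(True),
    "build_system": FakeVariant("cmake"),
    "std_submodel": FakeVariant(("data",)),
}
columns = flatten_variants(variants)
print(columns)
```

Because the keys come from the spec itself, nothing needs to know the variant names ahead of time; the open question of how reframe_config.py would declare the columns remains.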

kaanolgu commented 6 months ago

This is the modified code for utils.py:

        # Each snippet runs via `spack -e <env> python -c '...'`, so it must print its result to be captured.
        spack_spec_keys = 'from spack import environment; print(list(environment.active_environment().spec_lists["specs"].specs[0].variants.dict.keys()))'
        # Unwrap tuple values such as ('data',) so that each variant maps to a plain value.
        spack_spec_vals = 'from spack import environment; d = environment.active_environment().spec_lists["specs"].specs[0].variants.dict; l = list(d.keys()); values=[d[key].value[0] if isinstance(d[key].value,tuple) else d[key].value for key in l];print(values)'
        cmd_compiler_name = 'from spack import environment; print(environment.active_environment().spec_lists["specs"].specs[0].compiler.name)'
        cmd_compiler_version = 'from spack import environment; print(environment.active_environment().spec_lists["specs"].specs[0].compiler.versions[0])'
        # Echo each result with a label so it can be picked up from the job output.
        self.postrun_cmds.append(f'echo "compiler_name: $(spack -e {self.build_system.environment} python -c \'{cmd_compiler_name}\')"')
        self.postrun_cmds.append(f'echo "compiler_version: $(spack -e {self.build_system.environment} python -c \'{cmd_compiler_version}\')"')
        self.postrun_cmds.append(f'echo "Spack_Spec keys : $(spack -e {self.build_system.environment} python -c \'{spack_spec_keys}\')"')
        self.postrun_cmds.append(f'echo "Spack_Spec vals : $(spack -e {self.build_system.environment} python -c \'{spack_spec_vals}\')"')

This prints:

compiler_name: gcc
compiler_version: 13.1.0
Spack_Spec keys : ['std', 'build_system', 'std_submodel']
Spack_Spec vals : [True, 'cmake', 'ranges']

This is for: spack install babelstream%gcc@13.1.0 +std build_system=cmake std_submodel=data
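The echo commands above escape quotes by hand. One possible alternative, shown only as a sketch, is to let `shlex.quote` from the standard library do the quoting. The helper name `spack_python_echo` and the path `/path/to/spack_env` are placeholders for illustration and do not come from utils.py.

```python
import shlex


def spack_python_echo(label, env_path, snippet):
    """Build a shell command that echoes `label: <output of snippet>`."""
    # shlex.quote wraps each argument so the shell passes it through unchanged.
    inner = f"spack -e {shlex.quote(env_path)} python -c {shlex.quote(snippet)}"
    return f'echo "{label}: $({inner})"'


snippet = (
    "from spack import environment; "
    'print(environment.active_environment().spec_lists["specs"].specs[0].compiler.name)'
)
# "/path/to/spack_env" is a placeholder, not a real environment path.
cmd = spack_python_echo("compiler_name", "/path/to/spack_env", snippet)
print(cmd)
```

The resulting string has the same shape as the hand-written commands, so it could be appended to `postrun_cmds` in the same way.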

I will check whether the sanity library function extractall is what we need to read all the keys and values, so that they can be passed to reframe_config.py and each key assigned its value.

I am open to other suggestions if there is a more direct approach than this.

ilectra commented 6 months ago

You're almost there, @kaanolgu. The output I was thinking of (for one package in the spec, see my comment above about many packages) is something like:

Spack_Spec : {'std':True, 'build_system':'cmake', 'std_submodel':'ranges'}

instead of the two separate lists spack_spec_keys and spack_spec_vals. I think a dict(zip(spack_spec_keys, spack_spec_vals)) should do it.
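As a quick illustration of that suggestion, using the two lists printed earlier in the thread (the length check is an extra precaution added here, not part of the suggestion):

```python
spack_spec_keys = ["std", "build_system", "std_submodel"]
spack_spec_vals = [True, "cmake", "ranges"]

# zip() silently truncates to the shorter list, so check the lengths first.
if len(spack_spec_keys) != len(spack_spec_vals):
    raise ValueError("keys and values must have the same length")

spack_spec = dict(zip(spack_spec_keys, spack_spec_vals))
print(f"Spack_Spec : {spack_spec}")
```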

kaanolgu commented 6 months ago

@giordano @ilectra https://github.com/ukri-excalibur/excalibur-tests/pull/262/commits/8ba75611d79834c3a0d4d6083cce21c90e9345dc Tested with the following. Input:

reframe -c benchmarks/apps/babelstream -r --tag cuda --system=isambard-macs:volta --setvar=num_cpus_per_task=40 -S build_locally=false -Sspack_spec='babelstream%gcc@9.2.0,12.1.0 +cuda cuda_arch=70,72'

Output:

spack_spec: {'cuda': True, 'cuda_arch': {'70', '72'}, 'compiler_name': 'gcc', 'compiler_version': {'12.1.0', '9.2.0'}}

Output (perflog):

...|spack_spec|...
...|{'cuda': True, 'cuda_arch': {'70', '72'}, 'compiler_name': 'gcc', 'compiler_version': {'9.2.0', '12.1.0'}}|...
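The spack_spec field is written as a Python literal, so a reader of the perflog could turn it back into a dict with `ast.literal_eval` from the standard library. This is a sketch of one way to do that, not necessarily how the post-processing code reads it; the string below is copied from the perflog line above.

```python
import ast

field = (
    "{'cuda': True, 'cuda_arch': {'70', '72'}, "
    "'compiler_name': 'gcc', 'compiler_version': {'9.2.0', '12.1.0'}}"
)

# literal_eval only evaluates literals (dicts, sets, strings, ...), never code.
spack_spec = ast.literal_eval(field)

# Sets are unordered, which is why the two outputs above list the versions
# in different orders; sort them before displaying or comparing.
print(sorted(spack_spec["cuda_arch"]))
print(sorted(spack_spec["compiler_version"]))
```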

kaanolgu commented 6 months ago

> Besides compilers, another specific bit of information that users may be interested in is the MPI implementation used, which can be fished out with spec["mpi"], with the caveat that not all packages depend on MPI, so this would need to check whether "mpi" in spec, so not to error out (e.g. spec["mpi"] if "mpi" in spec else "").

I just saw this and will add it in a new commit.

ilectra commented 6 months ago

> The added columns don't seem to require any additional changes for post-processing to function normally, so this looks good to me on that front.
>
> Edit: Do note, however, that if you'd like to unpack any of your dict fields, please see how extra_resources and env_vars are handled in lines 119-125 of the read_perflog function in perflog_handler.py.

Yes, you have to unpack those the way Emily says. And once you add all the packages in the spec, it will be a bit more complicated than extra_resources and env_vars, as your values will no longer be just a dict, but a nested dict of dicts (see my review comments and other comments above). Also, to make the postprocessing independent of the perflog version, there should be a check that those columns exist before any attempt is made to unpack them.
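A rough sketch of such a guarded unpacking follows. It does not reproduce read_perflog from perflog_handler.py: rows are plain dicts here, the nested layout (package name mapping to its fields) is an assumed shape, and the `package_field` column naming is a made-up convention for illustration.

```python
import ast


def unpack_spack_spec(row, column="spack_spec"):
    """Return a copy of `row` with the nested spec column flattened, if present."""
    if column not in row:
        return dict(row)  # older perflog without the column: nothing to unpack
    out = {k: v for k, v in row.items() if k != column}
    nested = ast.literal_eval(row[column])  # assumed shape: {package: {field: value}}
    for package, fields in nested.items():
        for name, value in fields.items():
            out[f"{package}_{name}"] = value  # hypothetical column naming
    return out


new_row = {
    "test": "demo",
    "spack_spec": "{'babelstream': {'compiler_name': 'gcc', 'variants': {'cuda': True}}}",
}
old_row = {"test": "demo"}

print(unpack_spack_spec(new_row))
print(unpack_spack_spec(old_row))
```

Only the outer level is flattened in this sketch; an inner dict such as the variants is left as a single value, which is one of the choices the real implementation would have to make.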

kaanolgu commented 5 months ago

@giordano @ilectra https://github.com/ukri-excalibur/excalibur-tests/pull/262/commits/49f22867d29feafd1886ac7dea08bf3e9e79244b This fixes the issues we discussed and also includes the post-processing.

ilectra commented 5 months ago

@kaanolgu I've started tidying up the postprocessing bit of your previous commit, and ran into the same error as the one you got in the unit test. I'm currently looking into it and hope to fix it soon.

kaanolgu commented 5 months ago

LGTM
