hiker / fab

Flexible build system for scientific software
https://metomi.github.io/fab/

Artefact Improvements #1

Closed: hiker closed this issue 1 month ago

hiker commented 6 months ago

The artefact store holds inconsistent information. For example, when pre-processing Fortran files, the .F90 files are processed and the results are added to the artefact store as preprocessed_fortran. The same code then copies all .f90 files from all_source into the build directory, but these copies are NOT added to the artefact store.

The preprocess_x90 step pre-processes all .X90 files, but does NOT copy the .x90 files into the build tree.
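As a rough illustration of the Fortran case (schematic code and names, not the actual fab step), the current behaviour amounts to:

from pathlib import Path
import shutil

def preprocess_fortran_sketch(artefact_store: dict, source_root: Path, build_output: Path):
    all_source = artefact_store['all_source']

    # .F90 files are preprocessed and the outputs ARE recorded in the store ...
    preprocessed = []
    for path in all_source:
        if path.suffix == '.F90':
            output = build_output / path.relative_to(source_root).with_suffix('.f90')
            # run_preprocessor(path, output)  # placeholder for the real preprocessor call
            preprocessed.append(output)
    artefact_store['preprocessed_fortran'] = preprocessed

    # ... while .f90 files are merely copied into the build tree; nothing records
    # the copies, so later steps cannot find them through the artefact store.
    for path in all_source:
        if path.suffix == '.f90':
            dest = build_output / path.relative_to(source_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)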

This then results in e.g. the analysis step doing:

DEFAULT_SOURCE_GETTER = CollectionConcat([
    SuffixFilter('all_source', '.f90'),
    'preprocessed_c',
    'preprocessed_fortran',

    # todo: this is lfric stuff so might be better placed elsewhere
    SuffixFilter('psyclone_output', '.f90'),
    'preprocessed_psyclone',  # todo: this is no longer a collection, remove
    'configurator_output',
])

So it needs to access all_source to find the files that were not pre-processed in the first place - and it accesses these files from the all_source artefact, i.e. from the source directory rather than from the build tree.

Compilation, on the other hand, does:

DEFAULT_SOURCE_GETTER = FilterBuildTrees(suffix='.f90')

So it takes the files from build_output, which could be different from the ones contained in the source directory.
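Schematically (hypothetical helper functions, not the fab API), the two steps can therefore end up looking at different sets of files:

from pathlib import Path

def analysis_inputs(artefact_store: dict) -> list:
    # analysis: .f90 files come from the source checkout, preprocessed
    # files from the artefact store
    return [p for p in artefact_store['all_source'] if p.suffix == '.f90'] \
        + list(artefact_store['preprocessed_fortran'])

def compile_inputs(build_output: Path) -> list:
    # compilation: everything is taken from the build tree, which later
    # steps (e.g. PSyclone) may have rewritten
    return sorted(build_output.glob('**/*.f90'))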

Additionally, it is hard to introduce a new phase when the various steps depend on where files happen to live on disk instead of going through the artefact store.

As a suggested fix I will try to introduce three new artefact store categories: "all_fortran", "all_c" and "all_x90", with the idea that these language-dependent steps will just have one (or two) fixed artefacts to look at.

This will also make it much easier to introduce new steps.
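A rough sketch of the idea (the collection names are the ones proposed above; the helper and its behaviour are hypothetical, not existing fab code):

LANGUAGE_SUFFIXES = {
    'all_fortran': {'.f90', '.F90'},
    'all_c': {'.c'},
    'all_x90': {'.x90', '.X90'},
}

def populate_language_collections(artefact_store: dict):
    # Hypothetical helper: group all source files by language once, so every
    # later language-dependent step reads from and updates a single well-known
    # collection instead of guessing where files live on disk.
    for collection, suffixes in LANGUAGE_SUFFIXES.items():
        artefact_store[collection] = [
            path for path in artefact_store['all_source'] if path.suffix in suffixes]

# A preprocessing or PSyclone step would then replace entries in 'all_fortran'
# (or 'all_x90') with the paths of the files it produced, rather than relying
# on what happens to be present in the build directory.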

hiker commented 6 months ago

The analysis situation can actually be more complicated:

  1. While running PSyclone, fab will analyse potential kernels (all files in a kernel search directory), which are taken from the build_output directory (the search directory is specified by the user in the PSyclone step). This works because the pre-processing step, after handling the .F90 files, also copies the .f90 files into the output directory!
  2. The analysis, on the other hand, will pick up the .f90 files from all_source (since the artefact store does not contain the details of the copied .f90 files), i.e. potentially the wrong files. In practice it still detects the cached result (since the hash is the same) and uses it, so the analysis is effectively based on build_output, not all_source, even though the debug output states otherwise. BUT, if an additional step were to modify a file in build_output after PSyclone has run, this would not be detected (see the sketch after this list):
    1. running PSyclone creates hashed analysis information based on build_output
    2. build_output is modified by an additional step
    3. the analysis computes a hash based on all_source, which is the same as the hash of the file in build_output from the PSyclone step
    4. the analysis uses the old cached analysis file (since it has the same hash), not the modified file
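A minimal sketch of that failure mode (hash-keyed cache; all names are hypothetical, this is not the actual fab analysis code):

import hashlib
from pathlib import Path

analysis_cache = {}   # content hash -> analysis result

def file_hash(path: Path) -> str:
    return hashlib.sha1(path.read_bytes()).hexdigest()

def parse_fortran(path: Path) -> str:
    # stand-in for the real analysis of the file's current contents
    return f"analysis of {path.read_text()[:40]}..."

def analyse(path: Path):
    # The cache is keyed purely on content hash, so *which* copy of a file
    # gets hashed decides whether a stale result is returned.
    key = file_hash(path)
    if key not in analysis_cache:
        analysis_cache[key] = parse_fortran(path)
    return analysis_cache[key]

# 1. PSyclone-time analysis calls analyse(build_output / 'foo.f90') and caches the result.
# 2. A later step rewrites build_output / 'foo.f90'.
# 3. The analysis step calls analyse(all_source / 'foo.f90'); its hash still matches
#    the hash from step 1 ...
# 4. ... so the stale cached analysis is returned and the modified file is never parsed.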

This at least indicates that a new step that potentially modifies files must be inserted before PSyclone is run(??)