rose-compiler / rose

Developed at Lawrence Livermore National Laboratory (LLNL), ROSE is an open source compiler infrastructure to build source-to-source program transformation and analysis tools for large-scale C (C89 and C99), C++ (C++98 and C++11), UPC, Fortran (77/95/2003), OpenMP, Java, Python and PHP applications.
http://rosecompiler.org

Does Outliner support loops under #pragma omp? #243

Closed icyclv closed 2 months ago

icyclv commented 2 months ago

Hi,

Thank you for developing this project; I believe it will be very helpful to me. I have recently started learning to use ROSE and am looking to outline some loops in HPC projects (which are usually under #pragma omp parallel for directives). Currently, I am working through the outlineTutorial. However, I encountered an issue: when #pragma rose_outline is followed by another #pragma rather than a statement, the outliner throws an exception.

Here is a simple example.

#include <iostream>

const int N = 1000000;

int main()
{
    double a[N];

    for (int i = 0; i < N; i++) {
        a[i] = i;
    }

    #pragma rose_outline
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = a[i] * a[i];
    }
    std::cout << "a[N-1] = " << a[N-1] << std::endl;
    return 0;
}

And the following error message:

=== Checking Outliner preconditions for 0x7faef3ba0de0:SgPragmaDeclaration 0x7faef3ba0de0:<SgPragmaDeclaration> [/home/ycchang/code/performance/rose/build/tutorial/outliner/test.cc:14]... ===
*** Statement must not be a declaration statement, unless it is a variable declaration. ***
    (Statement: 0x7faef3ba0de0:<SgPragmaDeclaration>) 0x7faef3ba0de0:<SgPragmaDeclaration> [/home/ycchang/code/performance/rose/build/tutorial/outliner/test.cc:14]
Outliner::preprocess() Input statement:/* Unparsing from the AST stmt(or partially from token stream) = SgPragmaDeclaration */ #pragma omp parallel for
 is not outlineable!
outlineTutorial[2812305] 1.41694s Rose[FATAL]: assertion failed:
outlineTutorial[2812305] 1.41696s Rose[FATAL]:   ../../../src/src/midend/programTransformation/astOutlining/Outliner.cc:522
outlineTutorial[2812305] 1.41697s Rose[FATAL]:   SgBasicBlock* Outliner::preprocess(SgStatement*)
outlineTutorial[2812305] 1.41697s Rose[FATAL]:   required: b

Could you suggest a better approach to handle this situation? Thank you very much for your help.
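For readers hitting the same assertion: the log shows that the statement selected by #pragma rose_outline is the SgPragmaDeclaration for the OpenMP directive, which is a declaration and therefore fails the precondition check. One possible workaround, not verified against ROSE in this thread, is to wrap the directive and its loop in a compound statement so that the outlining target is a block rather than a pragma. The sketch below assumes that approach; square_last is a hypothetical helper standing in for main(), and the array is given static storage only to keep the example off the stack.

```cpp
const int N = 1000000;

// Hypothetical helper standing in for main() in the original test case.
double square_last()
{
    // Static storage avoids placing 8 MB on the stack.
    static double a[N];

    for (int i = 0; i < N; i++) {
        a[i] = i;
    }

    // Assumption: the braces make the outlining target a compound statement,
    // so the statement after rose_outline is no longer a pragma declaration.
    #pragma rose_outline
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            a[i] = a[i] * a[i];
        }
    }

    return a[N - 1];
}
```

Whether the outliner then carries the OpenMP directive and its variables into the generated function correctly is a separate question and would need to be checked on the generated output.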

chunhualiao commented 2 months ago

Check for a flag like this one: -rose:outline:select_omp_loop (selects OpenMP for loops for outlining; used for testing purposes).

There is an example outliner demonstrating the use of the relevant APIs. It sets an internal flag to operate on OpenMP loops. The source code was originally located in tests/nonsmoke/functional/roseTests/astOutliningTests. However, we split out the tests directory to shrink the size of the ROSE repo and no longer release the tests folder by default. I will discuss with the team how to share this translator.

https://en.wikibooks.org/wiki/ROSE_Compiler_Framework/outliner#Options

./outline --help | more

Outliner-specific options
Usage: outline [OPTION]... FILENAME...
Main operation mode:
        -rose:outline:preproc-only                     preprocessing only, no actual outlining
        -rose:outline:abstract_handle handle_string    using an abstract handle to specify an outlining target
        -rose:outline:parameter_wrapper                use an array of pointers to pack the variables to be passed
        -rose:outline:structure_wrapper                use a data structure to pack the variables to be passed
        -rose:outline:enable_classic                   use parameters directly in the outlined function body without transferring statement, C only
        -rose:outline:temp_variable                    use temp variables to reduce pointer dereferencing for the variables to be passed
        -rose:outline:enable_liveness                  use liveness analysis to reduce restoring statements if temp_variable is turned on
        -rose:outline:new_file                         use a new source file for the generated outlined function
        -rose:outline:output_path                      the path to store newly generated files for outlined functions, if requested by new_file. The original source file's path is used by default.
        -rose:outline:exclude_headers                  do not include any headers in the new file for outlined functions
        -rose:outline:use_dlopen                       use dlopen() to find the outlined functions saved in new files.It will turn on new_file and parameter_wrapper flags internally
        -rose:outline:copy_orig_file                   used with dlopen(): single lib source file copied from the entire original input file. All generated outlined functions are appended to the lib source file
        -rose:outline:enable_debug                     run outliner in a debugging mode
        -rose:outline:select_omp_loop                  select OpenMP for loops for outlining, used for testing purpose

The transformation of outlining is quite complex and nuanced. You have to decide

In the end, a customized outliner is needed for a given customer's special requirements.

icyclv commented 2 months ago

Hi.

Thank you for your help. I tried using the -rose:outline:select_omp_loop option, but I encountered some issues. Here is my test code:

const int N = 1000000;

int main()
{
    double a[N];

    for (int i = 0; i < N; i++) {
        a[i] = i;
    }
    int threads = 4;
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < N; i++) {
        a[i] = a[i] * a[i];
    }
    return 0;
}

The command I used was:

./outlineTutorial -rose:outline:select_omp_loop ./test.cpp

After processing, the code appears as follows:

const int N = 1000000;
static void OUT__1__6714__(double (*ap__)[1000000]);

int main()
{
  double a[1000000];
  for (int i = 0; i < N; i++) {
    a[i] = i;
  }
  int threads = 4;
#pragma omp parallel  num_threads(threads)

#pragma omp for 
  OUT__1__6714__(&a);
  return 0;
}

static void OUT__1__6714__(double (*ap__)[1000000])
{
  double (&a)[1000000] =  *((double (*)[1000000])ap__);

#pragma omp parallel for num_threads (threads)
  for (int i = 0; i < N; i++) {
    a[i] = a[i] * a[i];
  }
}

I noticed two potential issues:

  1. The line #pragma omp parallel for num_threads(threads) is not removed from the main function.
  2. The threads variable is not passed to OUT__1__6714__.

Could you please advise if there might be an issue with how I am using the command? Or would creating a custom outliner be more appropriate for my specific needs?
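To make the two issues concrete, here is a hand-written sketch of the result I would have expected. It is not actual ROSE output: the extra threads parameter and the run_example driver are assumptions made for illustration, and the array uses static storage only to keep the example off the stack.

```cpp
const int N = 1000000;

// Hand-written stand-in for the outlined function; ROSE's real output may differ.
// Issue 2: the value used by num_threads is passed in explicitly (assumed parameter).
static void OUT__1__6714__(double (*ap__)[N], int threads)
{
  double (&a)[N] = *ap__;

  // The complete directive moves here together with the loop.
#pragma omp parallel for num_threads(threads)
  for (int i = 0; i < N; i++) {
    a[i] = a[i] * a[i];
  }
}

// Hypothetical driver corresponding to main() in the test case.
double run_example()
{
  static double a[N];  // static storage instead of an 8 MB stack array
  for (int i = 0; i < N; i++) {
    a[i] = i;
  }
  int threads = 4;

  // Issue 1: no OpenMP pragma remains at the call site.
  OUT__1__6714__(&a, threads);
  return a[N - 1];
}
```

Compiled without OpenMP support the pragma is ignored and the loop runs serially, so the numerical result is the same either way.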

Thank you again for your support!

chunhualiao commented 2 months ago

Most likely you have to tailor some existing tool or write your own to get what you need.

A more advanced example is the OpenMP lowering. You can see if it is closer to what you want: https://en.m.wikibooks.org/wiki/ROSE_Compiler_Framework/OpenMP_Support

icyclv commented 2 months ago

Got it. I will refer to the Outliner and OpenMP lowering code to try to meet my needs. Thank you very much for your help.