learning-software-engineering / learning-software-engineering.github.io

MIT License

An Introduction to OpenMP #596


realtmxi commented 3 months ago

Introduction to OpenMP

Brief Overview of parallel computing and OpenMP


OpenMP Programming

Hello World Example

Here is a basic example showing how to parallelize a hello world program. First, the serial version:

#include <stdio.h>
int main() {
    printf( "Hello, World from just me!\n" );
    return 0;
}

To do this in parallel (have a series of threads print out a “Hello World!” statement), we would do the following:

#include <stdio.h>
#include <omp.h>  
int main() {
    int thread_id;
    #pragma omp parallel private(thread_id)
    {
        thread_id = omp_get_thread_num();
        printf( "Hello, World from thread %d!\n", thread_id );
    }
    return 0;
}

Compile and run OpenMP

# Compile
> gcc -fopenmp omphello.c -o omphello
# Run OpenMP programs
> export OMP_NUM_THREADS=4 # set number of threads (no spaces around "=")
> ./omphello
> time ./omphello # run and measure the time.

OpenMP General Code Structure

The snippet below shows the general structure of a C/C++ program using OpenMP.

#include <omp.h>

int main(void) {
    int var1, var2, var3;

    /* Serial code */
    /* ... */

    /* Beginning of parallel section: fork a team of threads */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        /* Parallel section executed by all threads */
        /* ... */

        /* All threads join master thread and disband */
    }

    /* Resume serial code */
    /* ... */

    return 0;
}

When looking at this example you should notice a few things. First, we need to include the OpenMP header (omp.h). Second, a few variables are declared outside of the parallel region of the code. If these variables are used within the parallel region, we need to decide whether they are private or shared. A variable being private means that every thread has its own copy of this variable, and changes made to that copy by one thread are not seen by other threads. Private copies start out uninitialized, so each thread must assign its copy before reading it. A variable declared within the parallel region is automatically private. A shared variable, on the other hand, is a single copy used by all of the threads, so any change made by one thread is visible to the others. Unless a clause says otherwise, variables declared before the parallel region are shared by default. Read-only variables can safely be shared, but care must be taken when multiple threads read and write the same variable: if those accesses are not synchronized, the result depends on the timing of the threads, a bug known as a “race condition”.

Parallel For Loops

OpenMP makes it easy to parallelize for loops. This only works when the loop iterations are independent (i.e., no iteration of the loop depends on the result of a previous iteration). Here is an example of a for loop:

/* serial version */
for (int i = 0; i < 25; i++) {
  printf("Foo\n");
}

/* parallel version: the 25 iterations are divided among the threads */
#pragma omp parallel for
for (int i = 0; i < 25; i++) {
  printf("Foo\n");
}

OpenMP Directives

The previous sections have shown several OpenMP directives. The general format of a directive is:

#pragma omp directive-name [clause,..] newline

A directive applies to the structured block that immediately follows it: either a single statement (such as a for loop) or a block of statements surrounded by { }. A variety of clauses are available, including:

The reduction clause is used when the result of a parallel region is a single value combined from every thread's partial result. For example, imagine we have an array of integers x of length n and we would like its sum. We can compute it in parallel as follows:

int sum = 0;
#pragma omp parallel default(none) shared(n, x) \
    private(i) reduction(+ : sum)
{
    /* Divide the loop iterations among the threads */
    #pragma omp for
    for (i = 0; i < n; i++)
        sum = sum + x[i];
}

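Without the reduction clause, every thread would update the single shared sum at the same time, which is a race condition. The reduction clause avoids this by giving each thread its own private copy of sum, initialized to 0, and adding all of the copies into the original sum at the end of the region.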

Other Useful Tips

Synchronize threads in a parallel region using a barrier

Sometimes you may need all threads to wait at a certain point in your code before moving on. For example, if you build up a data structure in parallel and then want to perform some operations on that data structure, you need to ensure that all of the threads have finished the first stage before the second can begin. To do this, insert a barrier into your code as follows:

#pragma omp barrier

Atomic & Critical Sections

Within a parallel region you may want to execute some code that only one thread should run at a time (e.g., updating a shared variable). In these cases, you should use an atomic or critical construct. Both ensure that the protected code is executed by only one thread at a time. It is important to note that all threads will still eventually run that code, just not simultaneously. Use atomic when the protected code is a single simple update of one variable, such as x++ or x += 1; it applies to exactly one statement. Use a critical section for lengthier blocks of code. Here is an example:

#pragma omp parallel shared(x)
{
    /* ... */

    /* atomic applies to a single update statement, so no braces */
    #pragma omp atomic
    x++;

    /* ... */

    #pragma omp critical
    {
        /* lengthier code involving variable x */
    }
}

Useful Functions and Environment Variables

OpenMP also provides a handful of useful runtime library functions. These include:

Explicitly setting the number of threads to be used in your code is not necessary. The same effect can be achieved by setting the environment variable OMP_NUM_THREADS.

Citation