szaghi / FUNDAL

Fortran UNified Device Acceleration Library
8 stars 2 forks source link

OpenACC module #2

Open ivan-pi opened 4 months ago

ivan-pi commented 4 months ago

Hi, I'm just curious if the OpenACC runtime routines aren't available via the openacc module, provided by the compiler vendors?

The module is described on page 89 of the following document: https://www.openacc.org/sites/default/files/inline-images/Specification/OpenACC-3.2-final.pdf

ivan-pi commented 4 months ago

On a different note, it may be possible to save some effort templating for different types, kinds and ranks by using the Fortran 2018 assumed-rank array. Only on the Fortran side it is necessary to repeat the interface declarations.

Here is just an illustration (which surely fails some safety tests or compilers may not support yet). For simplicity I was using malloc and free, but the OpenACC equivalents should work similarly.

! test_alloc.f90
!
program test_alloc

use, intrinsic :: iso_c_binding, only: c_double, c_float, c_int
implicit none

interface acc_alloc
    subroutine acc_alloc_double(fptr,shape,stat) bind(c,name="f_acc_alloc")
        import c_double, c_int
        real(c_double), intent(out), pointer :: fptr(..)
        integer(c_int), intent(in) :: shape(:)
        integer(c_int), intent(out), optional :: stat
    end subroutine
    subroutine acc_alloc_float(fptr,shape,stat) bind(c,name="f_acc_alloc")
        import c_float, c_int
        real(c_float), intent(out), pointer :: fptr(..)
        integer(c_int), intent(in) :: shape(:)
        integer(c_int), intent(out), optional :: stat
    end subroutine
end interface

interface acc_dealloc
    subroutine acc_dealloc_double(fptr) bind(c,name="f_acc_dealloc")
        import c_double, c_int
        real(c_double), intent(inout), pointer :: fptr(..)
    end subroutine
    subroutine acc_dealloc_float(fptr) bind(c,name="f_acc_dealloc")
        import c_float, c_int
        real(c_float), intent(inout), pointer :: fptr(..)
    end subroutine
end interface

real(c_double), pointer :: a(:)
real(c_float), pointer :: b(:,:)

call acc_alloc(a, [8])
call acc_alloc(b, [2,3])

print *, shape(a), shape(b)
print *, size(a), size(b)

a = 1
print *, a

b = 2
print *, b

call acc_dealloc(a)
call acc_dealloc(b)

print *, associated(a)
print *, associated(b)

end program
// acc_alloc.c

#include <assert.h>
#include <stdio.h>

#include <ISO_Fortran_binding.h>

#include <stdlib.h> // malloc, free
//#include <openacc.h> // acc_malloc, acc_free

static int array_size(int rank, int dims[rank]) {

    // TODO: catch corner cases
    int sz = 1;
    for (int r = 0; r < rank; r++) {
        sz *= dims[r];
    }
    return sz;
}

void f_acc_alloc(CFI_cdesc_t *fptr, CFI_cdesc_t *shape, int *stat)
{
    if (fptr->rank != shape->dim[0].extent) {
        printf("dimension mismatch in arguments");
        abort();
    }

    printf("rank = %ld\n", fptr->rank);

    int *sdata = (int *) shape->base_addr;

    int nelems = array_size(fptr->rank, sdata);
    printf("size = %d\n",nelems);

    void *d_ptr = malloc( nelems * fptr->elem_len );
    if (d_ptr == NULL) {
        // Handle memory allocation error
        printf("Memory allocation error\n");
        abort();
    }

    CFI_CDESC_T(CFI_MAX_RANK) d_fptr;
    CFI_index_t extents[CFI_MAX_RANK];
    CFI_index_t lb[CFI_MAX_RANK];

    for (int r = 0; r < fptr->rank; r++) {
        lb[r] = 1;
        extents[r] = sdata[r];
    }

    int ierr;
    ierr = CFI_establish(
        (CFI_cdesc_t *) &d_fptr, d_ptr,
        CFI_attribute_pointer,
        fptr->type,
        fptr->elem_len,
        fptr->rank,
        extents);
    if (ierr != CFI_SUCCESS) {
        // Handle error
        printf("Error establishing device pointer array\n");
        abort();
    }

    ierr = CFI_setpointer(fptr, (CFI_cdesc_t *) &d_fptr,lb);
    if (ierr != CFI_SUCCESS) {
        // Handle error
        printf("Error setting Fortran pointer\n");
        abort();
    }

}

void f_acc_dealloc(CFI_cdesc_t *fptr) {
    if (fptr->base_addr != NULL) {
        free(fptr->base_addr);
        CFI_setpointer(fptr,NULL,NULL);  // nullifies the Fortran pointer
    }
}
giacrossi commented 4 months ago

Yes, we know about that: while Intel ifx compiler already supports assumed rank arrays, Nvidia nvfortran compiler is not fully supporting them.

ivan-pi commented 4 months ago

I've been playing with this solution before for mkl_malloc and mkl_free (from Intel).

NVIDIA is already shipping the "ISO_Fortran_binding.h" in recent releases, but I don't know how that carries over to the current nvfortran compiler (which descends from the PGI compiler), or if flang-new will be needed before Fortran 2018 C descriptors are fully supported.

ivan-pi commented 4 months ago

It says here (https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html#intr-lang-call) that the interoperability header is already supported. This would leave only the assumed-rank syntax missing.

szaghi commented 4 months ago

@ivan-pi

Thank you very much for the suggestions. Unfortunately, as @giacrossi said, it seems that nvfortran compiler has still some missing Fortran features and some oddities...

We surely use the assumed rank on the OpenMP backend with ifx.

Cheers