rcthomas / resist

BSD 3-Clause "New" or "Revised" License

Let us have macros that enable high bandwidth memory management and padding. #1

Closed: rcthomas closed this issue 7 years ago

rcthomas commented 7 years ago

We want to be able to compile with or without high-bandwidth memory specializations and experiment with padding. The place to put this is probably in resist-memory.h.

Once these are defined we should carefully replace all the memory management commands in the code with our special versions.

bcfriesen commented 7 years ago

Regarding alignment, here are a few approaches I've seen:

  1. HPGMG: malloc a base array base_array with the size you actually need plus one cache line (64 bytes on KNL) of extra padding, and then advance the pointer until it falls on a cache line boundary. In the example below, 0x3f in hex is 63 in base-10, so the mask tests the low six bits of the address:

    data_ptr = base_array;
    /* step forward until the low six address bits are zero */
    while( (uint64_t)data_ptr & 0x3f ){data_ptr++;}
    return data_ptr;

    (see https://bitbucket.org/hpgmg/hpgmg/src/a2f19abc0147a438c6ec4e41262e5a2a696e119e/finite-volume/source/level.c?at=master&fileviewer=file-view-default around line 1178)

  2. Do whatever witchcraft FFTW does:

static void *our_malloc(size_t n)
{
     void *p0, *p;
     if (!(p0 = malloc(n + MIN_ALIGNMENT))) return (void *) 0;
     p = (void *) (((uintptr_t) p0 + MIN_ALIGNMENT) & (~((uintptr_t) (MIN_ALIGNMENT - 1))));
     *((void **) p - 1) = p0;
     return p;
}

  3. Check for the existence of alignment routines outside ISO C, such as memalign or posix_memalign, and only fall back on one of the above homebrew solutions if none of those exist. This is what FFTW actually does (see https://github.com/FFTW/fftw3/blob/master/kernel/kalloc.c).
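
As a rough illustration of the last approach, here is a minimal sketch that assumes a POSIX system where posix_memalign is available. The names rs_aligned_malloc and RS_ALIGNMENT are hypothetical and do not come from resist, HPGMG, or FFTW, and the sketch omits the fallback to a homebrew scheme that a portable version would need.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* ask for posix_memalign under strict -std modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical name. 64 bytes is one cache line on KNL. */
#define RS_ALIGNMENT 64

/* Return n bytes whose address is a multiple of RS_ALIGNMENT, or NULL on
   failure. Memory from posix_memalign is released with plain free(). */
static void *rs_aligned_malloc(size_t n)
{
    void *p = NULL;

    /* posix_memalign requires a power-of-two alignment that is also a
       multiple of sizeof(void *). */
    assert((RS_ALIGNMENT & (RS_ALIGNMENT - 1)) == 0);
    assert(RS_ALIGNMENT % sizeof(void *) == 0);

    if (posix_memalign(&p, RS_ALIGNMENT, n) != 0)
        return NULL;
    return p;
}
```

Because the returned pointer can go straight to free(), this route avoids having to remember the unshifted base pointer.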
bcfriesen commented 7 years ago

I think what FFTW and what HPGMG do are more or less the same thing. But I'm not good enough at C to verify that.

bcfriesen commented 7 years ago

It turns out we cannot use the code from FFTW because its developers licensed it under GPLv2, which LBNL does not like. So we will have to use HPGMG's sorcery.
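
For reference, the pointer-shifting idea can be written from scratch in a few lines. The following is only a sketch under assumptions: the rs_buffer type and both function names are hypothetical, and the 64-byte line size is the KNL figure quoted earlier in the thread. It keeps the pointer returned by malloc next to the shifted one, since the shifted pointer must not be passed to free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair: base is what malloc returned (and what free needs),
   data is the cache-line-aligned pointer handed to the caller. */
typedef struct {
    void *base;
    char *data;
} rs_buffer;

/* Allocate n usable bytes aligned to 64 bytes. On failure both fields
   are NULL. */
static rs_buffer rs_buffer_alloc(size_t n)
{
    rs_buffer b = { NULL, NULL };

    if (n > SIZE_MAX - 64)          /* n + 64 would overflow */
        return b;

    b.base = malloc(n + 64);        /* one extra cache line of slack */
    if (b.base == NULL)
        return b;

    /* Step forward one byte at a time until the low six address bits are
       zero. This moves at most 63 bytes, so n bytes still fit. */
    b.data = (char *)b.base;
    while ((uintptr_t)b.data & 0x3f)
        b.data++;

    assert(((uintptr_t)b.data & 0x3f) == 0);
    return b;
}

/* Release via the original base pointer, never via data. */
static void rs_buffer_free(rs_buffer *b)
{
    free(b->base);
    b->base = NULL;
    b->data = NULL;
}
```

Carrying both pointers in a struct is just one design choice. The alternative is to stash the base pointer in the slack bytes in front of the aligned block, which keeps the interface a plain pointer at the cost of some pointer arithmetic in the free routine.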

bcfriesen commented 7 years ago

An implementation question about this: the issue says "macros", but how do you feel about regular functions with preprocessor conditionals inside them? That is, do you prefer this:

#ifdef USE_HBM
#define rs_malloc(x) hbw_malloc(x)
#else
#define rs_malloc(x) malloc(x)
#endif

or this:

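void * rs_malloc (size_t n) {
  void *p;              /* result; must not share a name with the size argument */
#ifdef USE_HBM
  p = hbw_malloc(n);    /* declared in <hbwmalloc.h> */
#else
  p = malloc(n);        /* declared in <stdlib.h> */
#endif
  return p;
}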
rcthomas commented 7 years ago

I think the second is preferable.

bcfriesen commented 7 years ago

I agree. OK thanks, will send you something soon.
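
One consequence of the function approach is that release has to be wrapped too: memory obtained from hbw_malloc should go back through hbw_free rather than free. A minimal sketch of a matching pair follows, assuming the memkind hbwmalloc interface is what USE_HBM selects. rs_free and rs_selftest are hypothetical names that do not appear in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef USE_HBM
#include <hbwmalloc.h>   /* memkind's high-bandwidth memory allocator */
#endif

/* Allocate n bytes from high-bandwidth memory when USE_HBM is defined,
   otherwise from the ordinary heap. */
void *rs_malloc(size_t n)
{
#ifdef USE_HBM
    return hbw_malloc(n);
#else
    return malloc(n);
#endif
}

/* Release memory obtained from rs_malloc with the matching deallocator. */
void rs_free(void *p)
{
#ifdef USE_HBM
    hbw_free(p);
#else
    free(p);
#endif
}

/* Tiny usage check: allocate, touch the last element, release. */
void rs_selftest(void)
{
    double *v = rs_malloc(8 * sizeof *v);
    assert(v != NULL);
    v[7] = 1.0;
    rs_free(v);
}
```

Building with -DUSE_HBM and linking against memkind selects the high-bandwidth path. Without the flag the same call sites compile against plain malloc and free, which is the with-or-without switch the issue asks for.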