libocca / occa

Portable and vendor neutral framework for parallel programming on heterogeneous platforms.
https://libocca.org
MIT License

Regression with float4 type in Occa 1.6.0 #710

Open atillack opened 1 year ago

atillack commented 1 year ago

When using float4 data types in Occa 1.6.0 I get errors like the following:


---[ Error ]--------------------------------------------------------------------
    File     : /Users/andreas/sources/occa-1.6.0/src/occa/internal/modes/serial/device.cpp
    Line     : 383
    Function : operator()
    Message  : Error compiling [addVectors], Command: [clang++ -fopenmp -L/usr/local/opt/llvm/lib -L/usr/local/lib -std=c++11 -fPIC -shared /Users/andreas/.occa/cache/8820b6b2823c30ac/addVectors.source.cpp -o /Users/andreas/.occa/cache/8820b6b2823c30ac/7bef5cf6cd781157.binary  2>&1]
               Output:

               /Users/andreas/.occa/cache/8820b6b2823c30ac/addVectors.source.cpp:22:9: error: use of undeclared identifier 'float4'; did you mean 'float'?
                       float4 f;
                       ^~~~~~
                       float

The error can be reproduced on macOS and Linux by modifying the kernel of the 05_custom_types Occa example so that it uses a local float4 variable:

struct myFloat {
  float value;
};

typedef struct myFloat2_t {
  float x, y;
} myFloat2;

typedef struct {
  float values[4];
} myFloat4;

@kernel void addVectors(const int entries,
                        const myFloat *a,
                        const myFloat2 *b,
                        myFloat4 *ab) {
  for (int i = 0; i < (entries / 4); ++i; @tile(16, @outer, @inner)) {
    float4 f;
    f.x = a[4*i + 0].value + b[2*i + 0].x;
    f.y = a[4*i + 1].value + b[2*i + 0].y;
    f.z = a[4*i + 2].value + b[2*i + 1].x;
    f.w = a[4*i + 3].value + b[2*i + 1].y;
    ab[i].values[0] = f.x;
    ab[i].values[1] = f.y;
    ab[i].values[2] = f.z;
    ab[i].values[3] = f.w;
  }
}

The same kernel works fine in previous Occa versions.
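For checking results once the kernel compiles again, here is a minimal host-side sketch of what the kernel above computes, in plain C++ without OCCA. The names `float4_ref`, `addVectorsReference` and `runReference` are hypothetical and only for illustration; `float4_ref` stands in for the built-in `float4` the kernel expects, and the sketch assumes `b` holds at least `entries / 2` elements and `ab` at least `entries / 4`.

```cpp
#include <vector>

// Same custom types as in the kernel file.
struct myFloat {
  float value;
};

typedef struct myFloat2_t {
  float x, y;
} myFloat2;

typedef struct {
  float values[4];
} myFloat4;

// Hypothetical stand-in for the float4 type used inside the kernel.
struct float4_ref {
  float x, y, z, w;
};

// Serial reference of the kernel body: each iteration consumes four
// entries of a and two entries of b, and writes one entry of ab.
void addVectorsReference(const int entries,
                         const myFloat *a,
                         const myFloat2 *b,
                         myFloat4 *ab) {
  for (int i = 0; i < (entries / 4); ++i) {
    float4_ref f;
    f.x = a[4*i + 0].value + b[2*i + 0].x;
    f.y = a[4*i + 1].value + b[2*i + 0].y;
    f.z = a[4*i + 2].value + b[2*i + 1].x;
    f.w = a[4*i + 3].value + b[2*i + 1].y;
    ab[i].values[0] = f.x;
    ab[i].values[1] = f.y;
    ab[i].values[2] = f.z;
    ab[i].values[3] = f.w;
  }
}

// Illustrative driver: a holds 0, 1, 2, ... and the flattened b holds
// 1, 0, -1, ..., so every component of every output entry is 1.
std::vector<myFloat4> runReference(const int entries) {
  std::vector<myFloat> a(entries);
  std::vector<myFloat2> b(entries / 2);
  std::vector<myFloat4> ab(entries / 4);
  for (int i = 0; i < entries; ++i) {
    a[i].value = static_cast<float>(i);
  }
  for (int j = 0; j < entries / 2; ++j) {
    b[j].x = 1.0f - static_cast<float>(2 * j);
    b[j].y = 1.0f - static_cast<float>(2 * j + 1);
  }
  addVectorsReference(entries, a.data(), b.data(), ab.data());
  return ab;
}
```

Comparing the device output against `runReference` with the same inputs separates a compile-time problem (the missing `float4` declaration) from a wrong-result or crash problem at run time.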

atillack commented 6 months ago

It turns out a related problem in our project is that the OpenMP and Serial modes fail with the same errors. This was caused by PR #675, which changed the previous default behavior. @noelchalmers's comment in that PR, that the change would break things and needs to be communicated, is correct ;-)

Setting kernel/include_occa and serial/include_std to true, as before #675, lets the code above compile, but the resulting kernel segfaults on my machine. Fortunately, using these settings in our project did solve the remaining problems there.
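A sketch of how the two settings might be applied, assuming they are passed as kernel properties to `buildKernel` (the comment above does not say where they were set). The file name `addVectors.okl` and the Serial device string are placeholders for illustration.

```cpp
#include <occa.hpp>

int main() {
  // Placeholder device; the issue was seen with the Serial and OpenMP modes.
  occa::device device("{mode: 'Serial'}");

  // Assumption: both settings are read from the kernel properties.
  occa::json kernelProps;
  kernelProps["kernel/include_occa"] = true;
  kernelProps["serial/include_std"] = true;

  // Placeholder kernel file containing the addVectors kernel shown earlier.
  occa::kernel addVectors = device.buildKernel("addVectors.okl",
                                               "addVectors",
                                               kernelProps);
  return 0;
}
```

This only covers compilation; it does not address the segfault mentioned above.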