llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[LV] missed optimization with pragma assume_safety #99734

Open vfdff opened 4 months ago

vfdff commented 4 months ago

The preceding pragma indicates to the compiler that the following loop contains no data dependencies between loop iterations that would prevent vectorization. The compiler might be able to use this information to vectorize a loop where this would not typically be possible.

    void foo_noalias (int residue_i, int *start, int *end, int * restrict crd) {
    #pragma clang loop vectorize(assume_safety)
        for (int atom_i = start[residue_i]; atom_i < end[residue_i]; atom_i++)
        {
            crd[atom_i] = crd[atom_i] + 8;
        }
    }

Fangtangtang commented 5 days ago

The -Rpass remarks say that the loop is not vectorized because the number of loop iterations could not be determined. I think this is because the compiler cannot decide whether end and crd alias: the loop condition reloads end[residue_i] on every iteration, so a store to crd might change the bound, and the iterations cannot be parallelized.

vfdff commented 4 days ago

Thanks for your comment.

Since #pragma clang loop vectorize(assume_safety) lets the compiler assume there are no data dependencies in the following loop, does that mean there are no dependencies between end and crd, i.e. that they do not alias?

Fangtangtang commented 3 days ago

I found that the loop can be vectorized if I load the value before the loop and use that value in the loop condition.

    int end_value = end[residue_i];
    #pragma clang loop vectorize(assume_safety)
    for (int atom_i = start[residue_i]; atom_i < end_value; atom_i++)
    {
        crd[atom_i] = crd[atom_i] + 8;
    }

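Also, if I update end_value with end[residue_i] in the loop body, the loop cannot be vectorized either.

    int end_value = end[residue_i];  /* initialized before the loop, as in the previous snippet */
    #pragma clang loop vectorize(assume_safety)
    for (int atom_i = start[residue_i]; atom_i < end_value; atom_i++)
    {
        crd[atom_i] = crd[atom_i] + 8;
        end_value = end[residue_i];  /* bound is reloaded on every iteration */
    }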

My personal guess is that #pragma clang loop vectorize(assume_safety) indicates to the compiler that the loop contains no data dependencies between iterations that would prevent vectorization. However, there could be a potential dependency between end[residue_i] and crd[atom_i] within a single iteration.
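
To make that guess concrete, here is a small sketch of how an alias between end and crd would change the trip count. It is only an illustration, not taken from the LLVM sources: the helper names (count_reload, count_hoisted, run_alias_demo) and the buffer layout are made up, and restrict is dropped from crd so that the aliasing call is well defined.

```c
#include <assert.h>
#include <string.h>

enum { BUF_LEN = 16 };  /* hypothetical buffer size, large enough for the demo */

/* Same loop shape as foo_noalias: the bound is reloaded on every iteration. */
static int count_reload(int residue_i, const int *start, const int *end, int *crd) {
    int iterations = 0;
    for (int atom_i = start[residue_i]; atom_i < end[residue_i]; atom_i++) {
        crd[atom_i] = crd[atom_i] + 8;
        iterations++;
    }
    return iterations;
}

/* Variant with the bound loaded once before the loop. */
static int count_hoisted(int residue_i, const int *start, const int *end, int *crd) {
    int iterations = 0;
    int end_value = end[residue_i];
    for (int atom_i = start[residue_i]; atom_i < end_value; atom_i++) {
        crd[atom_i] = crd[atom_i] + 8;
        iterations++;
    }
    return iterations;
}

/* Runs both variants with end pointing into crd (end[0] is the same object as crd[2]). */
static void run_alias_demo(int *reload_count, int *hoisted_count) {
    int start_value = 0;
    int buf[BUF_LEN];

    memset(buf, 0, sizeof buf);
    buf[2] = 4;  /* initial loop bound, stored inside the array being updated */
    *reload_count = count_reload(0, &start_value, &buf[2], buf);
    assert(buf[2] == 12);  /* the store at atom_i == 2 rewrote the bound: 4 + 8 */

    memset(buf, 0, sizeof buf);
    buf[2] = 4;
    *hoisted_count = count_hoisted(0, &start_value, &buf[2], buf);
    assert(buf[2] == 12);  /* same store, but the loop had already captured 4 */
}
```

With this setup the reloading loop runs 12 iterations, because the store at atom_i == 2 raises the bound from 4 to 12, while the hoisted loop runs 4. The two forms are therefore only interchangeable when end and crd do not alias, which would be consistent with the compiler reporting that it cannot determine the iteration count.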

You might want to refer to the source code for more detailed information. There are several return statements before IsAnnotatedParallel skips the memory dependence checks, as seen in this section of the LLVM project. It's likely that one of those checks failed in this case, preventing vectorization. I'm not very familiar with this part, so apologies for not exploring it further.