llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.38k stars 12.15k forks source link

Trivial memset optimization not applied to loops under -Oz (LoopIdiomRecognize) #50308

Open 2ac352c7-76df-4fdf-879e-561e76a0f473 opened 3 years ago

2ac352c7-76df-4fdf-879e-561e76a0f473 commented 3 years ago
Bugzilla Link 50964
Version 12.0
OS All
CC @fhahn,@TH3CHARLie

Extended Description

Issue description: https://llvm.discourse.group/t/loops-not-optimized-under-oz-that-should-have-been/3734/1

llvmbot commented 3 years ago

I don’t think we need to rotate, we should be able to do the analysis on unrotated loops as well.

Eventually the optimization would still need to generate guards for it? (plz correct me if I'm wrong)

Yes, but I think for Oz we should initially only focus on the case where no loop remains after memset conversion Ah, I see. The expected optimization would be something like the below?

for (int i=start; i<end; ++i) { A[i] = CONST_VAL; // single store instruction }

and optimized to

if (start < end) { memset(A + start, CONST_VAL, sizeof(A[0)); }


-----
Then I guess the thing to figure out here will be - how does the pass check whether the store inst. is the only inst. inside the loop. (Is there API for it?
fhahn commented 3 years ago

I don’t think we need to rotate, we should be able to do the analysis on unrotated loops as well.

Eventually the optimization would still need to generate guards for it? (plz correct me if I'm wrong)

Yes, but I think for Oz we should initially only focus on the case where no loop remains after memset conversion

llvmbot commented 3 years ago

I don’t think we need to rotate, we should be able to do the analysis on unrotated loops as well.

Eventually the optimization would still need to generate guards for it? (plz correct me if I'm wrong)

Hi Florian and Yueh-Ting, I've tried this one month ago but didn't make some meaningful progress. So please feel free to pick it up.

I'm curious on how to solve the question and happy to contribute! Thank you Xuanda for the prompt reply.

TH3CHARLie commented 3 years ago

Hi Florian and Yueh-Ting, I've tried this one month ago but didn't make some meaningful progress. So please feel free to pick it up.

fhahn commented 3 years ago

I think Xuanda mentioned they might be looking into the issue, but that was a while ago so I am not sure about the status.

I don’t think we need to rotate, we should be able to do the analysis on unrotated loops as well.

fhahn commented 3 years ago

Hi Florian, Do you intend to solve it by teaching LIR to perform "loop-rotate" here?

I won’t be able to work on this, but can help with the review.

llvmbot commented 3 years ago

Hi Florian, Do you intend to solve it by teaching LIR to perform "loop-rotate" here?

TH3CHARLie commented 3 years ago

Hi Florian! I am interested in working on this, I'll look it and will let you know if anything blocks me.

fhahn commented 3 years ago

LoopIdiomRecognize currently fails because the loop does not get rotated with -Oz. It should still be possible to convert loops as the one below to memcpy in LoopIdomRecognize, even if it is not rotated.

define void @&#8203;test(i8* noalias nonnull align 1 %start, i8* %end) {
entry:
  br label %loop.header

loop.header:
  %ptr.iv = phi i8* [ %start, %entry ], [ %ptr.iv.next, %loop.latch ]
  %_12.i = icmp eq i8* %ptr.iv, %end
  br i1 %_12.i, label %exit, label %loop.latch

loop.latch:
  %ptr.iv.next = getelementptr inbounds i8, i8* %ptr.iv, i64 1
  store i8 1, i8* %ptr.iv, align 1
  br label %loop.header

exit:
  ret void
}

https://llvm.godbolt.org/z/53Pb1hGzx

I'd be happy to provide help/guidance if anyone is interested in tackling this.

razetime commented 2 years ago

I'd be happy to provide help/guidance if anyone is interested in tackling this.

@fhahn Hello, I'm new to the project, and I'd like to solve this issue. I'm currently trying to replicate this bug in C++ so i can figure out what i should modify. This is what I've made to try and reproduce the issue(hope it is correct):

#include <vector>
#include <stdint.h>

using namespace std;
int main() {
    vector<uint8_t> v(8);
}
void memzero(vector<uint8_t> vec) {
    for(auto i : vec) {
        i = 0;
    }
}

Currently I only know that this has something to do with llvm-project\llvm\lib\Transforms\Scalar\LoopIdiomRecognize.cpp. I would like to get some pointers on where the changes should be implemented.

fhahn commented 2 years ago

I don't think you really need to have a C++ example. It should be sufficient to run opt -loop-idiom on the LLVM IR from https://llvm.godbolt.org/z/YrGqGhnE9 .

As a starting point, you can take a look at what loop-idiom does and why it fails for the example. Adding -debug should provide a bit of additional information, but using GDB/additional print statements may also be helpful.

razetime commented 2 years ago

Oh, good to know. What I'm currently doing is

opt -loop-idiom -debug OZtest.ll -o OZtestloopid.ll
llc OZtestloopid.ll
clang-14 OZtestloopid.s

But I get this error on the last command:

/usr/bin/ld: /lib/x86_64-linux-gnu/crt1.o: in function `_start':
(.text+0x24): undefined reference to `main'
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)

I assume I should be running gdb on the result of clang file.s and check what is happening there, but i'd like help with rectifying this error first.

fhahn commented 2 years ago

There is no need to run llc or clang after opt -loop-idiom.

You can just check the output generated by opt -loop-idiom -S. At the moment, there's no memset call because the transformation does not trigger. After a fix, loop-idiom should create a memset and it is sufficient to check the output after opt to verify.

razetime commented 2 years ago

in this area a comment says the following:

// The following transforms hoist stores/memsets into the loop pre-header.
// Give up if the loop has instructions that may throw.

is this relevant to the problem?

SimpleLoopSafetyInfo's docs say that it may give false positives to avoid complicated analysis.

razetime commented 2 years ago

Alright, runOnLoopBlock is the place where memset patterns are recognized.

razetime commented 2 years ago

@fhahn Using the given ir from godbolt and modifying the file here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp#L628

  for (BasicBlock *ExitBlock : ExitBlocks) {
    if (!DT->dominates(BB, ExitBlock)) {
      DEBUG_WITH_TYPE("raze", dbgs() << "appears to not dominate all exit blocks.\n");
      return false;
    }
  }

here the debug statement executes which i think means that memset optimization application is halted here early. I'd like to know if this is correct and that i'm testing in the right place.

EDIT: removing the loop lets the optimization happen

razetime commented 2 years ago

@fhahn after some Phabricator discussion and reviewing my existing ideas, I've come up with these ideas for a possible fix:

I'd like some pointers on whether these are feasible, and what other approaches i can try to solve this with.

fhahn commented 2 years ago

Thanks I responded to the review.

razetime commented 2 years ago

@fhahn thanks for the comment. Currently I am writing the following check:

  bool isSimpleUnrotated = true;
  SmallVector<BasicBlock *, 8> sm;
  CurLoop->getExitingBlocks(sm);

  auto head =CurLoop->getHeader();
  if(sm.size() == 1 && sm.front() == head && !CurLoop->isRotatedForm()) {

    for(Instruction &I: *head) {
      StoreInst *SI = dyn_cast<StoreInst>(&I);
      if (!SI)
        continue;
      switch (isLegalStore(SI))
      {
      case LegalStoreKind::None:
      case LegalStoreKind::Memset:
      case LegalStoreKind::MemsetPattern:
      case LegalStoreKind::Memcpy:
      case LegalStoreKind::UnorderedAtomicMemcpy:
      isSimpleUnrotated = true;
      break;
      default:
        isSimpleUnrotated = false;
        break;
      }
      if (!isSimpleUnrotated) break;
    }
  }

I'm not sure if i'm assigning false to IsSimpleUnrotated at the right places. Also, once this check is done, what would the rotation of the loop look like? will it require extra headers like LoopRotation.h?

hiraditya commented 1 year ago

Is anyone currently working on this?

llvmbot commented 1 year ago

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

1) Assign the issue to you. 2) Fix the issue locally. 3) Run the test suite locally. 3.1) Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests. 4) Create a git commit 5) Run git clang-format HEAD~1 to format your changes. 6) Submit the patch to Phabricator. 6.1) Detailed instructions can be found here

For more instructions on how to submit a patch to LLVM, see our documentation.

If you have any further questions about this issue, don't hesitate to ask via a comment on this Github issue.

@llvm/issue-subscribers-good-first-issue

hiraditya commented 1 year ago

Working on unrotated loop is probably not the best idea. LoopIdiom may be improved to work on specific cases (like the memset example above) but other optimizations will be skipped. Will it be reasonable to add a flag that allows duplication of loop-header at Oz?

hiraditya commented 9 months ago

Verified,l:'5',n:'0',o:'LLVM+IR+source+%231',t:'0')),k:46.09880919613934,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:opttrunk,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:llvm,libs:!(),options:'-passes%3D!'default%3COz%3E!'+-enable-loop-header-duplication',overrides:!(),selection:(endColumn:58,endLineNumber:5,positionColumn:58,positionLineNumber:5,selectionStartColumn:58,selectionStartLineNumber:5,startColumn:58,startLineNumber:5),source:1),l:'5',n:'0',o:'+opt+(trunk)+(Editor+%231)',t:'0')),header:(),k:53.90119080386066,l:'4',m:100,n:'0',o:'',s:0,t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4) that it is working.

fhahn commented 9 months ago

I think we should only close the issue once the trivial memset gets removed by default when using -Oz with Clang/other frontends.