Open 2ac352c7-76df-4fdf-879e-561e76a0f473 opened 3 years ago
I don’t think we need to rotate, we should be able to do the analysis on unrotated loops as well.
Eventually the optimization would still need to generate guards for it? (plz correct me if I'm wrong)
Yes, but I think for Oz we should initially only focus on the case where no loop remains after memset conversion Ah, I see. The expected optimization would be something like the below?
for (int i=start; i<end; ++i) { A[i] = CONST_VAL; // single store instruction }
and optimized to
if (start < end) { memset(A + start, CONST_VAL, sizeof(A[0)); }
-----
Then I guess the thing to figure out here will be - how does the pass check whether the store inst. is the only inst. inside the loop. (Is there API for it?
I don’t think we need to rotate, we should be able to do the analysis on unrotated loops as well.
Eventually the optimization would still need to generate guards for it? (plz correct me if I'm wrong)
Yes, but I think for Oz we should initially only focus on the case where no loop remains after memset conversion
I don’t think we need to rotate, we should be able to do the analysis on unrotated loops as well.
Eventually the optimization would still need to generate guards for it? (plz correct me if I'm wrong)
Hi Florian and Yueh-Ting, I've tried this one month ago but didn't make some meaningful progress. So please feel free to pick it up.
I'm curious on how to solve the question and happy to contribute! Thank you Xuanda for the prompt reply.
Hi Florian and Yueh-Ting, I've tried this one month ago but didn't make some meaningful progress. So please feel free to pick it up.
I think Xuanda mentioned they might be looking into the issue, but that was a while ago so I am not sure about the status.
I don’t think we need to rotate, we should be able to do the analysis on unrotated loops as well.
Hi Florian, Do you intend to solve it by teaching LIR to perform "loop-rotate" here?
I won’t be able to work on this, but can help with the review.
Hi Florian, Do you intend to solve it by teaching LIR to perform "loop-rotate" here?
Hi Florian! I am interested in working on this, I'll look it and will let you know if anything blocks me.
LoopIdiomRecognize currently fails because the loop does not get rotated with -Oz. It should still be possible to convert loops as the one below to memcpy
in LoopIdomRecognize, even if it is not rotated.
define void @​test(i8* noalias nonnull align 1 %start, i8* %end) {
entry:
br label %loop.header
loop.header:
%ptr.iv = phi i8* [ %start, %entry ], [ %ptr.iv.next, %loop.latch ]
%_12.i = icmp eq i8* %ptr.iv, %end
br i1 %_12.i, label %exit, label %loop.latch
loop.latch:
%ptr.iv.next = getelementptr inbounds i8, i8* %ptr.iv, i64 1
store i8 1, i8* %ptr.iv, align 1
br label %loop.header
exit:
ret void
}
https://llvm.godbolt.org/z/53Pb1hGzx
I'd be happy to provide help/guidance if anyone is interested in tackling this.
I'd be happy to provide help/guidance if anyone is interested in tackling this.
@fhahn Hello, I'm new to the project, and I'd like to solve this issue. I'm currently trying to replicate this bug in C++ so i can figure out what i should modify. This is what I've made to try and reproduce the issue(hope it is correct):
#include <vector>
#include <stdint.h>
using namespace std;
int main() {
vector<uint8_t> v(8);
}
void memzero(vector<uint8_t> vec) {
for(auto i : vec) {
i = 0;
}
}
Currently I only know that this has something to do with llvm-project\llvm\lib\Transforms\Scalar\LoopIdiomRecognize.cpp
. I would like to get some pointers on where the changes should be implemented.
I don't think you really need to have a C++ example. It should be sufficient to run opt -loop-idiom
on the LLVM IR from https://llvm.godbolt.org/z/YrGqGhnE9 .
As a starting point, you can take a look at what loop-idiom does and why it fails for the example. Adding -debug
should provide a bit of additional information, but using GDB/additional print statements may also be helpful.
Oh, good to know. What I'm currently doing is
opt -loop-idiom -debug OZtest.ll -o OZtestloopid.ll
llc OZtestloopid.ll
clang-14 OZtestloopid.s
But I get this error on the last command:
/usr/bin/ld: /lib/x86_64-linux-gnu/crt1.o: in function `_start':
(.text+0x24): undefined reference to `main'
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
I assume I should be running gdb on the result of clang file.s
and check what is happening there, but i'd like help with rectifying this error first.
There is no need to run llc
or clang
after opt -loop-idiom
.
You can just check the output generated by opt -loop-idiom -S
. At the moment, there's no memset
call because the transformation does not trigger. After a fix, loop-idiom
should create a memset
and it is sufficient to check the output after opt
to verify.
in this area a comment says the following:
// The following transforms hoist stores/memsets into the loop pre-header.
// Give up if the loop has instructions that may throw.
is this relevant to the problem?
SimpleLoopSafetyInfo's docs say that it may give false positives to avoid complicated analysis.
Alright, runOnLoopBlock is the place where memset patterns are recognized.
@fhahn Using the given ir from godbolt and modifying the file here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp#L628
for (BasicBlock *ExitBlock : ExitBlocks) {
if (!DT->dominates(BB, ExitBlock)) {
DEBUG_WITH_TYPE("raze", dbgs() << "appears to not dominate all exit blocks.\n");
return false;
}
}
here the debug statement executes which i think means that memset optimization application is halted here early. I'd like to know if this is correct and that i'm testing in the right place.
EDIT: removing the loop lets the optimization happen
@fhahn after some Phabricator discussion and reviewing my existing ideas, I've come up with these ideas for a possible fix:
RunOnCountableLoop
if the loop is unrotated-Oz
to add the loop-rotate
pass before loop-idiom
based on code analysisI'd like some pointers on whether these are feasible, and what other approaches i can try to solve this with.
Thanks I responded to the review.
@fhahn thanks for the comment. Currently I am writing the following check:
bool isSimpleUnrotated = true;
SmallVector<BasicBlock *, 8> sm;
CurLoop->getExitingBlocks(sm);
auto head =CurLoop->getHeader();
if(sm.size() == 1 && sm.front() == head && !CurLoop->isRotatedForm()) {
for(Instruction &I: *head) {
StoreInst *SI = dyn_cast<StoreInst>(&I);
if (!SI)
continue;
switch (isLegalStore(SI))
{
case LegalStoreKind::None:
case LegalStoreKind::Memset:
case LegalStoreKind::MemsetPattern:
case LegalStoreKind::Memcpy:
case LegalStoreKind::UnorderedAtomicMemcpy:
isSimpleUnrotated = true;
break;
default:
isSimpleUnrotated = false;
break;
}
if (!isSimpleUnrotated) break;
}
}
I'm not sure if i'm assigning false to IsSimpleUnrotated
at the right places. Also, once this check is done, what would the rotation of the loop look like? will it require extra headers like LoopRotation.h
?
Is anyone currently working on this?
Hi!
This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:
1) Assign the issue to you.
2) Fix the issue locally.
3) Run the test suite locally.
3.1) Remember that the subdirectories under test/
create fine-grained testing targets, so you can
e.g. use make check-clang-ast
to only run Clang's AST tests.
4) Create a git
commit
5) Run git clang-format HEAD~1
to format your changes.
6) Submit the patch to Phabricator.
6.1) Detailed instructions can be found here
For more instructions on how to submit a patch to LLVM, see our documentation.
If you have any further questions about this issue, don't hesitate to ask via a comment on this Github issue.
@llvm/issue-subscribers-good-first-issue
Working on unrotated loop is probably not the best idea. LoopIdiom may be improved to work on specific cases (like the memset example above) but other optimizations will be skipped. Will it be reasonable to add a flag that allows duplication of loop-header at Oz?
Verified,l:'5',n:'0',o:'LLVM+IR+source+%231',t:'0')),k:46.09880919613934,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:opttrunk,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:llvm,libs:!(),options:'-passes%3D!'default%3COz%3E!'+-enable-loop-header-duplication',overrides:!(),selection:(endColumn:58,endLineNumber:5,positionColumn:58,positionLineNumber:5,selectionStartColumn:58,selectionStartLineNumber:5,startColumn:58,startLineNumber:5),source:1),l:'5',n:'0',o:'+opt+(trunk)+(Editor+%231)',t:'0')),header:(),k:53.90119080386066,l:'4',m:100,n:'0',o:'',s:0,t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4) that it is working.
I think we should only close the issue once the trivial memset gets removed by default when using -Oz
with Clang/other frontends.
Extended Description
Issue description: https://llvm.discourse.group/t/loops-not-optimized-under-oz-that-should-have-been/3734/1