riscvarchive / riscv-code-size-reduction

https://jira.riscv.org/browse/RVG-122

Optimization passes used by ARM gcc and HCC #1

Closed dodohack closed 3 years ago

dodohack commented 4 years ago

The paper "HW/SW approaches for RISC-V code size reduction" states that the ARM GCC version is 7.2 and that HCC is based on GCC 7.3.

As far as I know, ARM support is more mature than RISC-V support in GCC 7.x, and GCC lacks optimization passes for the RISC-V target, unless HCC has implemented its own RISC-V target-dependent optimization passes.

Do you have data about which optimization passes are used by ARM GCC and HCC under the '-Os' option in the original test (without adding customized instructions to reduce the code size)?

Thanks.

tariqkurd-repo commented 4 years ago

Hi, I don't know the answer. I know some optimisation work was done in HCC but I don't know the details. It's certainly true that any advantage gained by custom instructions is likely to be reduced by improved compiler support, so the goal of this group is to optimise both the toolchain support and the ISA for code size.

David-Horner commented 4 years ago

I don't know why this is closed. Others could contribute answers to:

Do you have data about which optimization passes are used by ARM GCC and HCC under the '-Os' option in the original test (without adding customized instructions to reduce the code size)?

I don't believe the answers so far are complete.

cetola commented 4 years ago

Reopening to allow further discussion.

tariqkurd-repo commented 3 years ago

Hi @dodohack - what do you want to do with this issue?

dodohack commented 3 years ago

If you can provide further information about the optimization passes used in both ARM gcc and RISC-V gcc, that would be great.

tariqkurd-repo commented 3 years ago

@dodohack I don't have that information. You can email the RISC-V mailing list to ask about the RISC-V GCC port, but I don't know if anyone will be able to help with the ARM GCC question.

David-Horner commented 3 years ago

I was surprised to learn in the All Hands Meeting that TGs are not to concern themselves about tool-chain matters.

Perhaps I misunderstood.

Is this the primary reason issue #1 was closed?

I find this directive puzzling and perplexing.

Many of the directional decisions by tech-code-size were motivated by examining the deficiencies/characteristics of generated code.

This approach detected "low hanging fruit" that could be harvested by specific hardware extensions to the Instruction Architecture.

Notably, PUSH/POP and TBLJMP were conceived in the context of current gcc and LLVM behaviour.

However, advances/maturity in/of compiler code generation could make these instructions obsolete.

How is it not pertinent that we consider software tool-chain and compiler/Load-time-optimizations?

I was listening to the All Hands Meeting at 3:30AM, so perhaps I, being half-awake,  vastly misconstrued the directive.

Please tell me this was a dream; only but a nightmare!!

On 2021-02-26 8:47 a.m., Tariq Kurd wrote:

Closed #1 https://github.com/riscv/riscv-code-size-reduction/issues/1.


jim-wilson commented 3 years ago

I think there is a lot of misunderstanding here.

No one is ignoring the toolchains, or telling people not to look at them.

I think this bug report was a question about HCC and gcc optimization passes. There was an attempt to answer the question, and the issue was closed.

I can't answer questions about HCC, but I can talk a bit about FSF GCC.

GCC optimization passes are target independent. The same opt passes run for ARM and RISC-V. In general, target dependent optimization passes are a serious mistake except in very specific circumstances. That isn't the right way to improve RISC-V code size or performance. Every target dependent pass is a liability, because it will never be tested or maintained by anyone other than RISC-V users and compiler developers. You should instead improve existing optimization passes if they aren't working optimally for RISC-V.

GCC does have a machine dependent reorg hook to do some target dependent optimizations very late before generating assembly code. This is usually used for a few target specific features. The ARM hook for instance creates the constant pools that are placed in the code. There is thumb2 support to figure out where the condition code flags are used, and convert some non-flag setting instructions to flag setting instructions when that reduces encoding size. Neither of these things is relevant to RISC-V. The AArch64 port does not use this hook. The RISC-V hook tries to identify some -msave-restore calls that are unnecessary and remove them.

GCC also has a way to add machine dependent optimization passes. The ARM port does not use this feature. The AArch64 port does, but for some very specific features. There is a pass to rename registers used in fma instructions because the pipeline works best when accumulator and dest regs have the same parity. There is a pass to insert instructions to defend against speculation attacks. There is a Qualcomm Falkor specific pass to try to avoid cache tag collisions. That part is dead, so it is not interesting anymore. There is a pass to insert the branch target identification instructions to defend against return oriented programming style attacks. There is a pass to try to optimize SVE sequences using condition code flags. These are all somewhat obscure things and half of them are security related, not performance or code size related.

We have one pass for RISC-V, which looks for a sequence of non-compressed load/store instructions using the same base address, and tries to insert an add instruction to adjust the base address so that we can use compressed instructions. This reduces performance but reduces code size, so it is useful in some embedded applications. This pass rarely triggers for application code, but some kinds of embedded code have sequences like this that can trigger the pass.
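To make the trade-off of that RISC-V pass concrete, here is a rough size model in Python. It is only a sketch and not GCC's implementation: it assumes word accesses, a 4-byte `addi` to rebase, and the `c.lw`/`c.sw` offset range of 0..124 in steps of 4, and it ignores the restriction of compressed loads/stores to registers x8-x15. All function names are made up for illustration.

```python
from typing import List, Optional

LW_BYTES = 4      # 32-bit lw/sw
C_LW_BYTES = 2    # 16-bit c.lw/c.sw
ADDI_BYTES = 4    # 32-bit addi used to move the base (assumed not compressible)


def fits_compressed(offset: int) -> bool:
    """c.lw/c.sw take a zero-extended offset that is a multiple of 4 in 0..124."""
    return 0 <= offset <= 124 and offset % 4 == 0


def size_without_rebase(offsets: List[int]) -> int:
    """Bytes used when every access keeps the original base register."""
    return sum(C_LW_BYTES if fits_compressed(o) else LW_BYTES for o in offsets)


def size_with_rebase(offsets: List[int]) -> Optional[int]:
    """Bytes used if one addi moves the base to the lowest offset.

    Returns None when rebasing does not make every access compressible.
    """
    if not offsets:
        return None
    new_base = min(offsets)
    if not -2048 <= new_base <= 2047:          # addi immediate is 12-bit signed
        return None
    if not all(fits_compressed(o - new_base) for o in offsets):
        return None
    return ADDI_BYTES + C_LW_BYTES * len(offsets)


def best_size(offsets: List[int]) -> int:
    """Smaller of the two encodings, mimicking a size-driven decision."""
    plain = size_without_rebase(offsets)
    rebased = size_with_rebase(offsets)
    return plain if rebased is None else min(plain, rebased)
```

Under this model, four word accesses at offsets 256..268 shrink from 16 to 12 bytes, at the cost of one extra executed instruction, while two such accesses break even. That matches the remark that the pass trades performance for size and only pays off on longer sequences.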

There are things that we can do to improve the compiler, but most of the compiler ideas are either limited impact issues that will only give 0.01% performance or size improvement, or, for the interesting ones, the types of projects that will take a significant amount of time and effort, possibly months of work, with no guarantee of a successful result.

The main problem affecting the compilers is a lack of interesting instructions. Every time the compiler has to emit 2 or 3 instructions to do something that ARM can do with 1 instruction, we lose on code size and performance. The B extension helps fill some of the obvious holes, but we need more. I like the stuff that Huawei did on their part. It is basically the same stuff I've been suggesting inside SiFive for years.

As for this bug report, I don't believe a big open ended issue like this is useful. I don't see anything useful coming out of this. We need to identify specific problems, and then work on them. Not worry about general questions that don't have an answer.

David-Horner commented 3 years ago

On Sat, Feb 27, 2021 at 7:19 AM ds2horner@gmail.com wrote:

I was surprised to learn in the All Hands Meeting that TGs are not to concern themselves about tool-chain matters.

Perhaps I misunderstood.

To be clear: I, @David-Horner, did not author the original of this post. It would appear to be @jim-wilson. However, I have my doubts that this was indeed posted by Jim. The tone of the previous post is helpful and describes the information that was anticipated by the #1 issue. The tone of this post is substantially less understanding and I would characterize it as snarky.

Yes, you misunderstood.

ouch.

No one is saying that we should ignore the compilers. They are saying that you don't need to worry about the compiler at every step of designing every extension.

Double ouch. I appreciate the clarification, but the All Hands meeting did not say this. Instead: "It does not include upstream Projects and eco system software ... such does not constitute TG [unclear expression] ... [does not include] to say 'we need these gcc changes' ..."

I think my question is on point when this is what was said.

Consider the crypto extension. It is driven by crypto theory and algorithms. Compiler optimization is irrelevant, and the compiler implementation will be trivial, some intrinsic functions.

That is nice for crypto. But this may be the exceptional case. I expect Jim [maybe not this "contributor"] knows better than most that deployment is essential to feature viability, especially when such features are directly related to tool-chain optimizations: compiler augmenting instructions and LTO facilities. This is the exact case for the code-size reduction TG. So, I believe my question is still on point.

Jim

I cannot validate this signature!!

David-Horner commented 3 years ago

On 2021-02-27 1:04 p.m., David-Horner wrote:

On Sat, Feb 27, 2021 at 7:19 AM ds2horner@gmail.com wrote:

I was surprised to learn in the All Hands Meeting that TGs are not to concern themselves about tool-chain matters.

Perhaps I misunderstood.

To be clear: I, @David-Horner, did not author the original of this post. It would appear to be @jim-wilson. However, I have my doubts that this was indeed posted by Jim. The tone of the previous post at https://github.com/riscv/riscv-code-size-reduction/issues/1#issuecomment-787103934 is helpful and describes the information that was anticipated by the #1 issue.

The tone of this post is substantially less understanding and I would characterize it as snarky.

Yes, you misunderstood. ouch. No one is saying that we should ignore the compilers.  They are saying that you don't need to worry about the compiler at every step of designing every extension.

Double ouch. I appreciate the clarification, but the All Hands meeting did not say this, instead: "Next are the task groups. These have real work product. Work product is extensions, ABIs, significant requirements documents.

It does not include upstream Projects and eco system software ... such does not constitute TG [unclear expression] ... [does not include] to say 'we need these gcc changes' ..."

I think my question is on point when this is what was said.

Consider the crypto extension.  It is driven by crypto theory and algorithms. Compiler optimization is irrelevant, and the compiler implementation will be trivial, some intrinsic functions.

That is nice for crypto. But this may be the exceptional case. I expect Jim [maybe not this "contributor"] knows better than most that deployment is essential to feature viability, especially when such features are directly related to tool-chain optimizations: compiler augmenting instructions and LTO facilities. This is the exact case for the code-size reduction TG. So, I believe my question is still on point.

Jim

I cannot validate this signature!!


David-Horner commented 3 years ago

On 2021-02-27 1:04 p.m., Jim Wilson wrote:

On Sat, Feb 27, 2021 at 7:19 AM ds2horner@gmail.com wrote:

I was surprised to learn in the All Hands Meeting that TGs are not to concern themselves about tool-chain matters.

Perhaps I misunderstood.

Yes, you misunderstood.  No one is saying that we should ignore the compilers.  They are saying that you don't need to worry about the compiler at every step of designing every extension.  Consider the crypto extension.  It is driven by crypto theory and algorithms.  Compiler optimization is irrelevant, and the compiler implementation will be trivial, some intrinsic functions.

Jim

oopsy. I didn't see this until now.

Somehow this was cross posted to riscv/riscv-code-size-reduction under my name.

Bizarre.

As I answered it already on that post I won't say anything more here.

Other than to thank you again for the meaningful post with gcc information related to ARM and RISCV optimizations.

lazyparser commented 3 years ago

Hi @jim-wilson

Thanks a lot for the long explanation. It is very helpful to me :-)

David-Horner commented 3 years ago

@jim-wilson

As for this bug report, I don't believe a big open ended issue like this is useful. I don't see anything useful coming out of this. We need to identify specific problems, and then work on them. Not worry about general questions that don't have an answer.

And yet we have @lazyparser and @maskray luv'n it.

I think a comparable contribution from an LLVM expert would be as welcomed and appreciated.

The request wasn't a bug report per se, but an issue @dodohack raised that I believe is essential to this particular TG. paraphrasing:

dodohack commented 3 years ago

Thanks for the thoughtful discussion from @jim-wilson @David-Horner .

I very much appreciate what they have done in their research.

The main issue I want to raise is what @David-Horner has already mentioned:

What have we learned from other experiments in incorporating software optimizations [the risc/MIPS/ARM/VLIW/etc. predecessors] rather than adding Instruction Architecture hardware features?

My thought is that we should look into software optimizations as much as possible before extending the ISA with customized extensions. As software optimizations improve, the gains from some customized extensions may not be as good as one might expect.

Some customized ISA extensions also come with a hardware cost that may outweigh the benefit they bring to the architecture. For example, the HCE (Huawei Custom Extension) "L.LI" is a 48-bit instruction. To support 48-bit instructions, the hardware needs to be extended with a 48-bit instruction bus, along with some other modifications (I'm not a hardware guy, so I don't know exactly how much hardware modification is needed). I also don't think it is an elegant design to have 48-bit insns in a 32-bit RISC architecture. Loading a global address can always be accomplished with "LUI" and "ADDI", which together are 2 bytes larger than "L.LI". If multiple global addresses need to be loaded, the compiler can always optimize the subsequent loads into a load with base pointer + offset, where the base pointer can be the previously loaded global address. How frequently "L.LI" is used in software-optimized target code also needs to be checked carefully.
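As a rough illustration of the LUI/ADDI argument, the Python sketch below splits a 32-bit constant into the two immediates and compares byte counts against a 6-byte L.LI. It assumes RV32, takes the L.LI size from this thread, and assumes each address is only used as the base of a load/store so that a nearby address can be reached through the 12-bit signed offset of the access itself. The helper names are hypothetical and register pressure is ignored.

```python
from typing import List, Tuple

LUI_ADDI_BYTES = 8   # two 32-bit instructions
L_LI_BYTES = 6       # one 48-bit instruction, size as quoted in this thread


def split_hi_lo(value: int) -> Tuple[int, int]:
    """Split a 32-bit constant into (LUI imm20, ADDI imm12).

    ADDI sign-extends its immediate, so when the low 12 bits are >= 0x800
    the upper part is bumped by one to compensate.
    """
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo


def rebuild(hi: int, lo: int) -> int:
    """What LUI followed by ADDI leaves in the register (mod 2**32)."""
    return ((hi << 12) + lo) & 0xFFFFFFFF


def bytes_with_base_reuse(addresses: List[int]) -> int:
    """Bytes spent materializing addresses when a previously loaded base
    can be reused for anything within a signed 12-bit offset of it."""
    total = 0
    base = None
    for addr in addresses:
        if base is not None and -2048 <= addr - base <= 2047:
            continue  # expressed as offset(base) in the load/store itself
        total += LUI_ADDI_BYTES
        base = addr
    return total


def bytes_with_l_li(addresses: List[int]) -> int:
    """Bytes spent if every address gets its own L.LI."""
    return L_LI_BYTES * len(addresses)
```

For three globals that sit close together, this model gives 8 bytes with base reuse versus 18 bytes with one L.LI each. For two globals that are far apart it gives 16 versus 12, so the outcome depends heavily on data layout.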

So the main purposes I have raised this issue are:

  1. Extend the ISA only if we can't find a software solution.
  2. To avoid ISA fragmentation as much as possible (I know it is not entirely possible); there is already a proposed P extension from Andes which overlaps with the Xpulp extension.
  3. What is the hardware cost for the gain in code size reduction?

jim-wilson commented 3 years ago

nanomips added a 48-bit instruction just like the Huawei L.LI for the exact same reason Huawei did. It reduces code size, and is easy for the compiler to use. The RISC-V ISA already has support for 48-bit, 64-bit, etc instruction encodings.
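For readers unfamiliar with how the base ISA distinguishes instruction lengths, the following Python sketch decodes the length from the low bits of the first 16-bit parcel. It follows the length-encoding scheme as described in the unprivileged specification around the time of this thread. The status of the longer-than-32-bit encodings should be checked against the current spec, and the function name is an assumption made for illustration.

```python
from typing import Optional


def instruction_length_bits(first_parcel: int) -> Optional[int]:
    """Return the instruction length in bits, given the first 16-bit parcel.

    Returns None for encodings longer than 64 bits or reserved patterns,
    which this sketch does not model.
    """
    low = first_parcel & 0xFFFF
    if low & 0b11 != 0b11:
        return 16                      # bits [1:0] != 11: compressed
    if low & 0b11100 != 0b11100:
        return 32                      # bits [4:2] != 111: standard 32-bit
    if low & 0b111111 == 0b011111:
        return 48                      # bits [5:0] == 011111
    if low & 0b1111111 == 0b0111111:
        return 64                      # bits [6:0] == 0111111
    return None
```

A decoder therefore only needs the first parcel to know how many more parcels to fetch, which is what makes room in the encoding space for an instruction such as L.LI.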

David-Horner commented 3 years ago

On 2021-02-28 1:59 p.m., Jim Wilson wrote:

nanomips added a 48-bit instruction just like the Huawei L.LI for the exact same reason Huawei did. It reduces code size, and is easy for the compiler to use.

I have to agree with you in ease of integration and reduced code size.

However, Aries' observation is still valid: that massive numbers of L.LI [where the value of the shorter encoding is significant] can be replaced by short code sequences [potentially averaging below 32 bits per value] by relying on intermediate values stored in registers. Avoiding the massive numbers of 32-bit loads themselves by using different algorithms is also fertile ground.

Some of these alternatives may be mega-projects, however some may be readily obtainable for most use cases, in which case every one in the eco-system benefits, not just ILEN=48 pioneers.

The RISC-V ISA already has support for 48-bit, 64-bit, etc instruction encodings.

Why I mention ILEN=48 pioneers is that the support to which you allude eludes me.

Is it not currently vapourware?

RVV decided NOT to implement ILEN=64 bit instructions because the infrastructure was not in place and there was not ratification of the ILEN>32 encodings.

The path to ratifiable RISCV.org extensions in 48 bits is unclear. I argue that it is of high importance to provide a pathway to the longer encodings. But I believe the approach needs to be more comprehensive than just settling on an ILEN>32 encoding, and should rather leverage the unique aspects of the RVI architecture.

Specifically, I suggest we formally support an extensive set of fusible instructions that can be mapped to the target machine's custom 32bit instructions, if they provide them. L.LI would replace the LUI/ADDI fusible code sequence.
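The substitution described here can be pictured as a simple peephole over an instruction list. The Python sketch below is purely illustrative: the tuple representation, the `fuse_lui_addi` name and the `("l.li", rd, value)` form are assumptions, and it only fuses the pattern where LUI and ADDI write the same register and are adjacent, so that no liveness analysis is needed.

```python
from typing import List, Tuple

# Instructions are modelled as tuples, for example:
#   ("lui", "a0", 0x12345)          lui  a0, 0x12345
#   ("addi", "a0", "a0", 0x678)     addi a0, a0, 0x678
Insn = Tuple


def fuse_lui_addi(insns: List[Insn]) -> List[Insn]:
    """Replace adjacent 'lui rd, hi; addi rd, rd, lo' pairs with one
    hypothetical 'l.li rd, value' entry; leave everything else untouched."""
    out: List[Insn] = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (
            cur[0] == "lui"
            and nxt is not None
            and nxt[0] == "addi"
            and nxt[1] == cur[1]        # addi writes the same register
            and nxt[2] == cur[1]        # addi reads the lui result
        ):
            value = ((cur[2] << 12) + nxt[3]) & 0xFFFFFFFF
            out.append(("l.li", cur[1], value))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```

Because the unfused LUI/ADDI pair remains valid code, a machine without the custom instruction can run the original sequence unchanged, which is the portability property the proposal relies on.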

Only the LTO is initially affected, however, as with RVC encoding, the front end can be made increasingly aware and favour those instruction sequences that the target machine supports. The substitution can still be done [and should be in my opinion] as an LTO feature, so that any machine can execute the code, but those that have custom instructions can still be optimized in the front end.


David-Horner commented 3 years ago

+1

We believe all ecosystem pieces that help members be successful are important.

They just may have different types of groups driving them. A TG is required for an extension or ABI-like spec, but not for all upstream work. That can be done by any of the groups.



On Feb 27, 2021, at 10:04 AM, Jim Wilson jimw@sifive.com wrote:



On Sat, Feb 27, 2021 at 7:19 AM ds2horner@gmail.com wrote: I was surprised to learn in the All Hands Meeting that TGs are not to concern themselves about tool-chain matters.

Perhaps I misunderstood.

Yes, you misunderstood. No one is saying that we should ignore the compilers. They are saying that you don't need to worry about the compiler at every step of designing every extension. Consider the crypto extension. It is driven by crypto theory and algorithms. Compiler optimization is irrelevant, and the compiler implementation will be trivial, some intrinsic functions.

Jim

lazyparser commented 3 years ago

+1 We believe all ecosystem pieces that help members be successful are important. They just may have different types of groups driving them. A TG is required for an extension or ABI-like spec, but not for all upstream work. That can be done by any of the groups.

The messages Jim and Mark sent were pasted on GitHub under the wrong name. It turns out that sending mail across GitHub and the RVI lists is not a good idea.

Mark's latest comment: https://lists.riscv.org/g/tech-code-size/message/783