llvm / llvm-project


sign extension behavior is different from gcc #4151

Closed llvmbot closed 1 year ago

llvmbot commented 15 years ago
Bugzilla Link 3779
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor
CC @lattner

Extended Description

Compiling

short x;

void g(int); short h(void);

short f(void) { g(h()); return x; }

with gcc produces


  call    h
  movswl  %ax,%edi
  call    g
  movw    x(%rip), %ax

Note that only the caller does sign extension

with llvm we have

    call    h
    movswl  %ax, %edi
    call    g
    movswl  x, %eax

Note that both caller and callee do sign extension.
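To reproduce the comparison locally, a self-contained version of the test case could look like the sketch below. The bodies of `g` and `h` and the `last_arg` / `g_last_arg` helpers are stand-ins that are not part of the report (which only declares `g` and `h`); they exist so that the file links and the source-level behaviour can be checked.

```c
short x;

/* Hypothetical stubs: the report only declares g and h. */
static int last_arg;                 /* records what g received */
void g(int v) { last_arg = v; }
short h(void) { return (short)-2; }  /* negative, so the extension matters */

/* The function from the report, unchanged. */
short f(void) { g(h()); return x; }

/* Hypothetical helper: the value g saw on its last call. */
int g_last_arg(void) { return last_arg; }
```

To look at the call sequence itself, it is probably best to keep `g` and `h` in a separate file (otherwise the optimizer may inline them) and compile `f` with `-O2 -S`.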

llvmbot commented 13 years ago

Is this still an issue?

Looks like it is. Given the first example a current clang produces:

    callq   h
    movswl  %ax, %edi
    callq   g
    movswl  x(%rip), %eax

and gcc 4.5 produces

    call    h
    movswl  %ax, %edi
    call    g
    movw    x(%rip), %ax

lattner commented 13 years ago

Is this still an issue?

llvmbot commented 15 years ago

Now that I think of it, could you please put the proposal on svn? That way we could just send you patches :-)

llvmbot commented 15 years ago

I noticed the proposal has been updated. Some comments

The example

int a(); short b() { return a(); }

Is there to show that truncation should be implemented as nop? It would probably be more interesting if "a" would return short and "b" an int. That would show a sign extension being done in the body of "b" on x86 and none on ppc.

In the example

short y(); int z() { return ((int)y() << 16) >> 16; }

I don't think the generated assembly is valid. For x86 we don't know the value of the high bits of eax when "y" returns. The generated assembly for z must set them. For example the code generated by gcc contains a cwtl. This is true for "return y();" and "return ((int)y() << 16) >> 16;".
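To illustrate the point about the high bits, here is a small sketch that models `eax` as a 32-bit value whose upper half may hold anything when `y` returns. The function names are made up for illustration. It assumes that converting an out-of-range unsigned value to a signed type wraps and that right-shifting a negative `int` is an arithmetic shift; both are implementation-defined in C, and both hold on gcc and clang.

```c
#include <stdint.h>

/* The idiom from the example: move the low 16 bits to the top, then shift
   back down. The left shift is done on an unsigned value to avoid signed
   overflow; the right shift is assumed to be arithmetic. */
int32_t sext16_shift(uint32_t reg) {
    return (int32_t)(reg << 16) >> 16;
}

/* The same result written as a plain conversion, which is what
   "return y();" asks for when y returns short. */
int32_t sext16_cast(uint32_t reg) {
    return (int32_t)(int16_t)(uint16_t)reg;
}
```

Both forms depend only on the low 16 bits of the register, which is why the caller has to perform one of them if the callee leaves the upper half undefined.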

Another thing you might want to change in the example is to use "short y(void)" instead of "short y()". The calling convention is different for some ABIs.

On the proposal, could you please show the llvm IL that should be generated for z if the architecture is ppc (with y returning i32) and x86 (with y returning i16)?

It would be nice to do the same for b. On x86 it should return i16. On ppc i32.

Thanks!

lattner commented 15 years ago

Sure.

llvmbot commented 15 years ago

How about assume_zext(27) instead? Last time there was discussion of some kind of assert expression in LLVM, I noticed that several people thought it was supposed to do something (abort?) if the assertion did not hold, while "assume" hopefully won't trigger this mental reflex :)

lattner commented 15 years ago

Sure, since it is just an assertion as proposed, there is no reason it couldn't also work for arguments. I'd suggest something like: assert_zext(27) and assert_sext(2) etc.

llvmbot commented 15 years ago

This can be done by introducing new sext/zext attributes which mean "I know that the result of the function is sign extended at least N bits". Given this, and given that it is stuck on the y function, the mid-level optimizer could easily eliminate the extensions etc with existing functionality.

This is very interesting for Ada. In Ada you often know that the top bit of a function parameter is zero. This could be represented using i32 zext_from_i31. The top bit is just one example: you often know that a value is zero extended or sign extended from a smaller number of bits, but in general this is not from a nice round type like i8 or i16, but from any size, eg: i10. It's not clear what the best way of representing this as an attribute is, but it would be great if it could be done. The same goes for function return values.
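As a rough sketch of what "extended from N bits" means for an arbitrary width such as i10 or i31, the helpers below compute and check the property on a 32-bit value. The names `sext_from`, `is_zext_from` and `is_sext_from` are invented for this illustration and are not LLVM attributes or APIs. The conversion back to `int32_t` is assumed to wrap, as it does on gcc and clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign extend the low `bits` bits of v to 32 bits (1 <= bits <= 32). */
int32_t sext_from(uint32_t v, unsigned bits) {
    if (bits >= 32) return (int32_t)v;
    uint32_t sign = UINT32_C(1) << (bits - 1);   /* the sign bit of the narrow type */
    uint32_t low = v & ((sign << 1) - 1);        /* keep only the low `bits` bits */
    return (int32_t)((low ^ sign) - sign);       /* flip-and-subtract extension */
}

/* True if every bit above the low `bits` bits is zero. */
bool is_zext_from(uint32_t v, unsigned bits) {
    return bits >= 32 || (v >> bits) == 0;
}

/* True if v already equals the sign extension of its low `bits` bits. */
bool is_sext_from(uint32_t v, unsigned bits) {
    return (uint32_t)sext_from(v, bits) == v;
}
```

With these, "the top bit of the parameter is zero" is `is_zext_from(v, 31)`, and a value known to fit a signed 10-bit range satisfies `is_sext_from(v, 10)`.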

llvmbot commented 15 years ago

my meta summary to your feedback is that I don't want to conflate return semantics with the CallingConv

This is an argument that I can buy. I don't buy the argument about the current design not being able to support various ABIs well (callee vs caller etc) since as I explained it is perfectly capable of doing so.

lattner commented 15 years ago

Duncan, my meta summary to your feedback is that I don't want to conflate return semantics with the CallingConv #. It is very straight-forward to make this explicit in the IR - allowing optimizers to reason about it and improve the code without target knowledge. Making this implicitly tied to the CC means that the mid-level optimizers would not have this information.

lattner commented 15 years ago

Since in X86 the value of the high bits of eax are not defined, the function "y" truly returns an i16.

Yes, ok, consider if the example had been written in PPC assembler :)

llvmbot commented 15 years ago

Finally, let's talk about objc versus C on x86. It sounds like objc wants i16 to be returned in i32 with extension in the callee. It seems that C (based on what gcc 4.4 does) wants i16 to be returned in i16. Clearly this means that objc and C have different calling conventions. So why not just introduce an objc calling convention?

I think the current model can be made to work by adding an objc calling convention since that is the only thing that is currently broken.

Having said that I prefer to make operations as explicit as possible in the llvm IL. This makes it easier to understand what is going on both for optimizers and humans.

As Chris mentioned, C compilers have to reason about the hardware to satisfy the ABI. What we have to do is find the most convenient way for them to pass that information to us so that LLVM doesn't need to repeat the reasoning. For example, having an objc calling convention looks like "repeating the reasoning" to me :-)

llvmbot commented 15 years ago

My thoughts: http://nondot.org/sabre/LLVMNotes/ExtendedIntegerResults.txt

On "The proposal", you say that


short y(); int z() { return ((int)y() << 16) >> 16; }

should be compiled to


    define i32 @z() nounwind {
    entry:
      %0 = tail call i32 (...)* @y() nounwind
      %1 = trunc i32 %0 to i16
      %2 = sext i16 %1 to i32
      ret i32 %2
    }

I think this should be


    define i32 @z() nounwind {
    entry:
      %0 = tail call i16 (...)* @y() nounwind
      %1 = sext i16 %0 to i32
      ret i32 %1
    }

Since in X86 the value of the high bits of eax are not defined, the function "y" truly returns an i16.

llvmbot commented 15 years ago

My thoughts: http://nondot.org/sabre/LLVMNotes/ExtendedIntegerResults.txt

Thanks for writing a proposal! Taking a look at it. Some comments

The first example is


int x(); short foo() { return x(); }

into:

    _z:
        subl $12, %esp
        call _y
        movswl %ax, %eax
        addl $12, %esp
        ret

I assume that it should be


short y(); int z() { return y(); }

into:

    _z:
        subl $12, %esp
        call _y
        movswl %ax, %eax
        addl $12, %esp
        ret

right? There is no function named foo or x in the assembly output, and if the caller is the one that needs to extend the return value then it is z that needs to return an int to force the extension.

On the second example,


short y(); int z() { return ((int)y() << 16) >> 16; }

    _z:
        subl $12, %esp
        call _y
        ;; movswl %ax, %eax -> not needed because eax is already sext'd
        addl $12, %esp
        ret

I don't understand it. The code just makes the sign extension on z explicit in the C code. Maybe the intended example was


int x(); short y() { return x(); }

int z() { return y(); }

    _y:
        subl $12, %esp
        call _x
        addl $12, %esp
        ret

    _z:
        subl $12, %esp
        call _y
        ;; movswl %ax, %eax -> not needed because eax is already sext'd
        addl $12, %esp
        ret

In this case, the most efficient assembly for y already has eax set, so z doesn't need to extend it.

llvmbot commented 15 years ago

  1. The frontend should make all the sign / zero extensions explicit. This allows us to implement all kinds of ABI conventions (e.g. callee sign extend, caller does not, or vice versa). This eliminates the ambiguity and allows the optimizer to do a better job of optimizing stuff.

I LOVE this part :-)

  2. signext and zeroext should be replaced by other attributes that record how the return value is promoted. e.g.

char f() {...} => i32 f() sext_from_i8 { ... ret i32 ... }

The new attributes become optimization hints. Codegen is free to ignore them.

I am not sure how this would be used, but if you have a use for it I am fine with it too.

llvmbot commented 15 years ago

Hi Chris, thanks for the write up. I think your notes miss an important point however: the role of the calling convention (CC). I made a mistake in my description of the current situation: I talked about legal and illegal types, however it is actually the CC that matters. So let me first describe my understanding of how the current system works. I will only talk about return values here.

Suppose a function returns an integer value, eg: i16. The calling convention determines the type it should actually be returned in. It can say that it should be returned in an i32 by doing for example CCIfType<[i8, i16], CCPromoteToType> [A target like PPC gets i32 for i16 without doing anything explicit, because i16 is illegal so is automagically promoted to i32; but in theory the CC could specify i16 returned in i64 if it wanted to]. Our current x86 C calling convention says to return i16 in i16, but we could return it in i32 if we wanted it to, by modifying the CC.

Now that the CC has determined the return type what happens?

(1) if the return type is the same as the original (i16 returned in i16) then nothing happens: there is no extension to a larger type.

(2) if the return type is bigger than the original type (i16 returned in i32) then the return value is sign/zero extended (depending on the signextend/zeroextend attribute) to the larger type in the callee before being returned.

Hopefully this deals with "1) the actual precise semantics are really poorly defined (see llvm/llvm-bugzilla-archive#3779 )", in the sense that the above description seems perfectly precise to me. I admit that it is possible that my description of the current situation does not match what the code currently actually does, but if so that's just a bug in the implementation IMHO! In short, I think your point (1) can be addressed by improving documentation and perhaps tweaking the current code. OK, now that Evan has reverted the patch in this PR there's some ugly hack about i32 deep in the code generators; I'm talking about what happens without that hack (which should be removed again once the situation has settled).

Let me now show how the current situation is perfectly capable of handling all cases described in your "2) some targets might want the caller to extend, some might want the callee to extend". In my examples I suppose the callee was defined as returning a short (i16), and that any extensions are to i32.

Case: "the callee should extend the value before returning". Then the function is declared as returning i16 in the IR. The calling convention declares that i16 is returned in i32. The signextend/zeroextend attributes specify the nature of the extension. The result is that the callee does the appropriate extension before returning the value in an i32.

Case: "the caller should extend the returned value".

Subcase: i16 is legal for the target. Then the function is declared as returning i16 in the IR. Thus the caller's extension to i32 is explicit in the IR. The calling convention declares that i16 is returned in i16. The result is that the callee doesn't do any extending, the result is returned in an i16 and the caller extends it in the normal way. Any signextend/zeroextend attributes on the callee are ignored.

Subcase: i16 is not legal for the target but i32 is. Then the function is declared as returning i16 in the IR. Thus the caller's extension to i32 is explicit in the IR. The callee should not have any signextend/zeroextend attributes. The calling convention declares that i16 is returned in i32. The result is that the callee does an "any extend" of the i16 to an i32 before returning it, i.e. it returns an i32 with some rubbish in the upper 16 bits. The caller will then automatically zero/sign extend the i32 from i16 in the usual way.

Summary: (1) in the IR the callee is always declared as returning i16. (2) if the callee extends, then the callee gets the signextend/zeroextend return attribute. If the caller extends then the callee should not have these attributes. (3) if the value should be returned in i32 then the calling convention should say this. Simple, right? :)

Hopefully this shows that "2) some targets might want the caller to extend, some might want the callee to extend" is a non-problem.
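The rules above can be written down as a small decision function. This is only a sketch of the scheme as described in this comment, not of the actual code generator; the enum and function names are invented for illustration.

```c
/* Return attribute on the callee, as in the description above. */
enum ext_attr { ATTR_NONE, ATTR_SIGNEXT, ATTR_ZEROEXT };

/* What the callee does to the value just before returning it. */
enum callee_action { NO_EXTEND, SIGN_EXTEND, ZERO_EXTEND, ANY_EXTEND };

/* value_bits: width of the IR return type (e.g. 16).
   cc_bits:    width the calling convention returns it in (e.g. 16 or 32). */
enum callee_action callee_action_for(unsigned value_bits, unsigned cc_bits,
                                     enum ext_attr attr) {
    if (cc_bits <= value_bits)
        return NO_EXTEND;            /* returned as is; attributes are ignored */
    switch (attr) {
    case ATTR_SIGNEXT: return SIGN_EXTEND;
    case ATTR_ZEROEXT: return ZERO_EXTEND;
    default:           return ANY_EXTEND;  /* upper bits are rubbish; caller extends */
    }
}
```

For example, i16 returned in i16 gives `NO_EXTEND` regardless of attributes, while i16 returned in i32 without an attribute gives `ANY_EXTEND`, which is the "caller extends" subcase for targets where i16 is not legal.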

Now consider "3) the mid-level optimizer doesn't know the size of the GPR, so it doesn't know that %0 is sign extended up to 32-bits here, and even if it did, it could not eliminate the sext." This is absolutely correct, but does it matter? The code generators know perfectly what is extended where, and will eliminate pointless extensions. I'm not sure why you want the middle-end to do this. In fact it seems like a layering violation to me.

Now let's talk about "4) the code generator has historically assumed that the result is extended to i32, which is a problem on PIC16 (and is also probably wrong on alpha and other 64-bit targets)." This was always bogus in my opinion, and the patch in this PR removing this hack was a good move in my opinion. It never made any sense anyway: if i16 should be returned in i32 then the calling convention should just say so! I have no idea why i32 was being shoved into the innards of the code generator. Anyway, point 4) has really nothing to do with the current scheme, it is about a hack that got shoved into the system at some point (and was the wrong "solution" to the problem it was trying to solve IMHO).

Finally, let's talk about objc versus C on x86. It sounds like objc wants i16 to be returned in i32 with extension in the callee. It seems that C (based on what gcc 4.4 does) wants i16 to be returned in i16. Clearly this means that objc and C have different calling conventions. So why not just introduce an objc calling convention?

lattner commented 15 years ago

My thoughts: http://nondot.org/sabre/LLVMNotes/ExtendedIntegerResults.txt

llvmbot commented 15 years ago

It's easy to come up with a case the original patch breaks stuff. The objective-c runtime system expects a function that returns i8 to have sign extended the value. If not, bad things happen. The current system is simply not encoding enough information to make the optimization possible.

Chris and I talked about this. He is sending out a proposal. Basically there are two sets of changes required to implement this optimization.

  1. The frontend should make all the sign / zero extensions explicit. This allows us to implement all kinds of ABI conventions (e.g. callee sign extend, caller does not, or vice versa). This eliminates the ambiguity and allows the optimizer to do a better job of optimizing stuff.
  2. signext and zeroext should be replaced by other attributes that record how the return value is promoted. e.g.

char f() {...} => i32 f() sext_from_i8 { ... ret i32 ... }

The new attributes become optimization hints. Codegen is free to ignore them.

lattner commented 15 years ago

FWIW, I'm going to write up a short proposal on how to tackle this.

llvmbot commented 15 years ago

Turns out this breaks some objective-c code. The objective-c run time expects the signext and zeroext attributes to be honored.

The llvm documentation also states these attributes should be honored: "This indicates to the code generator that the parameter or return value should be sign-extended to a 32-bit value by the caller (for a parameter) or the callee (for a return value)."

Return values are currently extended by the callee, but perhaps not to an i32.

Just so everyone is clear about this, here is how these attributes work right now (I'm only talking about return values here):

(1) if the return type is a legal type then these attributes are ignored. Fair enough, since there is no extension to a larger type.

(2) if the return type is illegal, and so values are passed in a larger type, then the return value is sign/zero extended (depending on which attribute is set) to the larger type in the callee before being returned.

Examples: PPC: on ppc32 i16 is illegal but i32 is not. An i16 return value is extended in accordance with any sign/zero extend attribute to an i32 in the callee before being returned.

X86: on x86 i16 is legal so an i16 return value is returned as is without any extension.

Anyway, that's how things currently are. Before this change callees would always extend up to i32 whether the type was illegal or not before returning. However callers didn't know about this, and if the return type was a legal type they wouldn't understand that the return value had already been extended to i32 by the callee, and so would sometimes also do an extension. Note that this hack essentially existed entirely for the benefit of x86 and C-like languages.

Now there is the question of how things should be. I think there should be no hacks about i32 in the code generator and no "C" specific logic. In fact I think that the way the code generator currently handles things is just dandy, and the obj-c front-end should be fixed. Of course LangRef should be corrected, since it doesn't reflect the state of play.

My simple tests show that everything works correctly for C. If everyone agrees that C and C++ are working correctly (Ada is currently fine too thanks!) then I think the right approach is for Evan or someone to produce an obj-c testcase, and have people who know about obj-c fix the obj-c front-end.

llvmbot commented 15 years ago

One additional bit of information. I plan to revert the patch. But I also intend to make one change. SDISel should not force the promotion if the callee is not marked signext / zeroext.

Could you add a testcase to show that there is a sign extension if one of the attributes is set but no sign extension if it is not?

I will try to change llvm-gcc then.

llvmbot commented 15 years ago

One additional bit of information. I plan to revert the patch. But I also intend to make one change. SDISel should not force the promotion if the callee is not marked signext / zeroext.

llvmbot commented 15 years ago

Turns out this breaks some objective-c code. The objective-c run time expects the signext and zeroext attributes to be honored.

The llvm documentation also states these attributes should be honored: "This indicates to the code generator that the parameter or return value should be sign-extended to a 32-bit value by the caller (for a parameter) or the callee (for a return value)."

I'm going to shoot first and ask question later. :-) I'll reopen this and revert the patch. It seems to me we need a different approach.

  1. If the ABI does not require extension then the frontend should not emit the signext / zeroext attribute.
  2. If all the callers are known, then the callers should promote the return values instead.

llvmbot commented 15 years ago

Fixed on rev 67132.

llvmbot commented 15 years ago

I am concerned this might break ABI compatibility with gcc4.2?

It will not. Gcc 4.2 does sign extension of return values both in the callee and in the caller. It can safely call a function that doesn't extend its return value.
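The compatibility argument can be checked with a tiny model of the 32-bit return register. This is a sketch with made-up function names and an arbitrary garbage pattern for the undefined upper half; it also assumes that converting an out-of-range unsigned value to a signed type wraps, as gcc and clang do.

```c
#include <stdint.h>

/* What a callee leaves in the 32-bit return register for a 16-bit result. */
uint32_t callee_extends(int16_t v)   { return (uint32_t)(int32_t)v; }
/* 0x12340000 stands in for whatever happened to be in the upper half. */
uint32_t callee_no_extend(int16_t v) { return UINT32_C(0x12340000) | (uint16_t)v; }

/* A gcc 4.2-style caller: always re-extends from the low 16 bits. */
int32_t caller_extends(uint32_t reg) { return (int32_t)(int16_t)(uint16_t)reg; }
/* A caller that trusts the callee and uses the register as is. */
int32_t caller_trusts(uint32_t reg)  { return (int32_t)reg; }
```

A caller that extends gets the right value from either kind of callee, because extending an already extended value changes nothing. Only the combination of a trusting caller and a non-extending callee goes wrong.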

llvmbot commented 15 years ago

I am concerned this might break ABI compatibility with gcc4.2?

llvmbot commented 15 years ago

There are still some regressions for cellSPU that I don't understand. For some tests like trunc_i32_i8, there is now an extra sequence:

    ilhu    $4, 771
    iohl    $4, 771
    shufb   $3, $3, $3, $4

which if I understand correctly, just sets every byte of $3 to be the same as the third byte. I think I need someone with CellSPU experience to take a look at this :-)

It enforces the register uniformity that's expected. It's a feature.

Interesting. What I still don't understand is why this is only showing up after the patch.

This shuffle-for-truncation has been there for quite a while now. It's been that way since revision 61447. Why? Because scalar registers don't exist on the Cell's SPU. They exist solely as a convention that the 0-th slot is used for scalars. Consequently, the CellSPU backend bends over backwards to maintain this scalar-vector register uniformity (or interchangeability).

I suspect that you're now seeing it because a i32-to-i8 trunc is no longer being eliminated by DAGCombiner.

It's not a regression. It's a feature. Really.
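For readers without SPU experience, here is a rough C model of what the three instructions do, as far as I understand the SPU ISA: `ilhu` followed by `iohl` with the immediate 771 (0x0303) leaves 0x03030303 in every word of `$4`, and `shufb` then uses each byte of `$4` as an index into the source registers. The function names are made up, and the special `shufb` control values that produce constants are deliberately not modeled.

```c
#include <stdint.h>

/* Word left in each slot of the mask register by ilhu/iohl with the same
   16-bit immediate: upper halfword first, then the lower halfword OR'd in. */
uint32_t mask_word(uint16_t imm) {
    return ((uint32_t)imm << 16) | imm;
}

/* Simplified shufb: each control byte selects one byte, by its low 5 bits,
   from the 32-byte concatenation of a and b. A temporary is used so that
   out may alias a or b, as in "shufb $3, $3, $3, $4". */
void shufb(uint8_t out[16], const uint8_t a[16], const uint8_t b[16],
           const uint8_t ctl[16]) {
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++) {
        unsigned sel = ctl[i] & 0x1F;
        tmp[i] = sel < 16 ? a[sel] : b[sel - 16];
    }
    for (int i = 0; i < 16; i++)
        out[i] = tmp[i];
}
```

With every control byte equal to 3, each output byte is a copy of byte 3 of the source register, which matches the "every byte is the same" reading of the sequence above.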

llvmbot commented 15 years ago

There are still some regressions for cellSPU that I don't understand. For some tests like trunc_i32_i8, there is now an extra sequence:

    ilhu    $4, 771
    iohl    $4, 771
    shufb   $3, $3, $3, $4

which if I understand correctly, just sets every byte of $3 to be the same as the third byte. I think I need someone with CellSPU experience to take a look at this :-)

It enforces the register uniformity that's expected. It's a feature.

Interesting. What I still don't understand is why this is only showing up after the patch.

llvmbot commented 15 years ago

There are still some regressions for cellSPU that I don't understand. For some tests like trunc_i32_i8, there is now an extra sequence:

    ilhu    $4, 771
    iohl    $4, 771
    shufb   $3, $3, $3, $4

which if I understand correctly, just sets every byte of $3 to be the same as the third byte. I think I need someone with CellSPU experience to take a look at this :-)

It enforces the register uniformity that's expected. It's a feature.

llvmbot commented 15 years ago

updated patch

Looking at the output of gcc, it seems that it is not necessary to sign extend return values on the cell. The attached patch also fixes the cell testcases that check for that.

There are still some regressions for cellSPU that I don't understand. For some tests like trunc_i32_i8, there is now an extra sequence:

    ilhu    $4, 771
    iohl    $4, 771
    shufb   $3, $3, $3, $4

which if I understand correctly, just sets every byte of $3 to be the same as the third byte. I think I need someone with CellSPU experience to take a look at this :-)

lattner commented 15 years ago

The rest of the patch looks fine to me also, but please be on the lookout for regression after it goes in.

lattner commented 15 years ago

The lib/Target/X86/X86InstrInfo.td patch looks independent of this work, plz commit it separately (and immediately if you're happy with it)

llvmbot commented 15 years ago

updated patch

This is an update on Duncan's patch to fix the TLS case and to update the X86 tests.

There are some regressions for CellSPU, but I have no idea what its ABI looks like.

llvmbot commented 15 years ago

If you are happy (I am happy!) then can you please apply the patch along with a testcase.

llvmbot commented 15 years ago

With your patch there is sign extension only on the caller on x86 and x86-64 and only on the callee on arm and ppc32.

llvmbot commented 15 years ago

Hi Rafael, it looks like my patch fixes x86. I expect llvm-gcc will produce the right code for ppc and arm too. Can you please check this (tricky for me to test this right now).

llvmbot commented 15 years ago

looks like bugzilla lost my last two emails. Combining them:

First, thanks for the clarification on signext, I had a misunderstanding about it.

For ppc, gcc produces

    f:
        mflr 0
        stwu 1,-16(1)
        stw 0,20(1)
        bl h
        bl g
        lwz 0,20(1)
        lis 9,x@ha
        addi 1,1,16
        mtlr 0
        lha 3,x@l(9)
        blr

so ppc is the opposite of x86: The callee extends, the caller doesn't.

We can do this in llvm without the signext by having llvm-gcc/clang produce


    @x = common global i16 0

    define i32 @f() nounwind {
    entry:
      %0 = tail call i32 @h() nounwind
      tail call void @g(i32 %0) nounwind
      %1 = load i16* @x, align 2
      %2 = sext i16 %1 to i32
      ret i32 %2
    }

    declare i32 @h()

    declare void @g(i32)

If targeting ppc. This is compiled by llc into


    f:
        mflr 0
        stw 0, 4(1)
        stwu 1, -16(1)
        bl h
        bl g
        lis 3, x@ha
        lha 3, x@l(3)
        addi 1, 1, 16
        lwz 0, 4(1)
        mtlr 0
        blr

which I think is correct, but I don't have a lot of experience with ppc.

So I think we agree that the first thing we should do is remove the x86 codegen hack (I will test your patch). This should be enough to fix this bug.

Then there is the question of "should we remove signext?". I would vote for it. It would make more architecture details explicit earlier (like f returning i32). The downside is that the generated .ll is now more architecture dependent (it was never independent).

llvmbot commented 15 years ago

Hi Rafael, thanks for the link.

llvm-gcc should probably mark functions f and h as returning i16 and nothing more. How exactly they return the value is ABI dependent. In X86, they do so by writing the value to the lower 16 bits of eax and the higher 16 bits are undefined.

I'm not sure about this (dropping signext). Just to be clear, here's what signext means: signext does not instruct codegen to return the value in a larger type. It tells codegen: if you (codegen) decide to return the value in a larger type then please sign extend it.

The codegenerator contains a nasty hack that modifies the return types of functions returning integers, increasing the size of the integer to at least i32. Suppose that hack was deleted but signext was left in, what would happen then?

On x86, the codegenerator would see a return type of i16 (rather than the hacked i32 it sees now). Since i16 is a legal type this would result in the value being returned in a "short" register without any sign extension.

On ppc, the codegenerator would see a return type of i16. Since i16 is not a legal type this would result in the value being returned in an i32. Due to the signext attribute, the i16 value would be sign extended to i32 before being returned. Does gcc sign extend shorts like this on ppc?

If a caller needs a i32 value (as for example to call function g), llvm-gcc should sign or zero extend it based on the C signature of the variable that is holding the value.

This is already the case. Notice how the values are being explicitly extended to i32 in the IR ? The signext has no functional effect at the codegen level, it is entirely for the benefit of the codegenerators.

... And the only difference in the generated assembly is that we have only one extension and it is explicitly represented in the code. The return value is not extended.

I think it is important to see what gcc does on ppc or some other platform where i16 is not a legal type (i.e. no i16 registers). That will make it clearer as to whether the signext return attributes should be dropped or whether it's enough to drop the nasty codegen "up the size to i32" hack.

llvmbot commented 15 years ago

The correct link for gcc-patches is http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00424.html

Getting a bit into how I think this should be done (as opposed to what it should do):

llvm-gcc should probably mark functions f and h as returning i16 and nothing more. How exactly they return the value is ABI dependent. In X86, they do so by writing the value to the lower 16 bits of eax and the higher 16 bits are undefined.

If a caller needs a i32 value (as for example to call function g), llvm-gcc should sign or zero extend it based on the C signature of the variable that is holding the value.

So the generated code from llvm-gcc is very similar (just edited it myself, not tested)

    define i16 @f() nounwind {
    entry:
      %0 = tail call i16 @h() nounwind
      %1 = sext i16 %0 to i32
      tail call void @g(i32 %1) nounwind
      %2 = load i16* @x, align 2
      ret i16 %2
    }

And the only difference in the generated assembly is that we have only one extension and it is explicitly represented in the code. The return value is not extended.

llvmbot commented 15 years ago

The gcc mailing list link seems to be wrong.

Also, currently llvm-gcc produces:

    define signext i16 @f() nounwind {
    entry:
      %0 = tail call signext i16 @h() nounwind    ; [#uses=1]
      %1 = sext i16 %0 to i32                     ; [#uses=1]
      tail call void @g(i32 %1) nounwind
      %2 = load i16* @x, align 2                  ; [#uses=1]
      ret i16 %2
    }

If I understand correctly, signext on a return value means that if the result is returned in a bigger register (eg: i32 rather than i16) then the value was sign extended by the callee.

Based on this, at codegen time it should be possible to drop the %1 = sext i16 %0 to i32.

However what happens on x86 is that the call becomes

    0x9412f00: i16,ch = call 0x9412d84, 0x9412e7c

while the returned value is

    0x94133a4: i32 = sign_extend 0x9413320

i.e. callers think that i16 gets returned as an i16, while callees think that i16 is returned as an i32!

So it seems like codegen is a bit confused.

llvmbot commented 15 years ago

This was changed in gcc in http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00300.html

see http://groups.google.com/group/ia32-abi/browse_thread/thread/f47e0106b21d9269

llvmbot commented 15 years ago

Why is it wrong? I couldn't find anything on the ABI, but if the caller does sign extension, there is no need for the callee to do it too.

llvmbot commented 15 years ago

Really? gcc is wrong here. Apple's gcc doesn't do this.

arsenm commented 1 year ago

Current clang:

    .cfi_startproc
# %bb.0:
    pushq   %rax
    .cfi_def_cfa_offset 16
    callq   h@PLT
    movswl  %ax, %edi
    callq   g@PLT
    movzwl  x(%rip), %eax
    popq    %rcx
    .cfi_def_cfa_offset 8
    retq

Current gcc:


    .cfi_startproc
    endbr64
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    call    h@PLT
    movswl  %ax, %edi
    call    g@PLT
    movzwl  x(%rip), %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc

Looks the same to me