keeleysam / tenfourfox

Automatically exported from code.google.com/p/tenfourfox

MacroAssembler: use load and store doubleword instructions on the G5 #203

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I extended the MacroAssembler to emit load and store doubleword instructions 
instead of two load/store word instructions where possible.
The remaining case is where two 32-bit immediates could be loaded as one 64-bit 
immediate, but I didn't implement such a function, and I currently can't think 
of a faster way than the current one (we can only load 16 bits as an 
immediate value, right?).
With the attached patch, all previously passing tests pass here. I expect 
these code paths to be used frequently, but I haven't verified this yet.
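For reference on the 16-bit immediate limit: the sketch below lists the usual five-instruction sequence (lis, ori, sldi, oris, ori) that builds an arbitrary 64-bit constant from 16-bit immediates on a 64-bit-capable PowerPC, and simulates each step on a uint64_t to check the result. The struct and function names are hypothetical and not part of the patch; the assembly text is illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type: illustrative asm text plus the value the
// sequence leaves in the destination register.
struct Imm64Sequence {
    std::vector<std::string> asmText;
    uint64_t value;
};

// Builds an arbitrary 64-bit constant in rD from 16-bit immediates only,
// simulating each instruction with 64-bit register semantics.
Imm64Sequence load64Immediate(uint64_t imm) {
    const uint16_t h3 = static_cast<uint16_t>(imm >> 48);  // bits 63..48
    const uint16_t h2 = static_cast<uint16_t>(imm >> 32);  // bits 47..32
    const uint16_t h1 = static_cast<uint16_t>(imm >> 16);  // bits 31..16
    const uint16_t h0 = static_cast<uint16_t>(imm);        // bits 15..0

    Imm64Sequence s;
    // lis rD, h3: rD = h3 << 16, sign-extended from 32 to 64 bits.
    const int32_t shifted = static_cast<int32_t>(static_cast<uint32_t>(h3) << 16);
    uint64_t r = static_cast<uint64_t>(static_cast<int64_t>(shifted));
    s.asmText.push_back("lis rD, " + std::to_string(h3));
    // ori rD, rD, h2: OR in the zero-extended halfword.
    r |= h2;
    s.asmText.push_back("ori rD, rD, " + std::to_string(h2));
    // sldi rD, rD, 32: move bits 31..0 to 63..32; lis's sign extension is shifted out.
    r <<= 32;
    s.asmText.push_back("sldi rD, rD, 32");
    // oris rD, rD, h1: OR in h1 << 16.
    r |= static_cast<uint64_t>(h1) << 16;
    s.asmText.push_back("oris rD, rD, " + std::to_string(h1));
    // ori rD, rD, h0: OR in the lowest halfword.
    r |= h0;
    s.asmText.push_back("ori rD, rD, " + std::to_string(h0));

    assert(r == imm);  // the five simulated steps reproduce the constant
    s.value = r;
    return s;
}
```

For arbitrary values this is five instructions, while two separate 32-bit constants take lis plus ori each, four in total, so building them as one 64-bit immediate is not obviously cheaper.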

(At least in the state of 20 I have checked out here, a couple of for..each 
and for..in loop tests don't pass, but they don't pass with the JIT disabled 
either.)

Original issue reported on code.google.com by Tobias.N...@gmail.com on 17 Jan 2013 at 9:37


GoogleCodeExporter commented 9 years ago
(The failing tests also fail in the G4 version, so they have nothing to do 
with this patch.)

Original comment by Tobias.N...@gmail.com on 17 Jan 2013 at 9:41

GoogleCodeExporter commented 9 years ago
Actually, making it depend on _ARCH_PPC64 being defined is a safer way of 
making sure 64-bit instructions are supported. Just to answer potential 
questions: the code generation does not depend in any way on passing 
-mpowerpc64 to gcc.
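A minimal sketch of such a gate, assuming GCC's convention of defining _ARCH_PPC64 whenever 64-bit PowerPC instructions are enabled (for example via -mpowerpc64). The helper store64Sequence and the register numbers are hypothetical, not taken from the patch; the macro only selects which instruction sequence the JIT would emit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Compile-time gate: GCC defines _ARCH_PPC64 when 64-bit PowerPC
// instructions are enabled. It only chooses what the JIT emits.
#if defined(_ARCH_PPC64)
constexpr bool kUseDoublewordOps = true;
#else
constexpr bool kUseDoublewordOps = false;
#endif

static std::string reg(int n) {
    assert(n >= 0 && n < 32);  // GPR numbers r0..r31
    return "r" + std::to_string(n);
}

// Hypothetical helper: instructions that store a 64-bit value held in two
// 32-bit registers (hi, lo) to 0(base) on big-endian PowerPC.
std::vector<std::string> store64Sequence(int hi, int lo, int base,
                                         bool useDoubleword = kUseDoublewordOps) {
    if (useDoubleword) {
        // Insert hi's low word into lo's upper word (clobbering it), then one std.
        return {"rldimi " + reg(lo) + ", " + reg(hi) + ", 32, 0",
                "std " + reg(lo) + ", 0(" + reg(base) + ")"};
    }
    // Word pair: high word at offset 0, low word at offset 4.
    return {"stw " + reg(hi) + ", 0(" + reg(base) + ")",
            "stw " + reg(lo) + ", 4(" + reg(base) + ")"};
}
```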

Original comment by Tobias.N...@gmail.com on 17 Jan 2013 at 11:21

GoogleCodeExporter commented 9 years ago
This doesn't seem to save us many, if any, actual instructions. Does it bench 
faster?

I'd like to solve issue 200 before we move to this, assuming a benefit (if I 
don't finish issue 178 first).

Original comment by classi...@floodgap.com on 18 Jan 2013 at 3:17

GoogleCodeExporter commented 9 years ago
Well, it does not save instructions, but it certainly saves cycles.
On a 64-bit memory bus, one doubleword load/store should not take any longer 
than one word load/store, and the single instruction that inserts the lower 
half of one register into the upper half of another is certainly MUCH faster 
than a second word load/store that has to go through the cache or the memory 
bus. In that respect it's not much different from AltiVec.
I'd say you don't really need to benchmark to know that doubleword 
load/store is faster.
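To make the register-insert step concrete, here is a minimal C++ model of the PowerPC rldimi (rotate left doubleword immediate then mask insert) instruction, assuming the 64-bit value is held in two 32-bit registers rHi/rLo (hypothetical names). It only checks that rldimi plus one std, or one ld plus srdi, moves the same bits as a stw/lwz pair; it says nothing about cycle counts.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit rotate left, as done by the rotate unit of the rld* instructions.
uint64_t rotl64(uint64_t x, unsigned n) {
    n &= 63;
    return n == 0 ? x : (x << n) | (x >> (64 - n));
}

// Model of "rldimi rA, rS, sh, mb": rotate rS left by sh and insert it into
// rA under the mask from IBM bit mb to IBM bit 63 - sh (bit 0 = MSB).
// Only the non-wrapping case mb <= 63 - sh is handled in this sketch.
uint64_t rldimi(uint64_t rA, uint64_t rS, unsigned sh, unsigned mb) {
    assert(sh < 64 && mb <= 63 - sh);
    const uint64_t mask = (~0ULL >> mb) & (~0ULL << sh);
    return (rotl64(rS, sh) & mask) | (rA & ~mask);
}

// Store side: "rldimi rLo, rHi, 32, 0" puts rHi's low word into rLo's upper
// word, so one "std rLo, 0(rBase)" writes hi:lo. Garbage in the upper
// halves of rHi/rLo is discarded.
uint64_t packForStd(uint64_t rHi, uint64_t rLo) {
    return rldimi(rLo, rHi, 32, 0);
}

// Load side: after "ld rLo, 0(rBase)" the low word is already in place;
// "srdi rHi, rLo, 32" extracts the high word.
void unpackAfterLd(uint64_t loaded, uint32_t& hi, uint32_t& lo) {
    hi = static_cast<uint32_t>(loaded >> 32);  // srdi rHi, rLo, 32
    lo = static_cast<uint32_t>(loaded);        // low word used as-is
}
```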

But there's one thing I worry about: the load/store doubleword instructions 
are DS-form, so the lowest 2 bits of the displacement are not encoded and the 
offset must be a multiple of 4. I don't really know how and where (because of 
patching) to verify whether the doubleword load/store can be used, or whether 
it has to fall back to word load/store because of a misaligned offset. 
Furthermore, setting the last bit of an ld/std instruction to 1 would in fact 
make it an ldu/stdu, and setting the second-to-last bit would turn ld into lwa 
and std into an invalid instruction.
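For reference, a sketch of the DS-form encoding used by ld/std (primary opcodes 58 and 62): the 16-bit displacement field holds the offset with its two low bits replaced by the extended opcode (0 = ld/std, 1 = ldu/stdu, and 2 = lwa for opcode 58). The helper names are hypothetical. fitsDsForm only checks whether a displacement can be encoded; whether the base register is aligned at run time is a separate question.

```cpp
#include <cassert>
#include <cstdint>

// DS-form primary opcodes: 58 = ld/ldu/lwa, 62 = std/stdu.
constexpr uint32_t kOpLd = 58;
constexpr uint32_t kOpStd = 62;

// Extended opcode stored in the two low bits of the instruction.
enum class DsXo : uint32_t { Plain = 0, Update = 1 };

// True if 'offset' is encodable as a DS-form displacement:
// a signed 16-bit value whose two low bits are zero.
bool fitsDsForm(int32_t offset) {
    return offset >= -32768 && offset <= 32767 && (offset & 3) == 0;
}

// Encodes "ld/std rt, offset(ra)" (or the update form). If offset had
// nonzero low bits they would overwrite the XO field, hence the assert.
uint32_t encodeDs(uint32_t opcode, uint32_t rt, uint32_t ra, int32_t offset,
                  DsXo xo = DsXo::Plain) {
    assert(rt < 32 && ra < 32);
    assert(fitsDsForm(offset));
    return (opcode << 26) | (rt << 21) | (ra << 16) |
           (static_cast<uint32_t>(offset) & 0xFFFCu) |
           static_cast<uint32_t>(xo);
}
```

With this check an emitter could pick one ld/std when the displacement fits and fall back to a lwz/stw pair at offset and offset + 4 otherwise.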

Original comment by Tobias.N...@gmail.com on 18 Jan 2013 at 9:45

GoogleCodeExporter commented 9 years ago
Because I'd like to get WebKit's JavaScript JIT working one day, I don't mind 
JaegerMonkey being superseded by IonMonkey, and this might be useful in 
IonMonkey as well.

Original comment by Tobias.N...@gmail.com on 18 Jan 2013 at 9:48

GoogleCodeExporter commented 9 years ago
(With the current Aurora 20 sources, the following tests still fail:
    js1_6/extensions/regress-455464-01.js
    js1_6/extensions/regress-455464-02.js
    js1_6/extensions/regress-455464-03.js
    js1_6/extensions/regress-455464-04.js
    js1_6/extensions/regress-465443.js
    js1_6/extensions/regress-472508.js
    js1_6/extensions/regress-475144.js
    js1_6/extensions/regress-565521.js
    js1_6/Regress/regress-350417.js
    js1_6/Regress/regress-355002.js
    js1_6/Regress/regress-372565.js)

Original comment by Tobias.N...@gmail.com on 18 Jan 2013 at 12:04

GoogleCodeExporter commented 9 years ago
I'm looking at using 64-bit doubleword instructions to implement inc64 in the 
Ion JIT, so this might be a logical point to implement this as well.
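As an illustration of why doubleword instructions help inc64, a minimal sketch assuming the 64-bit counter is stored big-endian (high word at offset 0, low word at offset 4), as on PowerPC. The instruction sequences in the comments and the function names are illustrative assumptions, not TenFourFox code: the word-pair version needs addic/addze to propagate the carry, while the doubleword version is just ld/addi/std.

```cpp
#include <cassert>
#include <cstdint>

// Big-endian 64-bit counter as laid out in memory.
struct Counter64 {
    uint32_t hi;  // offset 0
    uint32_t lo;  // offset 4
};

// 32-bit path: lwz rLo, 4(rBase); addic rLo, rLo, 1 (sets CA);
//              lwz rHi, 0(rBase); addze rHi, rHi (adds CA);
//              stw rLo, 4(rBase); stw rHi, 0(rBase).
void inc64WordPair(Counter64& c) {
    const uint32_t lo = c.lo;
    const uint32_t newLo = lo + 1;
    const uint32_t carry = (newLo < lo) ? 1u : 0u;  // CA from addic
    c.lo = newLo;
    c.hi = c.hi + carry;                            // addze
}

// 64-bit path: ld r, 0(rBase); addi r, r, 1; std r, 0(rBase).
void inc64Doubleword(Counter64& c) {
    uint64_t r = (static_cast<uint64_t>(c.hi) << 32) | c.lo;  // ld
    r += 1;                                                     // addi
    c.hi = static_cast<uint32_t>(r >> 32);                      // std, high word
    c.lo = static_cast<uint32_t>(r);                            // std, low word
}

// Runs both sequences from the same start value and checks they agree.
void checkInc64Agree(Counter64 start) {
    Counter64 a = start, b = start;
    inc64WordPair(a);
    inc64Doubleword(b);
    assert(a.hi == b.hi && a.lo == b.lo);
}
```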

Original comment by classi...@floodgap.com on 30 Apr 2013 at 2:14