Closed GoogleCodeExporter closed 9 years ago
Hi,
What compiler is this? Ideally the compiler should generate good code for small
constant memcpy, but I'm fully aware this is not always happening.
On first thought, it seems like you're mashing things a bit together; the
struct trick is nice, but you're not using it for big-endian architectures at
all? This is confusing; I don't immediately see why byte-for-byte loads should
be faster on _all_ big-endian architectures (vs. unaligned load + swap).
Note that while ARM performance is interesting to us, _simulated_ ARM
performance (QEMU) is not.
Original comment by sgunder...@bigfoot.com
on 20 Apr 2012 at 9:36
Original comment by se...@google.com
on 20 Apr 2012 at 9:42
Hi,
For Sparc it is gcc-4.2.1 and the code generated for UNALIGNED_LOAD32 with
memcpy is 4 bytes loads, 4 byte stores (this is the memcpy) followed by an
uint32 load. So the memcpy is optimized all right, but gcc doesn't understand
where the bytes are going. On ARMv5 and gcc-4.6.1, the call to memcpy is not
optimized at all.
The struct trick is used for the loads/stores in native endian order and
byte-for-byte is used when little endian order is needed. Again gcc has trouble
optimizing the two steps load + swap.
Byte-for-byte loads are not faster on_all_ big endian archs, for example not on
for powerpc, which is excluded as before. But powerpc is unusual in that it can
access unaligned little endian data with one instruction.
Note: If you have real pre-arm7 hardware, I think you will see a speedup as I
did in qemu, but not exactly the same amount of speedup.
Original comment by skrab...@gmail.com
on 22 Apr 2012 at 7:02
I don't think pre-arm7 is all that important for performance; perhaps we can
leave that one out? As I understand it, the struct trick is primarily for
SPARC, where you've already shown a real benefit. The other changes can go in
if we find a real, non-emulated platform where people need the performance.
Here's what I think needs to happen:
1. Make a patch with only the struct trick, conditional on GCC + little-endian.
2. Take a look at the Google C++ style guide
(http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml); in
particular, it (and thus Snappy) doesn't use spaces inside (), and prefers
static_cast<int>(x) to int(x). (Strange as I find the latter :-) ). I can clean
this up if you don't want to, but it's of course easier if the patch already
conforms to the style guide.
2. Please sign the Contributor License Agreement
(http://code.google.com/legal/individual-cla-v1.0.html); it's a prerequisite
for accepting non-trivial patches. It's a really simple procedure, though.
Original comment by se...@google.com
on 24 Apr 2012 at 11:31
Hi,
What's the status on this bug?
Original comment by se...@google.com
on 22 May 2012 at 9:46
Hi,
Given that we can't accept these patches as they are (due to missing CLA
signature, as a primary blocker), and there is no response from the bug
reporter, I'm closing it as wontfix. Feel free to reopen at a later stage (and
if so, see my plan in comment 4).
Original comment by se...@google.com
on 1 Jun 2012 at 11:23
Original issue reported on code.google.com by
skrab...@gmail.com
on 19 Apr 2012 at 7:38Attachments: