Closed hvr closed 7 years ago
This dealing with alignment is a pain. Currently raaz just memcpy's the data. Is that not enough. I mean does memcpy not take care of any alignment problems?
It looks like the C version of memcpy is supposed to work correctly but some google searches suggest that certain version of gcc with optimisation might create problems. I found this old thread for example
https://gcc.gnu.org/ml/gcc-bugs/2000-03/msg00155.html
I am not sure how relevant this is for the current crop of compilers.
if you memcpy into an aligned buffer, then that's good enough IMHO (and I'd blame the compiler if it turns a memcpy
into unaligned mem-accesses)
I do not think that the destination address is always mem-aligned. So what you raised might be a problem. For example consider the source [s1,s2]
( [a]
is a source if a
is) and assume that s1 has
length that is not a multiple of the alignment. then when copying the first memcpy will go through but not the second. I think this is a problem.
The man page of memcpy of strings.h does not mention anything about alignment, either on the source or destination.
You are correct. I think there is a bug in our code.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/4972.html
It looks like all the fix is simple at least on arm. One looks like all one needs to do is put appropriate casts. Is that the solution that you had in mind ?
This insanity of ARM compilers is well documented.
http://stackoverflow.com/questions/24883410/armcc-problems-with-memcpy-alignment-exceptions
However, it seems to say that gcc is still fine. There might be a performance hit though. I will leave this issue open. This is serious.
Coming to think of it our implementation will not incur the alignment
penalty. We always copy to the temporary buffer (which is aligned see
the combinator allocaBuffer
in the repository) data in multiples of
blocks of the hash (which we assume is a multiple of the machines word
alignment). In that case the only penalty we have is unnecessary
copying. (Assuming that one has a sane C compiler which compiles the
FFI portion).
Where our implementation incurs a cost is in the case when say we have a lazy bytestring all of whose chunks are starting at an alignment boundary. In that case we incur an additional cost of copying to the temporary buffer which was unnecessary except when processing the last block (because of padding). From the generic point of view this is all that we can do as we see the bytestring as just a byte source and not as a list of buffers.
Therefore I am now convinced that this is not a release blocking bug. It is an optimisation we can have later on. I am not assigning priority to this but leaving this issue open.
@hvr your opinion?
I am closing this ticket as I do not think it is worth pursuing this change. Will reopen if benchmarks show very bad result. But for that we need benchmarks with our cousin libraries. Will reconsider this at that time.
I was looking through the implementation and I looks to me like it's possible that unaligned memory buffers are passed the C implementation, since e.g.
ByteString
is also aPureByteSource
and the code appears to generates buffer-chunks without checking the alignment.PS: This is a portability pet-peeve of mine, and one of the things I had to fix in
cryptohash-sha256