tromp / cuckoo

a memory-bound graph-theoretic proof-of-work system
Other
822 stars 173 forks source link

Use unaligned load for avx2 in these two spots #40

Closed wilsonk closed 6 years ago

wilsonk commented 6 years ago

It appears as though these unaligned loads are needed on Linux for mean_miners that have more than one siphash round. I tested with clang-5.0, clang-3.9 and gcc-5.4 and they all segfault without the loadu's. I also tried to align the struct manually, but that didn't seem to fix the problem, strangely. These changes shouldn't really affect performance as these functions are only called a few times, I think (and newer chips handle unaligned loads well these days).

Also, I set up my travis instance to test on trusty since I noticed that your travis config is using that version of Ubuntu, while I am on xenial. No change.

I also changed travis to test demo30x8, since only demo30x1 was being built/tested. It failed on trusty and xenial. Perhaps it would be advisable to test one of these other mean_miner versions on your travis setup?

Thanks, Kelly

tromp commented 6 years ago

dear Kelly,

on your system, can you print out the edgetrimmer/sip_keys address to check its alignment? and also with the declaration changed to alignas(16) siphash_keys sip_keys; I don't want it to be unaligned. As to demo30x8, I cannot assume that Travis CI is run on a system with avx2, and it will crash without avx2. Better keep non-portable builds out of automated tests.

wilsonk commented 6 years ago

Hello John,

It looks like the problem wasn't the sip_keys struct after all, it was actually the edgetrimmer class being dynamically allocated on a 16 byte boundary! AVX2 requires 32 byte boundaries (SSE is 16). Sorry I missed that and wasted your time.

Tested with clang and gcc on Linux Xenial, of course.

Thanks, Kelly

P.S. I didn't know that travis might use older processors, but of course that makes sense. My bad.

tromp commented 6 years ago

Thanks for re-testing Kelly. If 32 byte alignment is required, why not add a simple alignas(32) in the sip_keys declaration? Did you try that? I don't like redefining the new operator.

wilsonk commented 6 years ago

Hi John,

Understood, redefining the new operator certainly wasn't my first option either. However, the edgetrimmer instance must end up on a 32 byte boundary...then the siphash_keys struct that is allocated will be aligned on 32 bytes because it is the first element in the class. The dynamic allocation essentially takes precedence, so even if we tell the compiler to allocate the siphash_keys struct on a 32 byte boundary it could still end up on a 16 byte boundary.

I tried a myriad of other solutions to try to fix this. That includes aligning sip_keys on boundaries including 16,32 and 64. That means I tried aligning the struct definition, plus the declaration inside the class, on each combination.

The solution I am proposing now works without the redefinition of the new operator, and it is quite simple (ie. just pad using those two pointers), but it makes no guarantees about alignment if someone accidentally declares a new variable above sip_keys in the future! (You could also just put two pointers named 'padding_don't_delete' or something, if you like, of course). If this is acceptable to you then this solution works for me now with clang and gcc. I suppose you have a few solutions to pick from actually ;)

P.S. This is a bit of a nightmare because it is so fragile. If I try to keep the new operator overload, and also keep the sip_keys declaration below the two pointers, then things break again!!! So basically I have moved the sip_keys declaration below the two pointers, gotten rid of the new operator overload, and then put the warning comment in there and now it is just time to hope! Loadu exists for a reason it would seem ;)

P.P.S There might be a more robust solution to this but it would probably involve some real compile time and runtime magic to make it robust. Not sure. I am also unsure how OS X is handling this differently, such that it works? Does the OS X version work with the two padding pointers? Or is it misaligned now?

Thanks, Kelly

tromp commented 6 years ago

dear Kelly,

Understood, redefining the new operator certainly wasn't my first option either. However, the edgetrimmer instance must end up on a 32 byte boundary...then the siphash_keys struct that is allocated will be aligned on 32 bytes because it is the first element in the class. The dynamic allocation essentially takes precedence, so even if we tell the compiler to allocate the siphash_keys struct on a 32 byte boundary it could still end up on a 16 byte boundary.

I did some googling online about alignment and new, and https://stackoverflow.com/questions/15511909/does-the-alignas-specifier-work-with-new confirms your suspicion. I printed alignof(std::max_align_t) on my Linux box and it comes out as 16. So my sip_keys is overaligned and there is no simple solution.

The solution I am proposing now works without the redefinition of the new operator, and it is quite simple (ie. just pad using those two pointers), but it makes no guarantees about alignment if someone accidentally declares a new variable above sip_keys in the future! (You could also just put two pointers named 'padding_don't_delete' or something, if you like, of course). If this is acceptable to you then this solution works for me now with clang and gcc. I suppose you have a few solutions to pick from actually ;)

I don't see how moving those 2 pointers (16 bytes together) in front would guarantee 32 byte alignment on sip_keys. This would seem to just accidentally work on your system.

P.S. This is a bit of a nightmare because it is so fragile. If I try to keep the new operator overload, and also keep the sip_keys declaration below the two pointers, then things break again!!! So basically I have moved the sip_keys declaration below the two pointers, gotten rid of the new operator overload, and then put the warning comment in there and now it is just time to hope! Loadu exists for a reason it would seem ;)

In light of the above, I decided to adopt your previous solution of redefining new as the least fragile (if somewhat ugly) solution. Thanks again for all your testing efforts!

regards, -John

tromp commented 6 years ago

dear Kelly,

In light of the above, I decided to adopt your previous solution of redefining new as the least fragile (if somewhat ugly) solution.

I'm actually not sure how to accept 2-of-3 of the commits in a pull request; would you mind reverting to your previous pull request?

regards, -John

wilsonk commented 6 years ago

Hey John,

Hopefully that is what you are looking for with that reverted commit. Sorry it took me so long, been busy today :)

Thanks, Kelly

tromp commented 6 years ago

dear Kelly,

Hopefully that is what you are looking for with that reverted commit. Sorry it took me so long, been busy today :)

No problem. Just before confirming the PR, I realized that this is only necessary in case NSIPHASH > 4. So the override of new would be better off in a block of

if NSIPHASH > 4

/ redefine new, replacing 32 by NSIPHASH sizeof(u32) */

endif

Could you please test and make this final tweak? Thanks!

regards, -John

wilsonk commented 6 years ago

Hi John,

Woops, I should have done that in the first place. Fixed and tested, seems to work fine with these changes.

Thanks, Kelly