google / android-riscv64

Issues and discussions around RISC-V support in AOSP.
Apache License 2.0
198 stars 13 forks source link

external/boringssl: optimization #36

Open enh-google opened 1 year ago

enh-google commented 1 year ago

no risc-v assembler (openssl doesn't seem to have any either)

some of the https://github.com/riscv/riscv-crypto/releases/tag/v1.0.1-scalar instructions might be useful?

it seems like most people -- hardware and software -- would prefer to go straight to the vector crypto instructions instead, and that's what's been uploaded as patches to openssl already.

enh-google commented 1 year ago

summary after talking to the boringssl folks:

the highest priority thing is AES-GCM acceleration. specifically, using whatever riscv's AES and carry less or polynomial multiply acceleration is. looks like that's the Zknd and Zkne for AES and Zbc/Zkbc for carry-less multiply.

that's something where the difference between trying to write it in software and using dedicated instructions can be a 10-30x perf gap. (not 10%-30%. actually 10x-30x.)

they feel that AES and binary-field multiplication need be in the Android riscv64 ABI. in their words: "the utility of those instructions is far greater than any other crypto-related ones".

they were also interested in Zkt (basic constant-time guarantees) and -- while not their problem directly -- they thought the entropy folks would definitely want Zkr (entropy).

davidben commented 1 year ago

To add to that a bit: AES and GHASH aren't very well-suited for fast, constant-time software implementations. But they're also ubiquitous so, in devices like phones, the hardware instructions really are essential.

If they aren't part of the Android baseline, the whole ecosystem suffers:

Ideally we wouldn't be in this situation, and the world would have settled on a cipher that were more software-friendly. But cryptography didn't end up evolving that way, sadly, so AES and GHASH acceleration is effectively a requirement for this category of device.

asb commented 1 year ago

Note that Zkt is mandated in the RVA22U64 profile. Unfortunately, the scalar crypto extensions are not. Although Android baseline requirements don't need to map directly to the proposed RISC-V profiles, it would probably be better if they did. I haven't followed the discussion on profiles recently as it's been a very long-running effort in the RISC-V ecosystem that was bogged down for a long while, but lobbying for the scalar crypto extensions to be mandated in the next official profile makes a lot of sense to me. EDIT: it seems the proposed RVA23 profile is a bit ambiguous as to what crypto support is required, as currently worded, and is actually moving in the direction of vector crypto.

davidben commented 1 year ago

The thing with baselines is you only have one shot to get it right. If you require it later, it's not a baseline because there'll be devices in your ecosystem that don't meet the new requirement.

While I agree it's good to have shared profiles and work towards alignment, that is secondary to the actual needs of the system. Android should not limit itself to only what's in the shared profiles at the expense of all the concerns listed above.

enh-google commented 1 year ago

yeah, i don't think the profiles are particularly relevant to us except insofar as what we can mandate will of course be constrained by the reality of what's actually available at the time we first ship. and that's likely to be at least somewhat influenced by the profiles.

but it seems likely that stuff that's "optional" in a profile -- but actually available -- becomes part of our base ABI for Android. and one reason we're trying to do as much of this investigation as early as possible is so that we can get feedback to SoC vendors as early as possible about what they're going to need to be able to achieve parity with existing in-market devices. (given that, once we ship, that's it --- that's all an app can assume is available for the next decade+.)

asb commented 1 year ago

While I agree it's good to have shared profiles and work towards alignment, that is secondary to the actual needs of the system. Android should not limit itself to only what's in the shared profiles at the expense of all the concerns listed above.

I think we're in very strong agreement on this. I was commenting more from the perspective that it would be helpful if someone could feed into that profiles process so it doesn't become totally irrelevant, rather than suggesting this project should be constrained by it. Plus pointing out that unless the profiles work is completely dead on arrival, you can hopefully at least include Zkt in your baseline without worry. That said, the profiles development process has been very drawn out - I don't know if it's because the timing was wrong before, or if there are difficulties that mean it may be hard to contribute and not a good use of time.

It does also sound like it would be worth bottoming out if the intention of others in the ecosystem is to move towards the (not yet ratified) vector crypto spec rather than the scalar crypto extensions. I think given the wording in the RVA23 profile draft and given that the kernel patches for crypto acceleration are Zvk* based, vector crypto is likely the direction of travel.

davidben commented 1 year ago

Oh that's fun. I hadn't realized there were two sets of them. It certainly would reduce the effort all around if there were fewer duplicate extensions (less copies of AES, etc., we need to maintain and ship), so if the ecosystem is settling on the vector one and skipping the scalar one, that sounds like a nice way to reduce the cost of supporting RISC-V. But I'm not as familiar with RISC-V and timelines here.

enh-google commented 1 year ago

That said, the profiles development process has been very drawn out - I don't know if it's because the timing was wrong before, or if there are difficulties that mean it may hard to contribute and not a good use of time.

like i said, the profiles aren't really the point. we'll take exactly one baseline (to go with arm, arm64, x86, x86-64). i don't know who profiles are for, because software ecosystems can't really move like that. certainly not annually.

(davidben: the way this bug has gone off track is why i left this part of your feedback out of this bug :-) every time this topic is brought up the conversation gets derailed onto stuff that's not really the point... yes, obviously we need the highest baseline we can manage. no, profiles aren't really much help to software folks [they just help undo some of risc-v's self-inflicted hardware fragmentation problems]. but, yes, profiles are relevant to us in that they'll affect what's actually available. and, yes, we are giving feedback about the 2023 profiles. fwiw, personally i think the 2023 profile including all the optional stuff plus a few extras like pointer masking maybe looks like a decent place to start? but this bug isn't the place to talk about that. see all the other bugs. and -- most importantly -- if anyone has a specific extension in mind and proof/demonstration that we're going to need it in the base or devices will suck, please file separate bugs with as much proof as possible, so we can start those conversations now.)

nick-knight commented 1 year ago

RISC-V vector crypto enhancements to OpenSSL have been proposed: https://github.com/openssl/openssl/pull/20149

This PR remains labeled as draft while the ISA spec is still being developed by the Cryptographic Extensions Task Group: https://wiki.riscv.org/display/HOME/Cryptographic+Extensions+TG

palmer-dabbelt commented 1 year ago

That said, the profiles development process has been very drawn out - I don't know if it's because the timing was wrong before, or if there are difficulties that mean it may hard to contribute and not a good use of time.

like i said, the profiles aren't really the point. we'll take exactly one baseline (to go with arm, arm64, x86, x86-64). i don't know who profiles are for, because software ecosystems can't really move like that. certainly not annually.

The profiles are just marketing fluff, the RISC-V folks have realized that the ISA has devolved into a mess of fragmentation and need to look like they're fixing it. They're completely irrelevant for SW development, they're not even meant to match what HW is shipping so even if we could deprecate every year this wouldn't help (and that doesn't even touch on stuff like vendor self-certification).

IMO the best bet is to just ignore the profiles. If you want to set a baseline then you need to go talk to the vendors you care about supporting, figure out what they're shipping and when, and then set it yourself.

(davidben: the way this bug has gone off track is why i left this part of your feedback out of this bug :-) every time this topic is brought up the conversation gets derailed onto stuff that's not really the point... yes, obviously we need the highest baseline we can manage. no, profiles aren't really much help to software folks [they just help undo some of risc-v's self-inflicted hardware fragmentation problems]. but, yes, profiles are relevant to us in that they'll affect what's actually available. and, yes, we are giving feedback about the 2023 profiles. fwiw, personally i think the 2023 profile including all the optional stuff plus a few extras like pointer masking maybe looks like a decent place to start? but this bug isn't the place to talk about that. see all the other bugs. and -- most importantly -- if anyone has a specific extension in mind and proof/demonstration that we're going to need it in the base or devices will suck, please file separate bugs with as much proof as possible, so we can start those conversations now.)

Just to aim somewhat back on track here: I'd guess that vendors don't align on scalar-crypto vs vector-crypto on their own, which is unfortunate because that means the baseline is no-crypto. If you can strong-arm folks into one of them then vector is probably the way to go, with the big risk being that other platforms have had performance issues around large state save/restore (there was a long saga here in AVX land, for example). That's probably way better than a lack of accelerated crypto, though...

enh-google commented 1 year ago

If you can strong-arm folks into one of them then vector is probably the way to go, with the big risk being that other platforms have had performance issues around large state save/restore (there was a long saga here in AVX land, for example).

[assume i haven't even looked at the V spec, let alone the vector crypto spec...] is there more state for vector crypto than just for plain V? i think we're all in agreement that V has to be in the base, so if there's no overhead to vector crypto over that, it should be fine?

nick-knight commented 1 year ago

The vector crypto proposal does not add architectural state beyond what is already in V. It's purely instructions.

The (ratified) scalar crypto extensions include Zkr ("entropy CSR"), which does add architectural state. Zkr is independent of the vector crypto stuff, but OpenSSL may want to leverage hardware entropy if available. Zkr is an optional extension in the current draft RVA23S64 profile. (I'm stating a fact, not an openness to debating the merits of profiles or their relevance to Android.)

ciphergoth commented 1 year ago

In practice this is a security issue, not just a performance issue; without dedicated instructions, implementations of AES and GHASH generally use data-dependent table lookups which are vulnerable to timing attacks.

phoebesv commented 1 year ago

We (SiFive) are currently engaged in the development of RISC-V vector cryptography, and have implemented several critical cryptographic optimization patches for OpenSSL. We will contribute the source code, which can be useful for BoringSSL as well.

phoebesv commented 8 months ago

We at SiFive have upstreamed our RVV crypto optimization to OpenSSL and all patches have been merged into the mainline. https://github.com/openssl/openssl/pull/21923.

prashanthswami commented 5 months ago

@phoebesv Would I be able to get your help for an OpenSSL configuration change? I'm trying to update the configuration scripts to support 'android-riscv64' and I'm trying to create a new block here:

"android-arm64" => {
          inherit_from     => [ "android" ],
          bn_ops           => add("RC4_CHAR"),
          asm_arch         => 'aarch64',
          perlasm_scheme   => "linux64",
      },
      "android-x86_64" => {
          inherit_from     => [ "android" ],
          bn_ops           => add("RC4_INT"),
          asm_arch         => 'x86_64',
          perlasm_scheme   => "elf",
      },

I'm unsure about what to put for bn_ops, asm_arch or perlasm_scheme - do you have a better sense of what ought to go into these options?

prashanthswami commented 5 months ago

I think I've made some progress here, I'll propose the following:

+    "android-riscv64" => {
+        inherit_from     => [ "android" ],
+        asm_arch         => 'riscv64',
+        perlasm_scheme   => "linux64",
+    },
+
  1. The asm_arch is inherited from the 10-main.conf file and matches the ASM arch definitions in ciphers.buildinfo.
  2. The perlasm_scheme is technically meaningless, but we also copy it from 10-main.conf - when building with the riscv.pm module in perlasm, it consumes the 'flavour' argument but doesn't actually use it anywhere, like other modules do. I can confirm that I can set the value here to any string and it makes no difference to the output. In the meanwhile, we can copy linux64 just to inherit the standard things that are added to OpenSSL for the main linux64-riscv64 target.

I intentionally don't set bn_ops here, but there's some discussion to be had. By default, as our ABI name has '64', the default behavior assigns SIXTY_FOUR_BIT_LONG (this is correct) and RC4_INT to 'unsigned int'.

The setting of RC4_INT seems to have had prior discussion about a speed vs. size tradeoff. If you read the comment around the definition of RC4_INT here: https://opensource.apple.com/source/CommonCrypto/CommonCrypto-55010/Source/ccOpenssl/opensslconf.h.auto.html

/* using int types make the structure larger but make the code faster
 * on most boxes I have tested - up to %20 faster. */
/*
 * I don't know what does "most" mean, but declaring "int" is a must on:
 * - Intel P6 because partial register stalls are very expensive;
 * - elder Alpha because it lacks byte load/store instructions;
 */

However, note that while the link above defaults the type to 'unsigned char', there's a separate instance in Google's old opensource mirror where it was set to 'unsigned int': https://chromium.googlesource.com/chromium/deps/openssl/+/c9490d33b98b7affb729b5f1db13cb0a348471a1%5E%21/

I guess the question is - what's the most reasonable default for Android RISC-V? It appears that for ARM/ARM64 boards, we used unsigned char, but for X86/X86_64, we used unsigned int. Perhaps there's a risk of too much runtime resource cost that is borne on handheld-hardware but not particularly relevant for x86, which usually runs on desktop-class machines? Alternatively, maybe today's hardware is generally capable of supporting this resource usage, and this would be a meaningful performance win?

phoebesv commented 5 months ago

@prashanthswami Thank you for providing support for the Android RISC-V configuration!

The main configuration file contains the riscv64 target configuration https://github.com/openssl/openssl/blob/master/Configurations/10-main.conf#L810-L816 may be useful to you. I'm not well-versed in Android SSL Build, but I noticed that there are no default bn_op settings in main.conf. Is it necessary to configure them for the Android build? Could you elaborate on the intended usage?

In addition, I am aware that RC4 is outdated and is associated with the legacy provider. If you are developing on the newer version of OpenSSL, it is generally not recommended to use both (RC4_INT and RC4_CHAR). Perhaps @paulidale could provide some insight on the bn_op setting?

prashanthswami commented 5 months ago

Is it necessary to configure them for the Android build? Could you elaborate on the intended usage?

At the moment, I'm looking at platform/tools/ndkports, which uses OpenSSL. I won't go so far as to say we need them for the Android build, the issue is more for the community at large - I'm sure someone out there is relying on code that leverages OpenSSL and may need to build an Android library for it.

In regards to "it is generally not recommended to use both" - this is indeed what their documentation says, but I think it may give the incorrect impression that by not specifying one or the other, you are disabling them somehow. However, the configuration header always supplies a define for it and if left unspecified, the default is 'unsigned_int' if 'bn_ops' does not explicitly set RC4_CHAR (for reference, this block in Configure and the foreach below it).

I should say that I admittedly don't know who uses RC4 anymore in general, as I think(?) it's widely discouraged now due to the brute-force IV attacks on it. Perhaps our selection doesn't practically matter, and we should just assume the default?

enh-google commented 5 months ago

boringssl deleted the RC4_INT cruft and just hard-coded uint32_t:

void RC4(RC4_KEY *key, size_t len, const uint8_t *in, uint8_t *out) {
  uint32_t x = key->x;

my assumption would be that someone thought (and maybe even measured) that unsigned char was a good idea for arm32, but that was just copy & pasted to arm64. but all that talk of sparc and alpha and k6 makes me think all this is just 1990s cruft.

so, yeah, +1 to "just take the default unsigned int"...

prashanthswami commented 5 months ago

Thanks for the feedback folks! I've pushed this for the OpenSSL folks review here: https://github.com/openssl/openssl/pull/23427