jens-maus / amissl

AmiSSL is the AmigaOS/MorphOS/AROS port of OpenSSL. It wraps the full functionality of OpenSSL into a full-fledged Amiga shared library that makes it possible for Amiga applications to use the full OpenSSL API through a standard Amiga shared library interface (e.g. web browsers wanting to support HTTPS, etc.)...
Apache License 2.0

OpenSSL 3.0 / AmiSSL v5 #60

Closed Futaura closed 2 years ago

Futaura commented 2 years ago

I'm currently working on merging/porting OpenSSL 3.0 into AmiSSL. The main challenge is the OS3 limitation of a 32K shared library jump table, which puts a hard limit of approximately 5400 on the number of functions. With AmiSSL v4 and OpenSSL 1.1.x we are only just below this limit. This really has to be the first thing to be resolved before we can move on.
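
The "approx 5400" figure can be checked with a little arithmetic. This is only a rough sketch: it assumes each jump table entry is a 6-byte vector (consistent with the offsets stepping by 6 in the diff further down this thread) and that the table has to stay within the 32768 bytes reachable through a signed 16-bit displacement. The names `VECTOR_SIZE`, `MAX_DISPLACEMENT` and `max_vectors` are made up for illustration.

```c
#include <assert.h>

/* Assumed size in bytes of one library vector (a jump plus a 32-bit address). */
enum { VECTOR_SIZE = 6 };

/* Largest negative offset reachable with a signed 16-bit displacement. */
enum { MAX_DISPLACEMENT = 32768 };

/* Number of whole vectors that fit within reach_bytes below the library base. */
int max_vectors(int reach_bytes, int vector_size)
{
    assert(reach_bytes >= 0 && vector_size > 0);
    return reach_bytes / vector_size;
}
```

`max_vectors(MAX_DISPLACEMENT, VECTOR_SIZE)` gives 5461 entries in total. A few of those are taken by the standard library vectors that precede the first user function, so slightly fewer are available to the API itself.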

Unsurprisingly, OpenSSL 3.0 has increased the number of public functions, even after removing some older ones, to around 5950 by default. So, what to do?

First of all, it makes sense to jump to AmiSSL v5 for OpenSSL 3.0, as ABI changes in OpenSSL mean that applications will need to be recompiled to use the new OpenSSL whatever we do. This also gives us a chance to hit the reset button in AmiSSL, where necessary.

The options I've been considering are below and are not necessarily mutually exclusive:

Does anybody have any better ideas? All opinions welcome.

patrikaxelsson commented 2 years ago

As far as I can reason, there is nothing in OS3 itself that imposes a 32K limit on a shared library jump table. The addresses in the jump table are 32-bit, and the function address list you give the OS to build that jump table from is an arbitrary-length list of 32-bit pointers terminated by -1 (the entries can also be 16-bit offsets from the table start, but..).

I had to test this, and found no problems on the library side.

However, there are at least two problems when building software which uses such a massive library:

  1. After ~5400 functions, the address register indirect with displacement addressing mode normally used with the jsr instruction in protos and C-stubs can no longer be used, as it handles at most a 16-bit signed displacement - e.g. jsr -30(a6).
  2. After ~5400 functions, at least the fd2pragma 2.197c I have still uses address register indirect with displacement, and it also calculates the jump offset incorrectly - e.g. jsr --32764(a6).
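
Both symptoms can be reproduced with simple arithmetic. The sketch below assumes function 1 sits at offset 30 (as in jsr -30(a6)) with 6-byte entries. It also guesses that the tool keeps the offset magnitude in a signed 16-bit value and prints it after a "-"; that guess has not been checked against the fd2pragma source, but it does yield both the --32764 above and the --29512 seen in the diff below. The function names are hypothetical.

```c
#include <assert.h>

/* Magnitude of the negative offset of user function n (1-based),
 * assuming four 6-byte standard vectors come first (function 1 at 30). */
long vector_offset(long n)
{
    assert(n >= 1);
    return 24 + 6 * n;
}

/* Reinterpret the low 16 bits of a value as a signed 16-bit number,
 * the way a 16-bit field would hold it. */
long wrap_to_int16(long value)
{
    long low = value & 0xFFFFL;                 /* keep the low 16 bits */
    return (low >= 0x8000L) ? low - 0x10000L : low;
}
```

Under these assumptions function 5457 is the last one in range (offset 32766), function 5458 wraps to -32764, and function 6000 wraps to -29512.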

If fd2pragma is fixed to use address register indirect with index instead after ~5400 functions, and to calculate the jump offset correctly, you can create software which uses such massive libraries. One drawback is that you have to sacrifice a data or address register to load the jump offset into, but that is usually not a problem.
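
As a rough illustration of what such a fix amounts to, here is a sketch of a generator that picks between the two call forms. It is not fd2pragma code; `emit_call` and its parameters are invented for this example, and the caller is assumed to know which register is free.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the m68k call sequence for a library vector at -offset(a6).
 * scratch names a register the caller knows to be free, e.g. "d1".
 * Returns what snprintf returns. */
int emit_call(char *buf, size_t size, long offset, const char *scratch)
{
    assert(buf != NULL && scratch != NULL && offset > 0);
    if (offset <= 32768L) {
        /* Fits a signed 16-bit displacement: the usual form. */
        return snprintf(buf, size, "\tjsr\t-%ld(a6)", offset);
    }
    /* Out of range: load the full 32-bit offset, then index off a6. */
    return snprintf(buf, size, "\tmove.l\t#-%ld,%s\n\tjsr\t(a6,%s.l)",
                    offset, scratch, scratch);
}
```

When every candidate register already carries an argument, as with AddManyByValAndRef in the diff below, the chosen register additionally has to be saved and restored around the call.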

Created a library with 6007 functions and hand-patched a few of the VBCC function inlines and the C-stubs to show some different examples where:

$ diff -r -u --color original/ patched/
diff -r -u --color original/include/inline/massive_protos.h patched/include/inline/massive_protos.h
--- original/include/inline/massive_protos.h    2021-10-06 23:43:00.000000000 +0200
+++ patched/include/inline/massive_protos.h 2021-10-06 22:32:06.000000000 +0200
@@ -18003,15 +18003,16 @@
 #define Add5999(num) __Add5999(MassiveBase, (num))

 LONG __Add6000(__reg("a6") struct Library *, __reg("d0") LONG num)="\tjsr\t--29512(a6)";
+LONG __Add6000(__reg("a6") struct Library *, __reg("d0") LONG num)="\tmove.l\t#-36024,d1\n\tjsr\t(a6,d1.l)";
 #define Add6000(num) __Add6000(MassiveBase, (num))

-LONG __AddManyByVal(__reg("a6") struct Library *, __reg("d0") LONG numVal1, __reg("d1") LONG numVal2, __reg("d2") LONG numVal3, __reg("d3") LONG numVal4)="\tjsr\t--29506(a6)";
+LONG __AddManyByVal(__reg("a6") struct Library *, __reg("d0") LONG numVal1, __reg("d1") LONG numVal2, __reg("d2") LONG numVal3, __reg("d3") LONG numVal4)="\tmove.l\t#-36030,a0\n\tjsr\t(a6,a0.l)";
 #define AddManyByVal(numVal1, numVal2, numVal3, numVal4) __AddManyByVal(MassiveBase, (numVal1), (numVal2), (numVal3), (numVal4))

-LONG __AddManyByRef(__reg("a6") struct Library *, __reg("a0") LONG * numRef1, __reg("a1") LONG * numRef2, __reg("a2") LONG * numRef3, __reg("a3") LONG * numRef4)="\tjsr\t--29500(a6)";
+LONG __AddManyByRef(__reg("a6") struct Library *, __reg("a0") LONG * numRef1, __reg("a1") LONG * numRef2, __reg("a2") LONG * numRef3, __reg("a3") LONG * numRef4)="\tmove.l\t#-36036,d0\n\tjsr\t(a6,d0.l)";
 #define AddManyByRef(numRef1, numRef2, numRef3, numRef4) __AddManyByRef(MassiveBase, (numRef1), (numRef2), (numRef3), (numRef4))

-LONG __AddManyByValAndRef(__reg("a6") struct Library *, __reg("d0") LONG numVal1, __reg("d1") LONG numVal2, __reg("d2") LONG numVal3, __reg("d3") LONG numVal4, __reg("a0") LONG * numRef1, __reg("a1") LONG * numRef2, __reg("a2") LONG * numRef3, __reg("a3") LONG * numRef4)="\tjsr\t--29494(a6)";
+LONG __AddManyByValAndRef(__reg("a6") struct Library *, __reg("d0") LONG numVal1, __reg("d1") LONG numVal2, __reg("d2") LONG numVal3, __reg("d3") LONG numVal4, __reg("a0") LONG * numRef1, __reg("a1") LONG * numRef2, __reg("a2") LONG * numRef3, __reg("a3") LONG * numRef4)="\tmove.l\td4,-(a7)\n\tmove.l\t#-36042,d4\n\tjsr\t(a6,d4.l)\n\tmove.l\t(a7)+,d4";
 #define AddManyByValAndRef(numVal1, numVal2, numVal3, numVal4, numRef1, numRef2, numRef3, numRef4) __AddManyByValAndRef(MassiveBase, (numVal1), (numVal2), (numVal3), (numVal4), (numRef1), (numRef2), (numRef3), (numRef4))

 #endif /*  _VBCCINLINE_MASSIVE_H  */
diff -r -u --color original/lib/Add6000.s patched/lib/Add6000.s
--- original/lib/Add6000.s  2021-10-06 23:43:05.000000000 +0200
+++ patched/lib/Add6000.s   2021-10-06 23:41:01.000000000 +0200
@@ -11,6 +11,7 @@
    MOVE.L  A6,-(A7)
    MOVEA.L _MassiveBase,A6
    MOVE.L  08(A7),D0
-   JSR --29512(A6)
+   MOVE.L  #-36024,D1
+   JSR (A6,D1.L)
    MOVEA.L (A7)+,A6
    RTS
diff -r -u --color original/lib/AddManyByRef.s patched/lib/AddManyByRef.s
--- original/lib/AddManyByRef.s 2021-10-06 23:43:05.000000000 +0200
+++ patched/lib/AddManyByRef.s  2021-10-06 23:42:25.000000000 +0200
@@ -11,6 +11,7 @@
    MOVEM.L A2/A3/A6,-(A7)
    MOVEA.L _MassiveBase,A6
    MOVEM.L 16(A7),A0/A1/A2/A3
-   JSR --29500(A6)
+   MOVE.L  #-36036,D0
+   JSR (A6,D0.L)
    MOVEM.L (A7)+,A2/A3/A6
    RTS
diff -r -u --color original/lib/AddManyByValAndRef.s patched/lib/AddManyByValAndRef.s
--- original/lib/AddManyByValAndRef.s   2021-10-06 23:43:05.000000000 +0200
+++ patched/lib/AddManyByValAndRef.s    2021-10-06 22:35:11.000000000 +0200
@@ -11,6 +11,9 @@
    MOVEM.L D2/D3/A2/A3/A6,-(A7)
    MOVEA.L _MassiveBase,A6
    MOVEM.L 24(A7),D0/D1/D2/D3/A0/A1/A2/A3
-   JSR --29494(A6)
+   MOVE.L  D4,-(A7)
+   MOVE.L  #-36042,D4
+   JSR (A6,D4.L)
+   MOVE.L  (A7)+,D4
    MOVEM.L (A7)+,D2/D3/A2/A3/A6
    RTS
diff -r -u --color original/lib/AddManyByVal.s patched/lib/AddManyByVal.s
--- original/lib/AddManyByVal.s 2021-10-06 23:43:05.000000000 +0200
+++ patched/lib/AddManyByVal.s  2021-10-06 23:42:36.000000000 +0200
@@ -11,6 +11,7 @@
    MOVEM.L D2/D3/A6,-(A7)
    MOVEA.L _MassiveBase,A6
    MOVEM.L 16(A7),D0/D1/D2/D3
-   JSR --29506(A6)
+   MOVE.L  #-36030,A0
+   JSR (A6,A0.L)
    MOVEM.L (A7)+,D2/D3/A6
    RTS

This can also be done with the GCC inlines, but I am not sure about the SAS/C pragmas. They are certainly generated incorrectly, and I think SAS/C generates the call itself, so it could fail there too. If so, the C-stubs can always be used.

There are two included test programs, TestMassive and TestMassive_stubs, which just verify that functions beyond the 32K barrier can be called:

> TestMassive 
Add1(0):                                    1
Add6000(0):                                 6000
AddManyByVal(1, 2, 3, 4):                   10
AddManyByRef(5, 6, 7, 8):                   26
AddManyByValAndRef(8, 7, 6, 5, 4, 3, 2, 1): 36

Source, compiled binaries etc: Massive.zip

Anyway, continuing on. 5000 functions in an API is massive, especially when you can write a client with around 20 of them. Has OpenSSL just kept adding functions forever, never breaking backwards compatibility? I mean, there can't be any software using all 5000, right?

On the topic of libcurl, it uses an internal middle-layer API for TLS called VTLS, which lets it support something like 10-20 different TLS libraries. It seems to capture what is needed in quite a small API; vtls.h is quite short. Anyway, you could find all the OpenSSL functions that are used by looking at the OpenSSL VTLS implementation.

Even if it was fun playing with massive libraries, I am all for pruning, but I guess it depends on what the objective is. If the objective is to allow something really old written for OpenSSL to compile, you of course want to keep them all. If it's mostly new development, on the other hand, it doesn't matter at all how much you prune, as long as it still works :D.

Futaura commented 2 years ago

Yes, I should have said it is more of a compiler/tools limitation rather than a physical limitation of an OS3 library, I think. It's such a long time ago, I forget what I used to test this - IIRC the pragmas file was broken. For AmiSSL we use sfdc to generate most of the files from the fd/sfd, which itself is generated by idltool. I can modify both these if the compilers can handle the results (not sure if the SAS/C tagcall could be a problem). I'll do some more checking on this later.

Backwards compatibility usually gets broken between major versions - there were quite some changes required to IBrowse between 0.9.7 (AmiSSL v3) and 1.1.0 (AmiSSL v4) to get things working. I think they may have removed some functions for OpenSSL 3.0, but fewer than they added. For AmiSSL, we have to be a little more careful - if an OpenSSL function is removed, we can't simply remove it from the jump table or interface completely, as all the following library offsets would change and existing applications would obviously crash with newer AmiSSL versions, so we simply mark them as unimplemented and leave a dummy function there. At least with AmiSSL v5, we could afford to clear those out, but it does mean the SDK then is not compatible with previous versions - the jump table integrity is maintained between AmiSSL v3 and v4, for example.
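
The placeholder idea can be shown with a plain C function table. This is a simplified model, not AmiSSL's actual jump table; every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*vector_fn)(long);

/* Placeholder kept in the slot of a function that no longer exists. */
long unimplemented(long arg) { (void)arg; return -1; }

long still_here(long arg)  { return arg + 1; }
long added_later(long arg) { return arg * 2; }

/* Slot 1 once held a function that was removed upstream. Filling it
 * with a placeholder keeps slots 0 and 2 at their original positions. */
vector_fn jump_table[] = { still_here, unimplemented, added_later };

/* Call by slot index, as an application built with an older SDK would. */
long call_slot(size_t slot, long arg)
{
    assert(slot < sizeof jump_table / sizeof jump_table[0]);
    return jump_table[slot](arg);
}
```

Had slot 1 simply been deleted, an old application calling slot 2 would run off the end of the table, and in a longer table every later call would land on the wrong function.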

Of course, OpenSSL is more than SSL - the majority of functions are actually in the Crypto part of the library, and you don't necessarily need to be an app which needs to use SSL to use OpenSSL. You're right that most applications are probably not going to use more than 100 of the API functions, at least directly.

It is a bit annoying that they deprecated 500 or so functions, rather than removing them from the public API already - also, OpenSSL still internally uses these deprecated functions and not the newer API functions they introduced to replace them. To get around the problem of ABI changes due to publicly defined structures changing, these were mostly made private (in 1.1.0, IIRC) with functions added to access and set properties in the structures, which of course allows structures to be changed without breaking ABI compatibility - a good thing, but it meant a lot of functions were added (loads for incrementing/decrementing reference counters, and two variations of some functions - one which increases the ref count automatically, one which doesn't). Then there are some functions (200 or so, IIRC) where somebody decided they wanted to add extra arguments, so you end up with somefunc(a,b) and somefunc_ex(a,b,c) - IMHO, they should have got rid of the originals for OpenSSL 3.0. I guess there will be somefunc_ex2(a,b,c,d) next (varargs would be a better solution in some ways). For such a big version jump, I expected more of a clean up than there actually is.
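
To make the function-count cost concrete, here is a minimal sketch of the opaque-structure pattern. `WIDGET` and all its functions are invented for illustration and are not part of OpenSSL; in a real library only the forward declaration would appear in the public header.

```c
#include <assert.h>
#include <stdlib.h>

/* The only thing a public header would expose. */
typedef struct widget_st WIDGET;

/* The layout lives in the implementation and can change freely. */
struct widget_st { int refs; int value; };

WIDGET *WIDGET_new(int value)
{
    WIDGET *w = malloc(sizeof *w);
    if (w != NULL) { w->refs = 1; w->value = value; }
    return w;
}

/* Accessor replacing direct field access. */
int WIDGET_get_value(const WIDGET *w) { assert(w != NULL); return w->value; }

/* Take an extra reference; returns the new count. */
int WIDGET_up_ref(WIDGET *w) { assert(w != NULL); return ++w->refs; }

/* Drop one reference; returns 1 if the object was actually freed. */
int WIDGET_free(WIDGET *w)
{
    if (w == NULL || --w->refs > 0) return 0;
    free(w);
    return 1;
}
```

One structure with a single field already needs four exported functions here, which gives a feel for how the public function count grows once every structure is hidden this way.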

Futaura commented 2 years ago

As far as I can tell, #pragma libcall in SAS/C does not support offsets larger than (-)32K. The GCC inline macros would need to be modified, similar to VBCC as you've shown. Who knows about the other compilers in use. For OS4, the situation regarding 68k->PPC cross calls also needs checking, to see if that supports larger jump tables. I get what you're saying about C-stubs, although getting those into link libraries will require the developers to build them, as we can't really cross compile those. That said, link libraries with stubs would come in handy for those OpenSSL functions that take an OpenSSL function as an argument.

With all this in mind, I'm leaning towards avoiding all this, and going with multiple library bases (but only one physical library), if there is some sane way to split and organise them. I'm going to look into this further, but currently I'm thinking AmiSSLBase will house the AmiSSL native functions as it does now and all those from libssl (and perhaps the SSL and/or HTTP related ones from libcrypto). Then there could be AmiSSLBIOBase for all the BIO functions, AmiSSLEVPBase for all EVP functions, maybe AmiSSLCertBase for certificate-related functions and AmiSSLCryptoBase for everything else. Ideally, for OS4, we could just have AmiSSLBase with multiple interfaces, or even one big interface as it is now, but I don't want to complicate things with the xml and idltool usage, especially when it comes to 68k->PPC cross calls, so it may just be simpler to have multiple bases on OS4 too to mirror OS3. Anyway, when I get a chance I'm going to wade through all the OpenSSL API functions and try to organise them into groups, to see how the numbers work out.

patrikaxelsson commented 2 years ago

Sounds nice with the separation into AmiSSLBIOBase, AmiSSLEVPBase, AmiSSLCertBase and AmiSSLCryptoBase.

Futaura commented 2 years ago

Sounded nice in my head too, but in practice I'm finding them hard to separate this way as some functions could fit into more than one group - there is a lot of interaction going on. BIO for example is under 300 functions in total. Sorting all the OpenSSL functions alphabetically makes slightly depressing reading. Even grouping functions using the component view at https://www.openssl.org/docs/OpenSSL300Design.html doesn't gain much - "Protocols" (i.e. SSL, TLS, HTTP, CMS, TS and OCSP) equates to around 950 functions in total, leaving 3650 functions in "Common", with the deprecated and unimplemented API functions removed.

So, this brings me back to AmiSSLBase plus AmiSSLExtBase (or AmiSSLExtraBase, or some better name) once the AmiSSLBase jump table reaches 32K. Although not ideal, this approach will at least save time when it comes to adding new OpenSSL API functions in the future as they won't have to be sorted. My plan is to add amisslmaster.library/OpenSSLTags() which will effectively combine both amisslmaster.library/InitAmiSSLMaster() and amisslmaster.library/OpenSSL(), whilst adding more flexibility with the tag based interface - both library bases would be obtained using this new function.
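
The overflow approach boils down to a simple mapping from a running function number to a base and an offset. The sketch below is only a model of that idea: the capacity constant, the struct and `locate` are hypothetical, and it assumes the second base repeats the same layout as the first (function 1 at offset 30, 6-byte entries).

```c
#include <assert.h>

/* Assumed number of user vectors that stay within 32K on the first base. */
enum { FIRST_BASE_CAPACITY = 5457 };

struct vector_slot {
    int  base;    /* 0 = AmiSSLBase, 1 = AmiSSLExtBase */
    long offset;  /* magnitude of the negative offset from that base */
};

/* Map a 1-based running function number to a base and an offset. */
struct vector_slot locate(long n)
{
    struct vector_slot s;
    assert(n >= 1 && n <= 2L * FIRST_BASE_CAPACITY);
    s.base = (n > FIRST_BASE_CAPACITY) ? 1 : 0;
    if (s.base == 1)
        n -= FIRST_BASE_CAPACITY;   /* numbering restarts on the second base */
    s.offset = 24 + 6 * n;
    return s;
}
```

With these numbers, all of the roughly 5950 functions mentioned earlier fit, with fewer than 500 spilling over into the second base.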

Futaura commented 2 years ago

The last thing to check on this is backwards compatibility. I need to check whether public structures have been changed and whether API functions have either been changed or removed completely (as opposed to just being declared deprecated). If possible, we could retain backwards compatibility in AmiSSL, allowing applications compiled for AmiSSL v4 to automatically use v5. This would obviously force the issue on making deprecated functions available and on a second library base.

Futaura commented 2 years ago

Have now got AmiSSL v5 compiled with OpenSSL 3.0.0, whilst retaining the old library jump table offsets for backwards compatibility. Preliminary testing shows that IBrowse 2.5 is working fine with it, despite still being compiled for AmiSSL v4. This is currently all with one single massive library interface on OS4 (obviously, more work to do for OS3, as above).

Most deprecated functions that are no longer available in OpenSSL 3.0 are actually replaced, as standard, by macro definitions in the OpenSSL includes that remap to the new functions. So, apps built with the v5 SDK will use these macros directly, whilst those built with the v4 SDK will use the existing library entry points, which works fine. So far, I have detected no major changes in API functions - it is mainly just a load of consts that have been added to some parameters. The only things that have been completely removed in OpenSSL 3.0 are the RAND_DRBG API functions and some FIPS and mem debug options (which would essentially be no-ops in AmiSSL v4 anyway). Need to check the public structures next, but I suspect there won't be an issue there due to most, if not all, being made opaque for OpenSSL 1.1.x.
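
The macro remapping works along these lines. The names `thing_size` and `thing_size_ex` are made up for this sketch and do not correspond to any real OpenSSL function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical replacement API: takes an extra context argument. */
int thing_size_ex(const void *ctx, int bits)
{
    (void)ctx;               /* unused in this sketch */
    assert(bits >= 0);
    return (bits + 7) / 8;   /* bytes needed to hold that many bits */
}

/* The removed function survives only as a macro, so code compiled
 * against the new headers calls the replacement directly. */
#define thing_size(bits) thing_size_ex(NULL, (bits))
```

Code compiled against the new headers never references a `thing_size` symbol at all, whereas a binary built with older headers still calls whatever sits in the old library slot, which is why both can coexist.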

I think this will be the way to go, to prevent all the apps needing to be recompiled (obviously, this approach was impossible between AmiSSL v3 and v4, due to the public structure referencing that was part of OpenSSL in AmiSSL v3).

Futaura commented 2 years ago

Recording my findings on the structures defined in the public OpenSSL headers - unfortunately, there are still quite a few and 9 have actually been modified:

Whilst some could easily be patched, the x509v3 and pkcs7 structures are much more tricky. So, it looks like we will have to abandon blanket backwards compatibility with OpenSSL 1.1.x, but could still allow existing applications that specify FALSE for UsesOpenSSLStructs to use OpenSSL 3.0. Applications that specify this as TRUE will need recompiling with the new SDK.
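
The resulting rule is small enough to write down. This is only a model of the decision described above, not AmiSSL's implementation; `can_open` and its parameters are hypothetical, and it covers just the v4 and v5 SDKs against a v5 library.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether an application built with the v4 or v5 SDK can be
 * given a v5 library, based on what it declared for UsesOpenSSLStructs. */
bool can_open(int sdk_major, int lib_major, bool uses_openssl_structs)
{
    assert(lib_major == 5 && (sdk_major == 4 || sdk_major == 5));
    if (sdk_major == lib_major)
        return true;               /* same headers, same structure layouts */
    /* Older SDK: safe only if the app never touches the changed structures. */
    return !uses_openssl_structs;
}
```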