sharplispers / ironclad

A cryptographic toolkit written in Common Lisp
BSD 3-Clause "New" or "Revised" License
166 stars 28 forks source link

Make ironclad modular #44

Open zen-wq opened 3 years ago

zen-wq commented 3 years ago

I know it's a somewhat daunting task but the absolute majority of Ironclad uses I've seen only need MD5, SHA1, and byte-array-to-hex-string.

Frankly, I have no idea how to go about it. Splitting it into ironclad.digest and ironclad.cypher would be a good start, but more than half of the digests are also dead weight.

luismbo commented 3 years ago

This would be useful. Perhaps the ironclad system can continue loading everything by default, but having the possibility to load core functionality and specific algorithms would be nice since ironclad does take a while to compile.

zen-wq commented 3 years ago

Yes, that's how I did it with cl-charms. ironclad system should just load all the separate ironclad modules so existing users don't break.

glv2 commented 3 years ago

Frankly, I have no idea how to go about it. Splitting it into ironclad.digest and ironclad.cypher would be a good start, but more than half of the digests are also dead weight.

The "modules" that take the most time to compile are ciphers and digests, the rest compiles pretty fast in comparison. And as some digests are based on ciphers, having ironclad.cipher and ironclad.digest packages would not reduce the compilation time a lot. For example, using just one digest would still require compiling all ciphers and all digests.

To have a really smaller compilation time, we would need to have one system/package per cipher and per digest. That will take some work.

luismbo commented 3 years ago

@glv2 why is that? Couldn't multiple systems share the same package?

zen-wq commented 3 years ago

They can. That would be one system per cipher/digest.

luismbo commented 3 years ago

Here's a lightly tested first try: https://github.com/luismbo/ironclad/commit/879ab77dddf5701e881791141ba3d03d0c57798e

glv2 commented 3 years ago

Here's a lightly tested first try: luismbo@879ab77

I took a look at your patch, here are a few notes:

keystream is not a cipher, it's a function to jump anywhere in the stream of pseudo-random bytes generated by some ciphers (not all ciphers support this operation).

The dependency list of some of the systems will have to be modified:

Also, aead, kdf, macs, prng and public-key depend on digests and ciphers, so I think we'll need to organize the systems in the following way:

Maybe it could be useful to also have the systems ironclad/ciphers, ironclad/digests, ironclad/aead, ironclad/macs, etc...

luismbo commented 3 years ago

Regarding that last suggestion, would ironclad/ciphers load all the the ciphers?

glv2 commented 3 years ago

Yes, and ironclad/digests would load all the digests. These would in fact be almost empty system definitions with a long list of dependencies.

luismbo commented 3 years ago

Here's another go, even more untested than the previous one: https://github.com/luismbo/ironclad/commit/781d638f4bebe9a00367f0dd7ecde969d53dd62f. It adds a simple way to define dependencies and I've included the ones you've mentioned, but haven't reviewed the compiler warnings or anything like that to see if more are missing.

Not sure about ironclad/public-key vs ironclad/public-keys (likewise for aead and kdf).

A potential issue about this split is that changing a file, e.g. updating a macro, within ironclad/core does not trigger the recompilation of ironclad/cipher/foo, IIRC. It'd be nice to tell ASDF to do such recompilations.

glv2 commented 2 years ago

@luismbo I created a split-systems branch with some fixes on top of your patch. When loading the global ironclad system, it looks like things work fine and the tests pass. I made a few tests loading only some subsystems, and they worked. Now, let's tests this more thoroughly before it can be merged into the master branch... Please report any issue or weird behavior than you can find.

luismbo commented 2 years ago

@glv2 thanks for picking this up. In terms of testing, perhaps we could load each subsystem individually (in a fresh lisp, let's say SBCL) then run the tests? (It'd be convenient if we could get away with detecting undefined ciphers/digests/etc rather than explicitly spliting the tests, but perhaps that could fail to detect missing dependencies, so perhaps explicitly splitting the tests is the way to go.) Does this seem like an approach worth exploring?

glv2 commented 2 years ago

I guess the best solution will be for each subsystem to have its tests, and the test target for the global ironclad system will run the tests of all the subsystems. This will require an overhaul of our test system.

Do you think we should make these subsystems available to users in a release soon (as experimental feature), or wait until the new test system is done?

luismbo commented 2 years ago

Releasing this as an experimental feature sounds like a good idea!

glv2 commented 2 years ago

Subsystems are available as experimental feature in Ironclad 0.56.

fjl commented 2 years ago

It kind of feels like ASDF's package-inferred-system would be ideal for ironclad. I've been using it for all my own projects for the last two years or so, and it really works. There are specific advantages with package-inferred-system over defining your own structure: it forces you to get your dependencies right, and also forces creation of a sane package structure (because source files correspond 1-to-1 with lisp packages).

Using package-inferred-system for ironclad would mean that users only needing sha256 (for example), could depend on ironclad/digests/sha256 and have only this algorithm and its dependencies loaded into their image.

glv2 commented 2 years ago

With the subsystems in Ironclad 0.56, you can do (asdf:load-system "ironclad/digest/sha256") and it will only load what is necessary for SHA256.

I'm not really a fan of the mandatory "one file = one package" of package-inferred-system. There are cases when I prefer having a package split in several files, or several packages in a file.

fjl commented 2 years ago

Understood, just wanted to bring it up because I feel it's underused. Also, I looked at the modularization changes and thought, wow, that's a lot of boilerplate for defining the additional systems/packages.

BTW, you can always define additional packages containing subsets of exported symbols with package-inferred-system, if you want larger groupings of functionality. As you may know, the idea behind P-I-S is mostly that it will figure out which source file to load based on the packages listed in :USE or :IMPORT-FROM. You can even define more than one package in a single file, the file just needs to also define the canonical package it was loaded for.

fjl commented 2 years ago

Having tried the subsystem thing now, here are two observations about the current approach.

(1) Since all API is provided by a single package, not all exported symbols are actually defined when only some subsystems are loaded. This is kind of obvious, but also really goes against the 'usual way'.

(2) For users of package-inferred-system the subsystem approach is not easy to work with. Loading, say "ironclad/digests/sha256" does not lead to a package of the same name being defined, so it's not possible to depend on it using

(defpackage ... (:use #:ironclad/digests/sha256))

I can work around this in my application by defining a wrapper system in .asd that depends on selected ironclad bits, then create a package to re-export this functionality. It's doable but needs boilerplate code :(.

luismbo commented 2 years ago

@fjl you could also (eval-when (...) (asdf:load-system ...) at the start of your file, maybe?

fjl commented 2 years ago

It's not a good workaround because ASDF doesn't like it when Lisp files do their own operating during load. ASDF is supposed to know the dependencies of every file so it can plan the build properly.