Please, refactor the Hash

Anton-Latukha commented 3 years ago

This report is for discussing the hash code refactoring.

All ideas and collaborations are welcome.

Current hash use in projects

Cryptohash history:

`vincenthz`, the author of `cryptohash`, created `cryptonite`, and eventually deprecated `cryptohash` 5 years ago, declaring that `cryptonite` superseded it: https://github.com/vincenthz/hs-cryptohash#readme. At the time of that announcement `cryptonite` was only in initial development; it matured in 2016.

HVR forked `cryptohash` and split it into a set of `cryptohash-*` packages in 2016. The same year (2016) the maintainer commits stopped in `cryptohash-sha512`, and in 2017 they stopped for the most used package, `cryptohash-sha256`.

HNix-Store switched from `cryptonite` to the HVR forks 2.5 years ago, in 2018. The HVR forks became fully unmaintained this year (2020). Actually, the last commit to `cryptohash-sha512` was on 2018-03-18, which is 2.75 years ago (at the time of writing), right around the time HNix-Store switched to it. In time this created a direct problem for HNix-Store, currently with `cryptohash-sha512`.

The team actively worked on solving the upstream issues in advance; the reports and PRs to support `base 4.14` have been waiting there for 10 months. HVR is probably busy, as his activity on GitHub this year is small. Judging by the commit merges, the `haskell-hvr` group seems to consist of 1 person, because nobody else merges changes in its projects. People pinged him. To reach out, I wrote to `hvr@gnu.org` and got no response.

I lay this out simply as the case that `cryptohash-*` is essentially bitrotting. There is no reason to hold on to it, unless the family of packages gets actively maintained and readopted, most probably by `haskell-nix` as the most active party around the stale projects.

---

Inside the HNix-Store: #4 #14 #18 #25 #28 #27 #38 #83 #90

---

This spring, for ease of work and refactoring, HNix hashing moved to `hashing`, a pure but slow Haskell library.

`cryptonite` has few dependencies: `basement` and `memory`. Apart from themselves these are free, because they depend only on what `cryptonite` already depends on. `memory` also has a low-level implementation of ByteString on pointers, which maybe can be used. And because `basement` and `memory` have no further dependencies, they should be easy and quick to update for new `base` releases, which lets the actively used `cryptonite` stay current as well.

I compared the closures of `cryptonite` and `cryptohash-sha512`: ~2350 versus ~2150, so the overall compiled size difference is (2350 - 2150) / 2350 ≈ `0.085`, that is 8.5%.

---

My personal perception: the current situation directly shows that maintainability is much more important than the dependencies/storage size. Since hashing is the core operation of the Nix design, the next priority of choice is speed. Whichever of the libraries is faster should be used.

Anton-Latukha commented 3 years ago

Overall the idea of this thread is to refactor/resolve the hashing situation in a proper way.

Anton-Latukha commented 3 years ago

I started with #87.

It deduplicates some Hash<->Base encoding code and moves towards a form that is a bit easier to refactor.
Base encodings, together with their content, should probably be a type (or types).
The Base encoding<->Text functions `encodeInBase`/`decodeBase` then look like methods of a type class over the Base encoding type(s), which would remove the currently introduced coupling and reduce the repetition across the project code.
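A minimal sketch of that idea, under loud assumptions: the class name `BaseEncoding`, the tag type `Base16`, and the method signatures are hypothetical and are not the current hnix-store API. To stay self-contained with only `base`, `[Word8]` stands in for ByteString and `String` stands in for Text.

```haskell
import Data.Char (digitToInt, intToDigit, isHexDigit)
import Data.Word (Word8)

-- | Hypothetical tag type: one type per supported encoding.
data Base16 = Base16

-- | Hypothetical class: each encoding type supplies its own codec,
-- so callers depend on the class instead of on the Hash module.
class BaseEncoding enc where
  encodeInBase :: enc -> [Word8] -> String
  decodeBase   :: enc -> String -> Either String [Word8]

instance BaseEncoding Base16 where
  -- Each byte becomes two lowercase hex digits.
  encodeInBase _ = concatMap byteToHex
    where
      byteToHex :: Word8 -> String
      byteToHex w = [intToDigit (n `div` 16), intToDigit (n `mod` 16)]
        where n = fromIntegral w :: Int

  -- Consume two hex digits per byte; report malformed input.
  decodeBase _ = go
    where
      go [] = Right []
      go (h : l : rest)
        | isHexDigit h && isHexDigit l =
            (fromIntegral (digitToInt h * 16 + digitToInt l) :) <$> go rest
        | otherwise = Left "non-hex character"
      go [_] = Left "odd number of hex digits"
```

With this shape, adding Base32 or Base64 would mean adding one tag type and one instance, and code that only needs "some encoding" could take a `BaseEncoding enc => enc` argument.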
```
-- | A digest whose 'NamedAlgo' is not known at compile time.
data SomeNamedDigest = forall a . NamedAlgo a => SomeDigest (Digest a)
```
The `SomeNamedDigest` wrapper above looks like a case for `Typeable`. If the digest is declared as supported (`NamedAlgo` is a type class over `HashAlgorithm`; by the way, `NamedAlgo` does not support the `Truncated` constructor of `HashAlgorithm`) but is not known at compile time, then maybe `Typeable` for `HashAlgorithm` is enough to shift that type detection to run time, perhaps together with a couple of functions.
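A rough sketch of what run-time detection via `Typeable` could look like. Everything here is an assumption for illustration: the `Digest` newtype, the empty tag types `SHA256` and `MD5`, and the helpers `asDigest` and `algoName` are hypothetical and do not mirror the real hnix-store definitions. Only `cast` and `typeRep` from `Data.Typeable` in `base` are real library calls.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, cast, typeRep)

-- Hypothetical algorithm tags (empty types, used only as type indices).
data SHA256
data MD5

-- | Hypothetical digest, indexed by its algorithm; the payload is a hex string.
newtype Digest a = Digest String
  deriving (Eq, Show)

-- | A digest whose algorithm is not known at compile time.
-- The Typeable dictionary is what allows recovering the type later.
data SomeDigest = forall a. Typeable a => SomeDigest (Digest a)

-- | Try to recover a statically typed digest; Nothing on algorithm mismatch.
asDigest :: Typeable a => SomeDigest -> Maybe (Digest a)
asDigest (SomeDigest d) = cast d

-- | Name of the hidden algorithm; 'Digest a' itself serves as the proxy for 'a'.
algoName :: SomeDigest -> String
algoName (SomeDigest d) = show (typeRep d)
```

For example, `asDigest (SomeDigest (Digest "ab" :: Digest SHA256)) :: Maybe (Digest MD5)` evaluates to `Nothing`, while asking for `Digest SHA256` returns the digest unchanged.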

Anton-Latukha commented 3 years ago

Further minimalistic Hash refactor https://github.com/haskell-nix/hnix-store/pull/93.

Anton-Latukha commented 3 years ago

Long story short.

The custom dependently typed interface has spread its coupling over the whole HNix-Store-Core project. This can be seen from how many of the modules that use hashing need to import data types and type classes from the Hash module and use them as kinds:

Digest
SomeNamedDigest(..)
HashAlgorithm(..)
ValidAlgo(..)
NamedAlgo(..)

Switching from this interface to a cleaner hashing and library interface means ripping out the current implementation of the hashing interface and touching other modules of the Core, mainly StorePath and Base32. With that, the Hash module needs to be split into Internal modules: SriHash, TruncatedHash, Hash, and Base (encoding operations).

Switching the interface demands refactoring module methods, and once that is needed it is hard to shun other minor refactors in the code.

And so the refactor slipped into a huge uncommitted refactor: https://github.com/haskell-nix/hnix-store/compare/2021-01-22-02-hash-refactor.

I would prefer not to climb that far without safety procedures. I would prefer to atomize the process, which means redoing the work and opening a ton of PRs.

layus commented 3 years ago

Not that huge a merge I think. What bothers me most is the code duplication in Uncycle.hs

@sorki is this okay to merge by your standards ?

On 31 January 2021 at 15:29:20 GMT+01:00, Anton Latukha notifications@github.com wrote:

> Long story short.
>
> The custom dependently typed interface has spread its coupling over the whole HNix-Store-Core project. This can be seen from how many of the modules that use hashing need to import data types and type classes from the Hash module and use them as kinds:
>
> Digest
> HashAlgorithm(..)
> ValidAlgo(..)
> NamedAlgo(..)
> SomeNamedDigest(..)
>
> Switching from this interface to a cleaner hashing and library interface means ripping out the current implementation of the hashing interface and touching almost every module of the Core and the function implementations that process data formed by hashing or base encodings.
>
> Switching the interface demands refactoring module methods, and once that is needed it is hard to shun other minor refactors in the code.
>
> And so the refactor slipped into a huge uncommitted refactor: https://github.com/haskell-nix/hnix-store/compare/2021-01-22-02-hash-refactor.
>
> I would prefer not to climb that far without safety procedures. I would prefer to atomize the process, which means redoing the work and opening a ton of PRs.


Anton-Latukha commented 3 years ago

@layus

That branch is WIP: it has WIP comments and is WIP all over the place. I never said it even compiles, and in fact it does not compile :smile:.

I never proposed that branch for merging. I only linked to the diff as a brief example. GitHub has no way to show the diff between branches without the Create PR button, if that is what you reacted to.

Uncycle was just a dump made while I tried to figure out how to uncycle Base encoding from Hash. I committed it because this is a WIP branch, and moved on to other things right away, because I understood that the code would uncycle automatically once the work is done.

In that branch I arrived at Base32 and started figuring out whether we "really need" to cast between Text <-> Text.Lazy <-> ByteString <-> String so much, or to use a Text.Lazy parser for ByteString, so I started moving the pipeline to Text, before bogging down in Base32 byte magic.

After a while the code changes went out of source control, and so the PR got out of hand. I went too far to merge that work, and the last parts of the work are not committed properly enough to trace back.

So I decided to redo things properly, use the branch as an example if I need to look something up, and start doing and shipping things anew.

It is really a normal process:

Do Lambda calculus refactors.
Internal denotation of things.
Then maybe do some renames, document changes.
Then do Typed Lambda calculus refactors with renames, document changes.
Do module formation, document changes.

Something like that.

Anton-Latukha commented 3 years ago

I opened a draft (#133) that submits the basic cosmetic and Lambda calculus refactors.

Anton-Latukha commented 3 years ago

It is gradual because we have really important questions to discuss, such as which text type we standardize the default paths of a pipeline on; those O(n) type casts add up.
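To make the cost concrete, here is a small sketch, assuming the `text` and `bytestring` packages that ship with GHC. The function names and the `/nix/store/` prefix-stripping task are hypothetical, chosen only to contrast a pipeline that hops between representations with one that stays in strict `Text`.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Lazy as TL

-- | Hypothetical mixed-type pipeline. Most hops (pack, toStrict,
-- encodeUtf8, decodeUtf8, unpack) traverse or copy the whole input,
-- so each one adds another O(n) pass.
storeNameMixed :: String -> Maybe String
storeNameMixed =
    fmap T.unpack                         -- Text -> String
  . T.stripPrefix (T.pack "/nix/store/")  -- the only real work
  . TE.decodeUtf8                         -- ByteString -> Text (safe here: input was just encoded)
  . TE.encodeUtf8                         -- Text -> ByteString
  . TL.toStrict                           -- lazy Text -> strict Text
  . TL.fromStrict                         -- strict Text -> lazy Text
  . T.pack                                -- String -> Text

-- | The same logic when the pipeline is standardized on strict Text:
-- no conversions at all.
storeNameText :: T.Text -> Maybe T.Text
storeNameText = T.stripPrefix (T.pack "/nix/store/")
```

Both functions compute the same answer; the difference is only in how many times the data is walked before and after the actual prefix check.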

Anton-Latukha commented 3 years ago

Ok.

I currently think that I put too many words into this thread.

Overall, the question has become trivial in my head. Let's again move it gradually as far as possible, tackle it in many smaller parts, and I will try to be more terse.

haskell-nix / hnix-store

Please, refactor the Hash #92