Closed Anton-Latukha closed 3 years ago
Overall the idea of this thread is to refactor/resolve the hashing situation in a proper way.
I started with #87.
It deduplicated some of the Hash<->Base encoding code and moves towards a form that is a bit easier to refactor.
Base encodings, together with their content, should probably be types. The Base encoding<->Text functions `encodeInBase`/`decodeBase` then look like methods of a type class for the Base encoding type(s), which would remove the currently introduced coupling and reduce the repetition across the project code.
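A minimal sketch of the type class idea above, not the project's actual code. The `Base16` wrapper, the `BaseEncoding` class and both method signatures are assumptions made for illustration (only the names `encodeInBase` and `decodeBase` come from this thread), and `String` and `[Word8]` stand in for `Text` and `ByteString` so that the example needs nothing beyond `base`.

```haskell
import Data.Char (digitToInt, intToDigit, isHexDigit)
import Data.Word (Word8)

-- Hypothetical wrapper: raw bytes tagged with the encoding used to render them.
newtype Base16 = Base16 [Word8]
  deriving (Eq, Show)

-- Hypothetical class: one instance per base encoding type.
-- The signatures are assumptions, not the hnix-store API.
class BaseEncoding b where
  encodeInBase :: b -> String
  decodeBase   :: String -> Either String b

instance BaseEncoding Base16 where
  -- Render every byte as two lowercase hexadecimal digits.
  encodeInBase (Base16 bytes) = concatMap byteToHex bytes
    where
      byteToHex w =
        [ intToDigit (fromIntegral w `div` 16)
        , intToDigit (fromIntegral w `mod` 16)
        ]

  -- Read the digits back in pairs, rejecting malformed input.
  decodeBase = fmap Base16 . go
    where
      go [] = Right []
      go (hi : lo : rest)
        | isHexDigit hi && isHexDigit lo =
            (fromIntegral (digitToInt hi * 16 + digitToInt lo) :) <$> go rest
        | otherwise = Left "non-hexadecimal character"
      go [_] = Left "odd number of hexadecimal digits"
```

With this shape, callers import one class instead of per-encoding helper functions, and a hypothetical `Base32` or `Base64` wrapper would only add an instance.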
-- | A digest whose 'NamedAlgo' is not known at compile time.
data SomeNamedDigest = forall a . NamedAlgo a => SomeDigest (Digest a)
This looks like a use case for `Typeable`. If the digest algorithm is declared as supported (`NamedAlgo` is a type class over `HashAlgorithm`, although, by the way, `NamedAlgo` does not support the `Truncated` constructor of `HashAlgorithm`) but is not known at compile time, then `Typeable` for `HashAlgorithm` may be enough to shift that type detection to run time, perhaps together with a couple of functions.
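A hedged sketch of what run-time detection through `Typeable` could look like. It is simplified on purpose: the algorithm tags `SHA256` and `MD5` are ordinary empty types rather than promoted `HashAlgorithm` constructors, `Digest` wraps a `String`, and `SomeDigest`, `asDigest` and `algoName` are hypothetical names, not the existing hnix-store definitions.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, cast, typeRep)

-- Hypothetical algorithm tags (plain empty types, an assumption of this sketch).
data SHA256
data MD5

-- Simplified digest: the phantom parameter records the algorithm.
newtype Digest a = Digest String
  deriving (Eq, Show)

-- A digest whose algorithm is only known at run time.
-- 'Typeable' replaces the 'NamedAlgo' constraint here.
data SomeDigest = forall a. Typeable a => SomeDigest (Digest a)

-- Recover a statically typed digest; 'Nothing' if the algorithm differs.
asDigest :: Typeable a => SomeDigest -> Maybe (Digest a)
asDigest (SomeDigest d) = cast d

-- Name of the hidden algorithm, read from its run-time type representation.
algoName :: SomeDigest -> String
algoName (SomeDigest d) = show (typeRep d)
```

For example, `asDigest (SomeDigest (Digest "0a1b" :: Digest SHA256)) :: Maybe (Digest MD5)` evaluates to `Nothing`, while asking for `Digest SHA256` returns the wrapped digest.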
A further minimalistic `Hash` refactor: https://github.com/haskell-nix/hnix-store/pull/93.
Long story short.
The custom dependently typed interface spread its coupling over the whole HNix-Store-Core project. This can be seen from how many modules that use hashing need to import data types and type classes from the Hash module and use them as kinds:
Digest
SomeNamedDigest(..)
HashAlgorithm(..)
ValidAlgo(..)
NamedAlgo(..)
Switching away from this interface to a cleaner, simpler hashing and library interface means ripping out the current implementation of the hashing interface and touching other modules of the Core, mainly `StorePath` and `Base32`. With that, the `Hash` module needs to be split into `Internal`: `SriHash`, `TruncatedHash`, `Hash`, `Base` (encoding operations) modules.
Switching the interface demands refactoring the module methods, and since that is needed anyway, it is hard to shy away from other minor refactors in the code.
And so the work slipped into a huge uncommitted refactor: https://github.com/haskell-nix/hnix-store/compare/2021-01-22-02-hash-refactor.
I would prefer not to climb that far without safety procedures. I would prefer to atomize the process, which means redoing the work and opening a ton of PRs.
Not that huge a merge, I think. What bothers me most is the code duplication in `Uncycle.hs`.
@sorki is this okay to merge by your standards?
@layus
That branch is WIP: it has WIP comments and is WIP all over the place. I never said it even compiles, and in fact it does not compile :smile:.
I never proposed merging that branch. I only linked to the diff as a brief example. GitHub has no way to show the diff between branches without also showing the `Create PR` button, if that is what you reacted to.
`Uncycle` was just a code dump from when I tried to figure out how to uncycle the Base encoding from Hash. I committed it because this is a WIP branch, and moved on to other things right away, because I understood that the code would uncycle automatically once the work is done.
In that branch I arrived at Base32 and started figuring out whether we "really need" to cast between Text <-> Text.Lazy <-> ByteString <-> String so much, or to use a Text.Lazy parser for ByteString. So I started to move the pipeline to Text, before bogging down in the Base32 byte magic.
After a while the code changes went out of source control, and so the PR got out of hand. I went too far to merge that work, and the last parts of it are not committed properly enough to trace back.
So I decided to redo things properly, use the branch as a reference if I need to look something up, and start doing and shipping things over again.
It is really a normal process.
Something like that.
I opened a draft (#133) that submits the basic cosmetic and Lambda calculus refactors.
It is gradual, because we have really important questions to discuss. For example, which text type do we standardize on for the default paths of the pipeline? Those O(n) typecasts add up.
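To illustrate the cost in question, here is a small sketch assuming the `text` and `bytestring` packages. The function names `viaManyTypes` and `direct` are hypothetical and do not come from the code base; the point is only that each representation change traverses or copies the whole input, while a pipeline that stays on strict `Text` pays once.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Lazy as TL

-- Hypothetical pipeline step that hops through three representations.
-- Each of the three conversions is linear in the input length.
viaManyTypes :: String -> BS.ByteString
viaManyTypes s =
  TE.encodeUtf8        -- strict Text -> ByteString
    (TL.toStrict       -- lazy Text   -> strict Text
      (TL.pack s))     -- String      -> lazy Text

-- The same result when the pipeline already carries strict Text:
-- a single linear conversion at the boundary.
direct :: T.Text -> BS.ByteString
direct = TE.encodeUtf8
```

Both functions produce the same bytes for the same characters, so standardizing on one text type changes only how often the data is walked.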
Ok.
I currently think that I put too many words into this thread.
Overall, the question has become trivialized in my head. Let's again move it gradually as far as possible and tackle it in many smaller parts, and I will try to be terse.
This report is for discussing the hash code refactoring.
All ideas and collaborations are welcome.
Current hash use in projects
Cryptohash history:
`vincenthz`, the author of `cryptohash`, created `cryptonite`, eventually deprecated `cryptohash` 5 years ago, and declared that `cryptonite` superseded it: https://github.com/vincenthz/hs-cryptohash#readme. At the time of that announcement `cryptonite` was only in initial development; it matured in 2016.

HVR forked `cryptohash` and split it into a set of `cryptohash-*` packages in 2016. That same year (2016) the maintainer commits stopped in `cryptohash-sha512`, and in the most used package, `cryptohash-sha256`, they stopped in 2017.

HNix-Store switched from `cryptonite` to the HVR forks 2.5 years ago, in 2018. The HVR forks became fully unmaintained this year (2020). In fact, the last commit to `cryptohash-sha512` was on 2018-03-18, which is 2.75 years ago (at the time of writing), right around the time HNix-Store switched to it. In time this created a direct problem for HNix-Store, currently with `cryptohash-sha512`.

The team actively attended to solving upstream issues in advance; the reports and PRs that provide `base 4.14` have been waiting there for 10 months. HVR is probably busy: his activity on GitHub this year is small. Judging by the commit merges, the `haskell-hvr` group seems to consist of 1 person, because nobody else merges changes in the projects. People pinged him. An email to `hvr@gnu.org` got no response.

I lay this out just to make the case that `cryptohash-*` is essentially bitrotting. There is no reason to hold on to it, unless the family of packages gets actively maintained and readopted, most probably by `haskell-nix` as the most active party around the stale projects.

---

Inside the HNix-Store: #4 #14 #18 #25 #28 #27 #38 #83 #90