haskell-nix / hnix

A Haskell re-implementation of the Nix expression language
https://hackage.haskell.org/package/hnix
BSD 3-Clause "New" or "Revised" License

Please apply YAGNI to our own implementation #879

Open Anton-Latukha opened 3 years ago

Anton-Latukha commented 3 years ago

First, let me say that implementing this project was a very difficult task, and all my ramblings are not critique and not any detriment to the work done. I am just discussing cleaning things up after the creation stage. As my modest experience says, refactoring in real-world practice really does take 50% of the coding time. Frequently I make things work in 2.5 days, and then I clean them up for 2.5 days until I am happy with the implementation.

Let's joke a little about our current state:


![](https://www.monkeyuser.com/assets/images/2019/123-yagni.png)

It depicts what the HNix engine implementation currently is relative to what the CLI currently does.

And the engine has no drag: it almost turns the blades.

The more code polish gets done, the more YAGNIs in the engine reveal themselves, and the faster the engine turns; so maybe we would be able to build a whole-lot-of-a-big-deal plane on it that helps a lot of people.


YAGNI needs to be reduced from the main computational paths and the main core of the engine, and in many cases, to keep the additional functionality that is already provided, we can easily preserve the current YAGNIs as what they are: optional, as in https://github.com/haskell-nix/hnix/issues/864.

A good example of them is the Kleisli actions that were dragged around in functions while the purpose of those functions was a completely different thing.

The results of https://github.com/haskell-nix/hnix/issues/864 and https://github.com/haskell-nix/hnix/issues/850 are gargantuan: the code inside those functions and around them became elegant, easy to use and understand, and moreover more efficient, and it allowed, and keeps allowing, more and more elegance.

Or, in pretty printing, a function from an outside package module was used in the place where a literal such as '.' or ' ' belongs, so with those function stacks neither the implementation nor the visual representation of the result was understandable.

After the literals became literal, the code became much more straightforward to understand and work with.
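
To illustrate the kind of change meant here, a minimal sketch using the prettyprinter package (the function names are illustrative, not the actual hnix code):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Prettyprinter (Doc, concatWith, surround, pretty)

-- Before (hypothetical): the separator hides behind a function stack,
-- so the reader cannot see what actually gets printed.
separator :: Doc ann
separator = pretty '.'

prettySelector :: [Doc ann] -> Doc ann
prettySelector = concatWith (surround separator)

-- After: the literal is literal; the output is obvious at a glance.
prettySelector' :: [Doc ann] -> Doc ann
prettySelector' = concatWith (surround ".")
```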


For example, queryM was initially implemented as:

```haskell
queryM :: MonadVar m => NThunkF m v -> m a -> (v -> m a) -> m a
queryM (Thunk _ active ref) n k = do
  -- Take the "active" lock; the result is the previous lock state.
  nowActive <- atomicModifyVar active (True, )
  if nowActive
    then n  -- The thunk is being computed elsewhere: run the fallback action.
    else do
      eres <- readVar ref
      res  <- case eres of
        Computed v -> k v  -- Already computed: pass the value to the continuation.
        _          -> n    -- Still deferred: run the fallback action.
      -- Release the lock.
      _ <- atomicModifyVar active (False, )
      return res
```

There is a lot of mystery around it. queryM is quite a mysterious function; it was one for me for the last 2 years of learning the project. I looked at it a couple of times and was not sure what its purpose is. Its type signature, how to use it, why it is the way it is, and the details of its purpose were all a mystery.

As Uncle Bob says:


![programming-WTFs-per-minute](https://user-images.githubusercontent.com/20933385/110820314-a54ea780-8297-11eb-97eb-6ad782082b1c.png)

Mystery "was", because, until refactoring happened. Noting in advance: ¿ Why passing 2 arguments that play no role in what function does? Since, function in itself to read the structure, and return a value if it is computed. ¿ Why blocking a thread and mutating a monad twice for just reading value? Reading of a structure does not change monads of it, reading in a language with static data structures does not need locking - we can allow people to read static structure versions at any moment as much as people want.

Refactoring queryM reveals its lean nature. After the thoughtful polish, this is really all there is to its current implementation:

```haskell
queryM :: (MonadVar m, Monoid (m v)) => NThunkF m v -> m v
queryM (Thunk _ _ ref) =
  (\case
      Computed v -> pure v   -- Computed: return it.
      _deferred  -> mempty   -- Unknown: respond "I know nothing about it."
  ) =<< readVar ref
```

1. Read the reference.
2. Computed? Yes: return it. No: the function does not know the value, so like a good human it responds: "I know nothing about it."

The action becomes cheap, and does it even need to be explained that GHC would be able to optimize it strongly? It is so light that GHC probably would frequently just inline it at the place of use, completely freeing it from abstraction overhead, module-call overhead and type-class overhead. It is basically a monad-aware if-then-else statement, which GHC sees and is going to embed as such.


Overall, since HNix, in the most direct way currently known, implements the closest analog of the core language, by all signals the HNix core in itself can & should be very lean.

Because of the natural character of the project, the principle:

If one way be better than another, that you may be sure is nature's way. (circa 350 BC, Aristotle)

applies most strongly in this project.

Let's do things naturally. The currently mandatory YAGNIs that are forced on our own implementation just hold the project back; let us move them to what they are: an optional possibility we can provide additionally if required, and allow the main (and so probably the canonical) implementation and design to be lean. This is going to leverage the strong points of what HNix is, let the genius of the implementation shine, and make us and the compiler really happy with the code.

If the initial agenda of the project is to be open, it is not enough to make the complex code open: we really can make the majority of the code readable and understandable to all Haskellers, so that anyone who comes around marvels, or so that we have the kind of ingenuity that is elegant to the point of the implementation being succinct, calm and humble.

layus commented 3 years ago

Thank you for challenging this design decision. I too was puzzled by the incredible complexity of having to pass the continuation around to all the query, force and demand functions. I guess there was a reason for doing that at some point, but I am not so sure anymore.

As you are challenging the design, do you see a way to tune the type of values to ensure that they are already demand'ed? If we are going to pass around simple values, we could as well disambiguate an (NValue f m) from an m (NValue f m) and ensure that the first one is at least in WHNF with no thunks. It is weird that the type does not reveal that, and it makes it a bit awkward to always demand values that are possibly already demanded.
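
For what it is worth, a minimal self-contained sketch of that idea (all names here are hypothetical, not the hnix API): wrap values whose outer layer is known to be forced in a distinct type, so "already demanded" becomes visible in signatures.

```haskell
{-# LANGUAGE LambdaCase #-}

-- A value that may still be deferred behind a monadic action.
data Deferred m v = Computed v | Deferred (m v)

newtype Thunky m v = Thunky (m (Deferred m v))

-- A value whose outermost layer is guaranteed forced (WHNF).
newtype Whnf v = Whnf v

-- 'demand' is then the only way to obtain a 'Whnf', so functions that
-- need a forced value can ask for 'Whnf v' and never re-demand it.
demand :: Monad m => Thunky m v -> m (Whnf v)
demand (Thunky m) =
  m >>= \case
    Computed v -> pure (Whnf v)
    Deferred f -> Whnf <$> f
```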

Anton-Latukha commented 3 years ago

Well, I'm not challenging anything.

We just do design clean-up after the initial creation phase.

It can sound strange, but so far I have not really changed anything; it was just polishing the stuff, as can be seen from the ChangeLog entries. The changes introduced so far are just a clean-up of the design.

So far I'm still learning how it all works together.

About changing how the central structure operates, that is, changing force & demand and the logic around them: I still need to learn some more to understand the big picture. For example, I need to know how GHC memory operates on structures in such cases of use, and to get some knowledge of how laziness, non-strictness and strictness each operate on the structure, to get a feel for it. I think the memory-eating beast is hidden in that place. Maybe the memory Charybdis is just a side effect of strict pattern matches on lazy structures.

But overall, not distinguishing between raw thunks, WHNFs and evaluated expressions currently seems proper to me: it is parametric polymorphism and a design towards maximal parallelism, the main way Haskell does stuff. This design allows having and working with just the main logic and leveraging GHC to help, without thinking about all the cases in which the processing instructions run. What once got pattern-matched to WHNF very probably had that layer's result saved and reused in memory. If we were to distinguish between thunks, WHNFs and evaluated values, then, since the types would be different, it quite probably blows up the code design (functions per type, cases per state), and we would then need to think about what to do with which structure case under which order of execution... in a lazy language, whose code design allows running everything at any time, and we would need to support all of that with implementation cases.

And currently I'm thinking about the problem in terms of the HNix core for the pure language being just a propagator.

If there can be a direct analogy: HNix is currently still largely slow, sequential and one-threaded. Remember the main initial difference between SysV init and systemd: it can be said that shifting from sequential to propagator there increased performance tenfold, while giving the same, if not more robust, guarantees.

Nix being pure means it is referentially transparent. A language being "referentially transparent" gives a propagator for free. This means that if the language semantics are processed correctly (which is just the most basic requirement for the language parser, and one HNix already meets), then, since the end result would always be the same as long as the actions are semantically right, the order in which HNix does things is completely irrelevant: they all propagate into the same result.
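
As a toy illustration of that order-irrelevance (the Expr type and eval function below are made up for illustration; this is not the hnix evaluator), a pure evaluator can spark independent subexpressions in any order and still meet in the same result:

```haskell
import Control.Parallel (par, pseq)  -- from the "parallel" package

data Expr = Lit Int | Add Expr Expr

-- Referential transparency makes the evaluation order of the two
-- branches irrelevant: spark the left branch in parallel, evaluate
-- the right one, then join; any sequential order gives the same sum.
eval :: Expr -> Int
eval (Lit n)   = n
eval (Add l r) =
  let lv = eval l
      rv = eval r
  in  lv `par` (rv `pseq` lv + rv)
```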

This means a Nix expression is a lattice of actions to perform in order to reach the peak of the lattice; that the peak is reached means the Nix expression was sound and computed to the end result. That is the case we strive for: in the real world, 99.9% of the time sound Nix code means people want code that terminates. Since under Nix paradigms expressions can ask to run non-terminating processes, and that is just the "halting problem", even a basic sound Nix implementation anyway needs rules for where to stop very long-winded computations and consider them as corresponding to the halting problem. HNix currently marks things opaque after the normalization process stacks 30-60 layers, something like that.
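
A minimal sketch of such a cut-off (the Tree type, the Norm wrapper and normalizeUpTo are hypothetical; the real check lives in hnix's normalization code):

```haskell
data Tree = Leaf Int | Node Tree Tree

data Norm a = Normalized a | Opaque
  deriving Show

-- Stop after a fixed nesting depth and mark the remainder Opaque
-- instead of chasing a possibly non-terminating computation.
normalizeUpTo :: Int -> Tree -> Norm Tree
normalizeUpTo depth t
  | depth <= 0 = Opaque
  | otherwise  =
      case t of
        Leaf n   -> Normalized (Leaf n)
        Node l r ->
          case (normalizeUpTo (depth - 1) l, normalizeUpTo (depth - 1) r) of
            (Normalized l', Normalized r') -> Normalized (Node l' r')
            _                              -> Opaque
```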

So HNix looks very much like a propagator on lattices of particular Nix language expressions.

Propagators have stages for solving a computational problem, and we naturally already have them: the first stage is parsing, the next is evaluating, the next is normalizing, the next is the executing substage that reifies results into the store storage... The only stages that are not a pure propagator are the higher-level stages, where the environment-changing imperative stuff happens, starting inside derivation creation: those actions have a definite, particular order of execution. But even there, there are a lot of Nix derivation-building substages, some of which may be pure propagators, and some of which are definitely imperative sequences. Even NixOS new-generation creation and the switch to it are largely of a propagator nature, because systemd is implemented as one: for it, the system state to configure is a lattice, and there are targets to arrive at.

Since the HNix design currently is thunk-blocking-one-threaded, it is a great time to understand whether there really is one central thunk (that is why I talked about understanding GHC memory management and the broad picture).

So in the current project position, moving the design from thunk-blocking-one-threaded in the non-blocking-parallelism direction, taking it on with courage and letting its results show the path to further design steps towards naturality, seems logical. Since, as pointed out, the HNix core seems to actually be a propagator on the Nix language, gradually moving things into non-blocking parallelism would show where in the current implementation the propagator lattice does not commute. Making the lattice commute brings naturality (in categorical terms), and that would give HNix, Nix and Haskell a way to guide us toward a natural (word) design, and to give naturality (term) guarantees on moving the "functor of Nix expressions" into "doing actions" (which, going by the language paradigms and our current implementation, would connect back with the referential transparency and parametric polymorphism of the code over Nix expressions). And since the end goal is in fact already achieved (Nix referential transparency is already ensured, and referential transparency guarantees naturality), this whole move just allows HNix to find the most elegant design for itself. Since our project does not execute stuff, the move in reality is going to be quite safe, especially with a lot of Haskell and Nix guard rails. But the one thing we need to do before that is to be as ready as possible.

During the ongoing work, I more and more see the Obsidian work as the right direction: simplify the design into non-blocking and simplify scoping. The current scoping indeed seems too heavy, and approaching the design development from the starting point of non-blocking parallelism seems natural.

If they do not show up, in some time I plan to get their patch, study it, and implement it again to complete the work; there should be somebody who completes that work.

Anton-Latukha commented 3 years ago

But before all that, there are way simpler things.

We can talk about high matters, but the project needs basic stuff done, and changing List to proper monoids and (String -> Text -> String -> Text) action chains to proper text processing does not really change the design; it is just doing the sanitary basics of the clean-up. Everyone is shouting from the rooftops to do those sanitary basics. And those simple sanitary guarantees change those computation chains from O(n^c) to at least O(n*log(n)), and in practice promise O(n) and even O(1) at times.
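
A minimal sketch of the kind of clean-up meant (not hnix code; joinSlow and joinFast are made-up names): repeatedly appending String, a linked list of Char, makes left-nested concatenation quadratic, while accumulating a Text Builder and rendering once stays linear.

```haskell
import           Data.List (foldl')
import qualified Data.Text as Text
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as Builder

-- O(n^2): every (<>) copies the whole accumulated list again.
joinSlow :: [String] -> String
joinSlow = foldl' (<>) ""

-- O(n): the Builder accumulates chunks and renders in one pass.
joinFast :: [Text.Text] -> Text.Text
joinFast = TL.toStrict . Builder.toLazyText . foldMap Builder.fromText
```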

So far we do gradual evolution; since nature is strong in the project, natural gradual evolution is a strong navigator: the more gradual progress happens, the more the design reveals itself. The further the project goes with gradual evolution, the more we will know what to actually do as next steps. I do not hold on to current beliefs, because I know we are going to have a better idea when the time arrives.

Anton-Latukha commented 3 years ago

If I interpreted something wrongly, please be honest and correct me; if that is so, I would retract and reorganize my thought processes.

Anton-Latukha commented 2 years ago

Currently, the separate nix-linter is probably way better than the basic implementation in Nix.Lint.