snabbco / snabb

Snabb: Simple and fast packet networking
Apache License 2.0
2.96k stars 298 forks source link

Improving code readability and robustness #1132

Closed teknico closed 7 years ago

teknico commented 7 years ago

The code currently relies heavily on the global namespace, with quite a few names appearing out of nowhere and making it hard to follow module inter-dependencies.

There's been previous discussion in #734, where @lukego said:

Note: The core modules are already in the global namespace today, e.g. packet, but people don't use them that way for fear of expensive lookups when referencing the module on every call.

Actually, the code is peppered with names like config, engine, link and others, gotten from the global namespace. Each module accesses the global namespace through the ubiquitous boilerplate:

module(..., package.seeall)

This approach to modularization has been deprecated in Lua 5.2 and is not supported anymore. Its problems are well known. There are alternatives which are slightly more verbose, but more readable and robust.

On the other hand, the various require statements at the top of each file are useful and explicit, as argued by @lukego.

Defining a local scope, distinct from the one visible from outside, would also help making a clearer separation of private and public names.

But I'd like to tackle the namespacing problem separately from the public API and package layout issue: it seems an easier approach, and more likely to happen.

lukego commented 7 years ago

@teknico Good topic! Seems like we should have enough experience to make informed choices nowadays. What is your proposal exactly, and what pain points will it relieve?

teknico commented 7 years ago

The pain points are two:

1) encapsulation leaks and risks of unwanted side-effects among modules, due to reliance on an opaque and crowded global namespace; 2) low clarity of where names in the global namespace come from, which impacts readability.

See the resources linked above for details.

I'd like to confine usage of the global namespace to library names, and avoid adding to it. To help detect unwanted usage, a tool like luacheck will be useful.

I propose two phases:

1) remove access to names in the global namespace like config, engine, link etc., as done in Igalia PR 777 and Igalia PR 778;

2) remove addition of names to the global namespace by defining a table for each module where to put the public names it defines (again, see the linked resources for details). The require statements at the top of each module will then get the names other modules export from their respective tables rather than from the global namespace.

lukego commented 7 years ago

I feel pain point 2. It can be very surprising how names propagate e.g. when a module uses packet.seeall to import all the global names, and then the module is used as a class-like metatable, and so each "instance" object ends up inheriting and exporting all global names as if they were its own fields. So I am on board with tightening things up.

I am not so sure about point 1. Lua puts a few pervasive modules into the global namespace, like table and string, because otherwise they would need to be imported all the time. Snabb does the same with a few modules like packet and link. Is this unreasonable?

Snabb has a tradition of treating lines of code as a scarce resource. We strive to implement as much functionality as we can using as few lines as possible ("less is more.") I feel that adding a hundred lines that all say local packet = require("core.packet") to the code base would be an about-turn where we decide that lines of code are cheap. This makes me uncomfortable, and I would prefer to find a way to solve our problems while reducing lines of code.

One sub-issue about local for subroutines. If we make all of our subroutines local then we need to declare them before they are called. This could require reorganizing many source files. There are a few alternatives: we could reorder code such that the lower-level functions always appear before their higher-level callers (like in C and Forth); we could adopt an object-oriented style to indirect our way out of this restriction (mirroring fenv in a "class" table, essentially); we could "forward declare" the subroutines with added local declarations at the top of the file; etc. Have you thought about this issue already and do you have a proposal for what we should do?

One more issue that is perhaps more on the topic of the API. I would really like a coding style that makes new users' first steps pleasant. I am not comfortable with every new Snabb hacker always having to type local packet = require("core.packet") as their first step in the project. I would be much happier if their first line was function do_my_awesome_thing_with_packet (p) for example. Do you know what I mean?

I am also a bit confused by this:

remove addition of names to the global namespace by defining a table for each module where to put the public names it defines

I feel like there is a mixup here. Writing function foo () ... end will not add a name to the global namespace. Rather, it will add a name to your file-local namespace that was created by module() and is accessible via getfenv(). So I don't feel that we have a problem with polluting the global namespace today, with the exception of the names explicitly installed in the core.main module at startup.

lukego commented 7 years ago

Just thinking about this a bit more...

Goal is to find a balance between qualities like aesthetics, robustness, clarity, brevity, and so on. On the one hand we can take a holistic approach, considering a whole programming style at once, or on the other hand we can take a piecemeal approach considering one aspect at a time: Should we use module? Should we use package.seeall? Should we use object:method syntax? etc.

My concern about the piecemeal style is hill climbing to a local maximum. Can be that we use global names to improve brevity, great, but that this makes the code more ambiguous, bummer. Or can be that we use very explicit naming conventions to reduce ambiguity, great, but that this increases the ratio of boilerplate to actual code, bummer. So the risk is that we end up with a solution that nobody really likes through a series of steps that each seem reasonable in isolation.

So I am drawn to the holistic approach. For example, we could propose rewrites of the sprayer.lua example program and consider the virtue of adopting one of the candidate programming styles as the canonical default for Snabb code. This way we would be looking for a global maximum that balances the parameters in a suitable way.

Here is a quick fantasy idea along those lines. My idea de jour is to introduce a separate namespace for modules and variables such that you can see the difference lexically. Module names would be Capitalized and automatically loaded with require behind the scenes (looking first in the current directory and then at the top-level.) Here's the revised sprayer.lua written in fantasy style with reckless disregard for existing practice :)

-- sprayer.lua
module(..., package.seeall)

function new ()
   local count = 0
   local input, output

   -- Expect one input and one output link.
   local function relink (inputs, outputs)
      input  = assert((#inputs == 1)  and inputs[1], "one input link required")
      output = assert((#outputs == 1) and outputs[1], "one output link required")
   end

   -- Drop every second packet.
   local function push ()
     for packet in Link.receive_all(input) do
        if count % 2 == 0 then
           Link.transmit(output, packet)
        else
           Packet.free(p)
        end
        count = count + 1
     done
   end

   -- Make an app instance object
   return { relink = relink, push = push }
end

Some suggested virtues:

(Overall this style is quite similar to Erlang with a lexical distinction between modules and variables, minimizing indirection in module references, and providing at least some properties that make life easier for creating development tools.)

Whaddayareckon?

lukego commented 7 years ago

One more wrinkle.

I have made a long investigation (see whole thread around https://github.com/LuaJIT/LuaJIT/issues/248#issuecomment-273841796) into the overhead of different function calling styles. Refer to a module as a global? Cache the module in a local? Cache the individual functions in locals?

I believe there is an interesting optimization that could be conveniently applied to the Capitalized module loading machinery that I sketched above.

The short version is that we can optimize module lookups by wrapping the module with an FFI object. The FFI object will use the module as its metatype and so all of the functions and variables from the module will be visible. The benefit is that LuaJIT semantics allow the compiler to treat metatypes as immutable. In practice this means that the JIT will perform the table lookups at recording time. The machine code will then contain direct pointers to the table values (e.g. Lua function objects) rather than performing runtime lookups.

So when we first see a reference to a Capitalized module, such as Packet, we could do something like this to put an optimized version of the module into the global namespace:

-- m is an ordinary Lua module
local m = require('Packet')
-- t is a unique FFI type that forwards to module m
local t = ffi.metatype(ffi.typeof("struct {}"),
                       {__index = m})
-- o is an instance that forwards to module m but with optimized lookups
local o = ffi.new(t)
-- Global name will point to the FFI object where lookups are optimized
_G.Packet = o

This would potentially allow us to write things like Bit.xor(...) with the benefit of compiler optimizations that usually require caching the function in a local.

The caveat is that it would complicate life if we want to update modules. If we replace a function in a module then the old one can still be referenced from machine code in traces. So we would need to either not do that, or to flush the JIT in some appropriate way to regenerate the mcode.

teknico commented 7 years ago

The lookup optimization using FFI objects seems quite promising, and the sprayer.lua rewrite, with the lexical convention for modules, does increase encapsulation and localization.

However, I'm wary of automatic module loading and of increased reliance on the global namespace.

I disagree with your current self (and agree with a former one :-) ) that the require lines are boilerplate. They are an explicit indication of what external code the modules is referencing: without them, I fear readability will decrease.

As you indicated previously, at the moment seven names are added to the global namespace from a single spot in the code, the core.main module (and that's not currently documented, AFAICS). With this proposal, IIUC, all intermodule references would be implicitly injected in the global namespace.

If that's going to be the outcome of this discussion, I fear I'll regret (re)starting it. Is there any way to get lookup optimizations and code encapsulation without harming readability?

lukego commented 7 years ago

Is it reasonable to classify this discussion as subjective?

You feel that extending the namespace for free variables in a module harms readability, while I feel that voluminous require() lines harms readability. So the answer to this question:

Is there any way to get lookup optimizations and code encapsulation without harming readability?

Is "that depends on who you ask." 😄

There are also secondary issues that may be significant. I have found it a problem in practice to use the same lexical conventions for module names and variables, e.g. I have often written local packet = ... which shadows the packet module and directly leads to bugs. So in that sense I feel that making separate lexical conventions for modules and variables, Packet and packet, is effectively splitting up the namespace where I suffer the most conflicts in practice. (Using a linter may also address this problem in a different way.)

I should also stress that I am not married to the Capitalized modules idea. However, in my mind this discussion is framed in holistic terms, "How should Snabb code look?", and that these details are not amenable to piecemeal changes (especially when they require big global changes to the code base.)

lukego commented 7 years ago

I also reflect that Lua ordinarily has a certain amount of "magic" when it comes to loading modules. If you write require("foo") the normal expectation is that Lua will search multiple directories to find .../foo.lua and also check for other patterns like .../foo/init.lua. (See package.path.)

The historical reason that we don't have this in Snabb, and always use fully qualified names within a single directory structure, is that LuaJIT does not support package.path for modules that are compiled into the executable via the C linker.

I have never been satisfied with this situation. I don't think the line local basic_apps = require("apps.basic.basic_apps") deserves to exist in any program, let alone our beautiful little Snabb :). It's one step away from SimpleBeanFactoryAwareAspectInstanceFactory. So I do think that our code could benefit from the introduction of some new naming shortcuts, even if the ones we already have today may have been poorly chosen.

teknico commented 7 years ago

Is it reasonable to classify this discussion as subjective?

I guess it is, and enough has been said.

As mentioned, I like the capitalization idea for modules. It reminds me of the Go convention for private/public names, even though different.

lukego commented 7 years ago

I guess it is, and enough has been said.

So we can close the present discussion. However, I consider this topic to be both subjective and important. Somebody needs to take this bull by the horns and create the idiomatic Snabb programming style. But that is a big messy task and I understand that it is not what you want to pursue here :).