zakirullin / cognitive-load

🧠 Cognitive Load is what matters

Some concerns from a researcher in this space #22

Open willcrichton opened 1 year ago

willcrichton commented 1 year ago

Hi, I do research about the cognitive factors of programming. I just completed my Ph.D. at Stanford involving studies of cognitive load in program comprehension, as detailed here: https://arxiv.org/abs/2101.06305

Thanks for putting together this document! You bring up many important points about what makes code easier and harder to read. I'm sure that programmers will learn something useful by reading this document and using its ideas to improve their own code. I appreciate that you keep the focus close to working memory — many people (including researchers!) will invoke "cognitive load" to just mean "a thing is hard to think about", rather than the specific meaning "a task requires a person to hold information in their working memory".

However, my concern with this document is that cognitive load is still a very specific cognitive phenomenon about the use of working memory under specific individual, task, and environmental conditions. We have essentially no experimental data about how to optimize programs to minimize cognitive load. But this document presents a lot of programmer folklore under the authority of "reducing cognitive load", which I worry gives a veneer of scientific-ness to a subject that has very little scientific backing. This document presents ideas that I suspect most developers intuitively agree with (composition > inheritance, too many microservices is bad), and then retroactively justifies them via cognitive load. Readers get to think "ah good, there's science to back up my feelings," but there's no real science there!

Here are two examples from the document that I think misuse the concept of cognitive load.

"Inheritance nightmare"

Ohh, part of the functionality is in BaseController, let's have a look: 🧠+
Basic role mechanics got introduced in GuestController: 🧠++
Things got partially altered in UserController: 🧠+++
Finally we are here, AdminController, let's code stuff! 🧠++++ [..]

Prefer composition over inheritance. We won't go into detail - there's plenty of material out there.

What exactly is being held in a person's memory here? The contents of the function? The name of the class holding the content? The location of the class in the file? A visual representation of the inheritance hierarchy? The details matter! And are these details even held in working memory? Given the names, a person might be able to infer that UserController is a subclass of BaseController, and not need to store that fact in WM.

It sounds like the issue actually being described here is not inheritance, but rather abstraction -- code is separated over a number of functions and modules, but sometimes a person needs to make a cross-cutting change that involves knowledge of all of those separate pieces of code. (This kind of problem is what tools like Code Bubbles try to solve.) There is a working memory story somewhere here, but it's not just about composition versus inheritance! Using this as a "composition is better than inheritance" parable is a misuse of cognitive load.
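To make that concrete, here's a minimal sketch of my own (the module and function names are hypothetical, not from the handbook): the same cross-cutting cost shows up with plain composition, because the pricing rule is spread across three small layers, and changing it requires holding all three in your head, with no subclassing involved.

// A hypothetical sketch (invented names): the cross-cutting cost without any
// inheritance. Changing the rounding rule in checkout still requires knowing
// what pricing and billing have already done to the number.
mod pricing {
  pub fn base_price(units: u32) -> f64 {
    units as f64 * 9.99
  }
}

mod billing {
  // Adds tax on top of the base price.
  pub fn with_tax(units: u32) -> f64 {
    crate::pricing::base_price(units) * 1.2
  }
}

mod checkout {
  // Rounds to whole cents.
  pub fn total(units: u32) -> f64 {
    (crate::billing::with_tax(units) * 100.0).round() / 100.0
  }
}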

"Too many small methods, classes or modules"

Mantras like "methods should be shorter than 15 lines of code" or "classes should be small" turned out to be somewhat wrong. [...]

Having too many shallow modules can make it difficult to understand the project. Not only do we have to keep in mind each module's responsibilities, but also all their interactions. To understand the purpose of a shallow module, we first need to look at the functionality of all the related modules. 🤯

I think this example is just way too abstract to be useful. For example, in theory "shallow modules" could actually reduce cognitive load. If a person internalizes each module, then that module could be a single chunk in working memory. For instance, consider two Rust implementations of a function that computes the minimum of the inverse of a vector of numbers:

fn min_inverse_1(v: Vec<i32>) -> Option<f32> {
  let mut min = None;
  for x in v {
    if x == 0 { 
      continue;
    }
    let n = 1. / (x as f32);
    match min {
      None => min = Some(n),
      Some(n2) => if n < n2 {
        min = Some(n);
      }
    }    
  }
  min
}

fn min_inverse_2(v: Vec<i32>) -> Option<f32> {
  v.into_iter()
    .filter(|&x| x != 0)
    .map(|x| 1. / (x as f32))
    .reduce(f32::min)
}    

min_inverse_2 relies on a system of shallow modules (in a sense). A person reading min_inverse_2 has to understand what into_iter, filter, map, and reduce all mean. The person reading min_inverse_1 only needs to understand the basic features of the language.

However, the fact that min_inverse_2 relies on many external interfaces is not a problem if a person has internalized those definitions. In fact, it is probably easier to see at a glance what its behavior is, and to verify whether it is implemented correctly. Again, that's why I emphasize that cognitive load is heavily dependent on not just the structure of the code, but also the programmer's knowledge and the tools they use.

One other thing... saying that UNIX I/O is "easy to use due to its simple interface" is a very contestable claim. A user of read has to be aware of the entire function specification, which is actually pretty complicated: https://pubs.opengroup.org/onlinepubs/009604599/functions/read.html
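As a rough sketch of what I mean (my own example; it uses Rust's std::io::Read, whose contract mirrors POSIX read): even a single call pushes several parts of the specification onto the caller, such as short reads, EOF signalled as a zero-byte read, and interrupted calls that should be retried. A correct caller usually ends up writing a loop like this one.

use std::io::{self, ErrorKind, Read};

// A sketch, not from the handbook: filling a buffer correctly with read()
// means handling short reads, EOF (Ok(0)), and interrupted calls.
fn read_fully<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
  let mut total = 0;
  while total < buf.len() {
    match reader.read(&mut buf[total..]) {
      Ok(0) => break,      // EOF before the buffer was filled
      Ok(n) => total += n, // short read: keep reading
      Err(e) if e.kind() == ErrorKind::Interrupted => continue, // retry
      Err(e) => return Err(e),
    }
  }
  Ok(total)
}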

In sum...

I would strongly encourage you to consider renaming this the "Code Complexity Handbook", or a comparable title. There is good advice here that reflects the knowledge of experienced software engineers. But the science of working memory applied to programming is far too young to justify the kinds of claims made here.

rdlu commented 1 year ago

Yeah, it's too young, not enough data, and as a computer scientist I feel what you feel.

There are some parts that resemble books like The Power of Now and similar self-help books (which didn't help me either, in the past).

Nonetheless, I feel this kind of issue today, and I've witnessed several fellow software engineers having trouble with similar things. So there is value in the OG article, as well as in your follow-up here.

For instance, I use the rule of thumb of smaller functions, even without data backing me up, which is kind of refuted in the OP's article. I will continue to use smaller private functions with longer names to box their meaning, all day long, over for loops that run on for dozens of lines.

I'm a beginner Rust dev (senior in other stuff, though), and I can grok the second example much more easily: it's far easier to ignore into_iter while I check the filter/map/reduce arguments I'm used to from Ruby and Haskell, instead of reading tens of lines of for loops, and to quickly decide which of the two versions will confidently solve my issue, or which one I can become a confident user of.

Then I can use vim's gd (go to definition) or IntelliSense to see that into_* is just a kind of type coercion or conversion. After some basic Exercism Rust exercises and the initial chapters of the book, it's consistent for me that into_* is similar to Ruby's to_*, and that there isn't a lot of magic between a Vec and how you iterate over it; I just need to internalize that I'm the one managing the iterators.

In the same vein, I'm having a harder time learning Go than Rust because of the longer code presented in books and non-expert materials (I simply hadn't been using for loops for a long, long time).

All that said, I liked your final suggestion better than the catchy "cognitive load". But code complexity has so many discussions and meanings attached to it that no one can focus on the consequences, like overloading my brain and requiring me to rest a bit sooner than usual.

Finally, I find it heartening that there are studies like yours. I can relate to swapping the meaning of something abstract a lot more than to forgetting it.

Thanks for opening this issue; I'm reading more of your article as well.

zakirullin commented 1 year ago

@willcrichton Hi! Thanks for taking the time to write such a thorough review! It's really great that you did your PhD on a relevant topic; both your review and your research are highly valuable.

This article is not backed by scientific data, due to the lack of such data. In fact, we had a somewhat similar discussion here. TL;DR: "The brain is the most complex system in the world, and we are quite far from fully understanding how it works". Lots of the brain-related things in the article were radically simplified; we didn't even mention germane cognitive load, nor did we introduce the memory chunk concept. Things like our emotional well-being, stress levels, time constraints and other factors that strongly influence our cognitive load weren't addressed either. The reality is much more complex and vague than "okay, this line of code increments your cognitive load by 1: 🧠+".

We wanted to make things as approachable as possible; there are even somewhat funny pictures, so that people wouldn't think it is some kind of scientific research. It is backed by subjective and empirical data from a lot of experts in the field of software development. Most of the things mentioned do indeed consume some of our mental capacity, but the exact amount and the contributing factors are highly debatable. That, however, is beyond the scope of this handbook.

This is a community-driven document, where people share their insights and real-world examples. We don't reference any scientific research to back the statements given, because the overall field, as you said, is rather immature. The reference above (about the 4 facts) says that even though there is research on the topic, scientists are still debating and the reality is somewhat unknown, so we will use simplified models. I feel like an additional warning should be added here.

That's why I emphasize that cognitive load is heavily dependent on not just the structure of the code, but also the programmer's knowledge and the tools they use.

That's totally true: there are so many factors contributing to cognitive load that we weren't even trying to enumerate them all. We wanted to look at the influence of code structure in particular, because sometimes it really takes a lot of tangible mental effort. And it's so easy to make the structure of the code mentally taxing for others.

The main premise (I believe we should explicitly address this in the article to avoid confusion) was that both the author and the following developers have a somewhat similar core knowledge base (language, framework, tools, infrastructure, and so on). In this case, the author has two options: 1) he can unintentionally introduce additional complexity (which would be converted into high extraneous cognitive load for the upcoming developers), or 2) he can be more aware of the upcoming developers and not use subtle/exotic/advanced stuff.

We have an author: a C++ developer who knows a lot about all possible undefined behaviours, knows that undefined behaviour may sometimes cause a never-called function to be called, knows that requires C1<T::type> || C2<T::type> is not the same as requires (C1<T::type> || C2<T::type>), and knows every subtle detail of the most recent language standard. He has 20 years of extensive C++ experience (that's a real person, actually).

On the other hand, we have regular C++ developers: they have quite solid core C++ knowledge and have written a lot of production-ready code. They are self-sufficient and able to deliver value on their own. Yet they haven't been programming in C++ for 20 years, so some very subtle details are outside their experience.

So, if our author were to apply all his knowledge (path 1 from above) to a seemingly simple task, future developers would inevitably suffer from extraneous cognitive load.

You might say it would be useful for upcoming developers to improve their understanding of the language. The thing is, the language has become so complex that even that developer with 20+ years of extensive experience complained that he doesn't know the language well enough. New C++ standards have brought a lot of extraneous cognitive load to the scene. Oftentimes companies use only a subset of those complex, feature-rich languages in order to constrain the complexity and cognitive load. The authors of Golang decided to avoid non-orthogonal features and complexity in the first place, making the language simpler to learn.

The example given is not so bad, because at least it was justified by the complexity of the language. But the reality is far worse - sometimes complexity is based purely on the author's subjective understanding. I mean, we read a lot of articles and books about all sorts of techniques, we build code structures based on our perceived understanding, and then other developers come along and they don't share our perceived reality. They suffer from high extraneous cognitive load, because the way we present our code is purely our own private thing. And the more assumptions we make using our private, unique knowledge base, the more cognitive load we create for other developers.

There is a working memory story somewhere here, but it's not just about composition versus inheritance!

Indeed, it is not just about composition versus inheritance. And inheritance is not the root cause of the cognitive load here. We can use composition to write code that has twice the cognitive load of code written using inheritance. It's just a concrete example where you can clearly feel that extraneous cognitive load. We have faced similar misuse across many codebases, and inheritance was often part of it. But the message is not "use composition, because the cognitive load would be lower"; it is "cases like this take a lot of our mental effort, so inheritance is quite a dangerous tool, use it carefully". People relying on inheritance are tempted to extract perceived similarities into a common place and reuse things, violating all sorts of core programming principles (like encapsulation, cohesion and coupling) and making the overall system fragile and harder to grasp. The exact details of the mental effort behind the example were indeed not given; rather, we simplified it down to "it is mentally taxing". Developers who have worked with legacy codebases felt bad when they saw that inheritance section, because they have been through code like that and they know how mentally hard it is to untangle that kind of tightly coupled mess.

Complexity versus cognitive load

While I agree that "Cognitive Load Developer's Handbook" may not be the best name, I definitely don't want to use the term "complexity" that much. This word is so vague and unclear. There are so many discussions about complexity, yet we don't have a common understanding of what complexity is. What is complexity? How is this piece of code more complex than another? To what extent is it more complex? How do I know whether this piece of code is complex or not, when it feels simple enough to me?

The moment we say "avoid complexity", people will run away, because they have heard it one too many times. Professor John K. Ousterhout discussed complexity in great detail. He provided really good practical examples (those UNIX and shallow module examples were taken directly from his book). He introduced different types of complexity, and he did it really well. Yet not every developer is aware of this, and even if they are, it takes time and effort to get familiar with the concept. If we were to rely on this term, the article itself would become complex (because you would first have to understand what complexity is).

Why avoid complexity in the first place? Because as humans we have a limited mental capacity. If we spend a lot of mental effort thinking about some line of code, then it is complex, given that we are qualified enough and have lots of relevant schemas in our long-term memory. Complexity matters because of the cognitive load it creates, so why not discuss cognitive load in the first place? Even though there's no objective truth and not much science behind it, we can say that some things are more likely to be cognitively taxing than others.

willcrichton commented 1 year ago

If we spend a lot of mental effort thinking about some line of code, then it is complex, given that we are qualified enough and have lots of relevant schemas in our long-term memory. Complexity matters because of the cognitive load it creates, so why not discuss cognitive load in the first place? Even though there's no objective truth and not much science behind it, we can say that some things are more likely to be cognitively taxing than others.

I can understand this motivation. But I think the issue is that cognitive load is fundamentally a small-scale phenomenon. For instance, the point about "nested ifs" is, I think, a really great example of cognitive load — I have even considered running some experiments in that vein. It's difficult to remember the specific set of boolean facts at a given point in a complex conditional, or inside a nested conditional, and alternative conditional structures can avoid that same cognitive load.
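To sketch what I mean (my own example, with hypothetical condition names): at the innermost line of the nested version the reader has to keep the whole stack of boolean facts in working memory, while the guard-clause version lets each fact be discharged before the next line is read.

// Nested form: the innermost assignment only makes sense while holding
// "not admin AND has paid AND quota left" in working memory.
fn can_download_nested(is_admin: bool, has_paid: bool, quota_left: u32) -> bool {
  let mut allowed = false;
  if is_admin {
    allowed = true;
  } else {
    if has_paid {
      if quota_left > 0 {
        allowed = true;
      }
    }
  }
  allowed
}

// Guard-clause form: each condition is resolved and dropped before the next.
fn can_download_guarded(is_admin: bool, has_paid: bool, quota_left: u32) -> bool {
  if is_admin {
    return true;
  }
  if !has_paid {
    return false;
  }
  quota_left > 0
}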

But then the point about "tight coupling with a framework" is at a much larger scale. The main argument is that "we force all upcoming developers to learn that framework first", which is claimed to be a source of (extraneous?) cognitive load. But that's just not what cognitive load means! And it's also a totally falsifiable claim! For example, if I ask a random frontend web developer to learn React and then write a reactive UI, I strongly suspect that the resulting code will involve less cognitive load to comprehend on average than a comparable vanilla JS implementation.

In general, you have very strong opinions about software architecture, which is totally fine. Some of those opinions align with our understanding of cognitive load, which is great. But some opinions have absolutely no basis in the theory of working memory, and this document does not distinguish one kind of opinion from the other. As someone who studies this concept for a living, I genuinely believe that developers would benefit from understanding the extent to which a philosophy about software complexity can (or can't) be grounded in cognitive psychology.

zakirullin commented 6 months ago

Couldn't reply in time - was quite busy working.

But I think the issue is that cognitive load is fundamentally a small-scale phenomenon.

So you're implying that we don't experience high cognitive load when we're working with big-scale issues?

The main argument is that "we force all upcoming developers to learn that framework first", which is claimed to be a source of (extraneous?) cognitive load. But that's just not what cognitive load means!

I believe that "framework" and "language" are interchangeable in this context, so we can continue this line of thought using both terms. Some languages and frameworks are so vast that it is impossible to learn them completely, and there are some tricky parts which nobody remembers (or even understands). Every time we rely on those tricky things, we're embedding cognitive load in our artefact. I invite you to read these thoughts from an experienced engineer (which I have now included in the article):

I was looking at my RSS reader the other day and noticed that I have somewhat three hundred unread articles under the "C++" tag. I haven't read a single article about the language since last summer, and I feel great!

I've been using C++ for 20 years now; that's almost two-thirds of my life. Most of my experience lies in dealing with the darkest corners of the language (such as undefined behaviours of all sorts). It's not a reusable experience, and it's kind of creepy to throw it all away now.

Like, can you imagine, requires C1<T::type> || C2<T::type> is not the same thing as requires (C1<T::type> || C2<T::type>).

You can't allocate space for a trivial type and just memcpy a set of bytes there without extra effort - that won't start the lifetime of an object. This was the case before C++20. It was fixed in C++20, but the cognitive load of the language has only increased.

Cognitive load is constantly growing, even though things got fixed. I should know what was fixed, when it was fixed, and what it was like before. I am a professional after all. Sure, C++ is good at legacy support, which also means that you will face that legacy. For example, last month a colleague of mine asked me about some behaviour in C++03. 🤯

There were 20 ways of initialization. Uniform initialization syntax has been added. Now we have 21 ways of initialization. By the way, does anyone remember the rules for selecting constructors from the initializer list? Something about implicit conversion with the least loss of information, but if the value is known statically, then... 🤯

This increased cognitive load is not caused by a business task at hand. It is not an intrinsic complexity of the domain. It is just there due to historical reasons (extraneous cognitive load).

I had to come up with some rules. Like, if a line of code is not obvious and I have to recall the standard to read it, I'd better not write it that way. The standard is some 1500 pages long, by the way.

willcrichton commented 6 months ago

So you're implying that we don't experience high cognitive load when we're working with big-scale issues?

"Small-scale" means on a small time scale. When working on an issue for days or months, you experience cognitive load at any given moment due to the specific task at hand. But it is a category error to talk about, e.g., the cognitive load of learning a programming language. The term was designed for talking about the cognitive load of e.g., reading a specific passage of a book about a programming language, or working on a single tightly-scoped programming problem.

There were 20 ways of initialization. Uniform initialization syntax has been added. Now we have 21 ways of initialization. By the way, does anyone remember the rules for selecting constructors from the initializer list? Something about implicit conversion with the least loss of information, but if the value is known statically, then... 🤯

This increased cognitive load is not caused by a business task at hand. It is not an intrinsic complexity of the domain. It is just there due to historical reasons (extraneous cognitive load).

Look — the problem you're describing is totally real, and a serious problem faced by programmers every day. I don't want to come across as saying "these problems don't exist". They do! It's just that what you're describing is not cognitive load in the sense used by cognitive psychologists.

For example, consider the following argument. "There are 26 letters. When a person reads a word, they have to consider all 26 letters that could be the word. We could decrease cognitive load by getting rid of 'k' and replacing it everywhere with 'c', therefore only having 25 letters. The letter k is extraneous cognitive load."

This argument doesn't make sense because for experienced readers, the process of reading a word does not involve loading all 26 characters into your working memory, and then applying them to a text. Fluent reading happens at a level of cognition beneath conscious working memory. Fluent reading is primarily a function of long-term memory (combined with perceptual heuristics). For instance, fluent readers of Chinese can effortlessly recall hundreds or even thousands of characters.

It's fine to say that the profusion of initialization methods in C++ is accidental complexity (or extraneous complexity, if you want). But it does not make sense to call those methods extraneous cognitive load. If you want to talk about cognition, I think you should be careful and avoid coarsely generalizing highly-specialized terms into inappropriate situations.

zakirullin commented 6 months ago

I'll quote the introduction from the paper Measuring the Cognitive Load of Software Developers:

Software developers perform tasks that demand their cognitive load constantly. In neurophysiology and educational psychology, the cognitive load term indicates the amount of load a material imposes on users’ mental capacity. In software engineering, the cognitive load term does not have a definition. However, it generally refers to users’ mental effort when reading software artifacts or cognitive processing tasks. For instance, comprehending code requires that developers spend mental effort while reading source code elements. Moreover, developers also apply cognitive load to mentally process the structures of the source code, on resolving arithmetic problems, and on interpreting different abstractions levels of software artifacts.

Developers also apply cognitive load on interpreting different abstractions levels of software artifacts.

The paper itself mentions that in computer science this term has a rather different meaning (more vague, and sometimes misused).

This argument doesn't make sense because for experienced readers, the process of reading a word does not involve loading all 26 characters into your working memory, and then applying them to a text. Fluent reading happens at a level of cognition beneath conscious working memory. Fluent reading is primarily a function of long-term memory (combined with perceptual heuristics). For instance, fluent readers of Chinese can effortlessly recall hundreds or even thousands of characters.

Your example is irrelevant to the issue we're discussing. No matter how experienced a developer is, it is not possible to have all the models in their long-term memory beforehand. This is because there are no such models outside of the author's brain. Even if the programming language is the same, most of the models embedded in the source code are unique; they just can't be in any other programmer's long-term memory. We, developers, build imaginary models on premises only we currently possess, depending on our understanding, our level of experience, our obsession with fancy buzzwords, patterns and architectures (our own subjective interpretation). Then other developers come along, and they don't have those same mental models we were basing our solutions on, and that's the issue. It's not that you learn 26 letters (master a programming language or framework) and you're good to go. No! You'll face abstract structures no one has ever seen, and you have to apply all your mental effort to understand them.

willcrichton commented 6 months ago

Again, you're right — developers experience cognitive load when reading programs. No one is disputing that claim.

The issue is that you're making very specific recommendations about how to structure code to reduce cognitive load. None of these claims have been experimentally demonstrated, and they do not appear to be simple generalizations of findings in the cognitive load literature. Even when CS researchers have tried to make straightforward applications of cognitive load to CS education, it often doesn't work as expected. This is because human cognition is just complicated, and working memory is a very simplistic theory that explains only the most rote aspects of an information processing task. (Hence, I brought up reading comprehension as an example where working memory is insufficient to understand the task.)

Like, I agree with at least some of your conclusions. I prefer composition over inheritance when I write code. But I just don't think we can use working memory to justify that preference, at least not with the available evidence.

johnmangan commented 2 weeks ago

As someone who studies this concept for a living, I genuinely believe that developers would benefit from understanding the extent to which a philosophy about software complexity can (or can't) be grounded in cognitive psychology.

This is such a good example, or rather a good analogy, for an upcoming developer wanting to contribute to a code base that uses an unfamiliar framework.

Despite frameworks making subjective tradeoff decisions, let's hypothetically assume some perfect framework made objectively better decisions. Learning, understanding, and internalizing this new framework could even be a universally "provable" long-term benefit to the upcoming developer, beyond the current product and until the end of their career.

Yet this would still require having to learn it before being able to make meaningful contributions to the code that uses it.

Working in the field of complexity and cognitive psychology means one would have a far stronger grasp of its frameworks and mental models and how they could apply to software engineering. Doing research in the field, essentially by definition, requires internalizing complexities beyond what others need when simply applying it to their work, since research is about furthering that understanding for others.

I would suggest that there is a large gap between what is useful in practice versus theory when it comes to the extent one benefits from applying science to their craft. The existence of both universities and boot camps shows this.

On the other hand, it is the role of science (in so far as it relates to engineering) to help lead craftsmen to better biases, models, frameworks, understandings, and even abstractions by which we can accomplish more complex tasks.

There is an interesting duality here. The time spent improving one's craft is itself a tradeoff: immediate productivity versus potential. I won't pretend to know where such a line should be drawn… if I did, advances in the field would surely prove me wrong, or at best only temporarily correct.