Dear team,

I read chapters 1-10 over the last couple of days and wanted to share my thoughts here. I really enjoyed reading the book, and I think it provides a broad yet comprehensive overview of the field. However, at some points I felt there could be more of a tinyML touch. See the list below for my detailed notes. The items I could contribute to in terms of writing are listed in bold.
Chapter 2
Section 2.3.1: could include Rust maybe? It's gaining a lot of traction in the embedded space
Chapter 5
Chapter 5: in my opinion, the workflow description should be specialized towards tinyML as well, because right now it focuses on a 'generic' GPU-based AI workflow. As such, there is currently no notion of:
Continual learning/online learning
Deployment on wearables
Sample collection at the edge
etc.
However, these topics are covered in the 'Embedded AIOps' and 'On-Device Learning' chapters. Maybe it would be good to add a forward reference to those?
Chapter 7
After section 7.3 ("DeepDive into TensorFlow"), there should probably also be a "DeepDive into PyTorch", with an overview of all its packages and more details on the framework
Section 7.5.4: should this include meta-learning/continual learning?
Section 7.10.1: what does "This has led to vertical (i.e. between abstraction levels) and horizontal (i.e. library-driven vs. compilation-driven approaches to tensor computation) boundaries, which hinder innovation for ML." mean? I couldn't immediately understand this sentence
Chapter 8: there is currently no mention of meta-training or training for continual/online learning, even though it could be relevant for AI training related to embedded systems.
Chapter 9
Section 9.3: cite ResNeXt and ResNet-SE in the text; they are currently mentioned but not cited
Section 9.6.1: could add logarithmic/power-of-two weights, i.e. quantizing each weight to the nearest power of two so that multiplications become bit shifts (a rough sketch of what I mean is below)
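As an illustration of what I mean (just my own sketch, with a hypothetical function name, not proposed book text): each weight is rounded to the nearest signed power of two, so the multiplications in a MAC can be replaced by bit shifts.

```python
import numpy as np

def quantize_power_of_two(w, min_exp=-8, max_exp=0):
    """Round each weight to the nearest signed power of two.

    The multiply w * x then becomes a shift of x by the stored exponent,
    which is much cheaper in hardware than a full multiplication.
    """
    sign = np.sign(w)
    # Avoid log(0); weights that are exactly zero stay zero.
    magnitude = np.maximum(np.abs(w), 2.0 ** (min_exp - 1))
    exponent = np.clip(np.round(np.log2(magnitude)), min_exp, max_exp)
    quantized = sign * (2.0 ** exponent)
    quantized[w == 0] = 0.0
    return quantized

weights = np.array([0.31, -0.07, 0.002, -0.55])
print(quantize_power_of_two(weights))  # -> [ 0.25 -0.0625  0.00390625 -0.5 ]
```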
Chapter 9: I feel it misses a section/some notes on memory-limited processing. This is the case for transformers, for example, where a large chunk of the processing time is spent on transferring data rather than on compute.
Section 9.7.1: should add an energy metric, as it can sometimes be more important than power for embedded systems; devices could then be compared on energy per inference or energy per training sample (see the small illustration below).
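To illustrate the point with made-up numbers: a lower-power device can still consume more energy per inference if it is slow enough, because energy = power x latency.

```python
# Hypothetical numbers, purely to illustrate energy = power x latency per inference.
devices = {
    "MCU A": {"power_w": 0.05, "latency_s": 0.80},  # low power, but slow
    "MCU B": {"power_w": 0.20, "latency_s": 0.10},  # higher power, but fast
}

for name, d in devices.items():
    energy_mj = d["power_w"] * d["latency_s"] * 1e3  # millijoules per inference
    print(f"{name}: {energy_mj:.0f} mJ per inference")
# MCU A: 40 mJ, MCU B: 20 mJ -> the "higher-power" device wins on energy per inference.
```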
Section 9.7.2: maybe add hyperlinks to COCO, ImageNet etc. datasets
Chapter 10
Section 10.2.1: I'd say that the advantages of structured pruning (= hardware efficiency) also help on GPUs (think of 2:4 sparsity on NVIDIA GPUs), not only on FPGAs and ASICs (see the sketch below for what I mean by 2:4 sparsity)
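For reference, 2:4 sparsity just means keeping the two largest-magnitude weights in every group of four; that fixed structure is what the GPU's sparse tensor cores can exploit. A quick sketch (mine, function name hypothetical):

```python
import numpy as np

def prune_2_of_4(weights):
    """Zero out the 2 smallest-magnitude weights in every group of 4 (2:4 structured sparsity)."""
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |w| in each group of four.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.01, 0.8])
print(prune_2_of_4(w))  # -> [ 0.9  0.   0.  -0.7  0.   0.3  0.   0.8]
```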
10.2.1 > unstructured pruning: the sentence ", brings with it challenges related to managing sparse representations and ensuring computational efficiency." does not focus enough on the large issues that come with unstructured pruning
10.2.1 > "lottery ticket hypothesis": duplicate text of 'More formally, the lottery ticket hypothesis is a concept in deep learning that suggests that within a neural network, there exist sparse subnetworks (or "winning tickets") that, when initialized with the right weights, are capable of achieving high training convergence and inference performance on a given task.'
Maybe split chapter 10 into multiple chapters, because now it is really (scarily) long
10.2.2 > low-rank matrix factorization: it is not clear where these challenges ("A key challenge resides in managing the computational complexity inherent to LRMF, especially when grappling with high-dimensional and large-scale data.") arise from
Duplicate text: "A key challenge resides in managing the computational complexity inherent to LRMF, especially when grappling with high-dimensional and large-scale data. The computational burden, particularly in the context of real-time applications and massive datasets, remains a significant hurdle for effectively using LRMF."
10.2.3 "Streamlining Model Architecture Search" should be called (in my opinion) "Neural Architecture Search (NAS)"
10.3.3 > "Computational Complexity": "For example, the figure below shows that integer addition is much more energy efficient than integer multiplication." ➝ Not clear from the text why this is the case. We can. explain that this is because a multiplication is just a stacked addition.
Mainly in chapter 10, I think we should choose either FP32 or Float32 for consistency across the whole text
10.3.4: the intro text has a part in bold, but nowhere else in the book (as far as I could see) is bold text used in this way.
10.3.3 Computational complexity: the figures "Graph showing the speed differences for three different models in normal and quantized form." and "Figure comparing the sizes of three models with their quantized forms" need a reference
Chapter 10: make sure the spelling of the "quantization aware training" term is consistent throughout the book; we should choose either "quantization-aware training" or "quantization aware training". Also make sure that the abbreviation (QAT) is added in brackets the first time the term is used, because right now it only appears later (not at the first occurrence).
10.3.5: "For example, if weights of a neural network layer are quantized to 8-bit integers (values between 0 and 255), a weight with a floating-point value of 0.56 might be mapped to an integer value of 143, assuming a linear mapping between the original and quantized scales." ➝ this example is not clear, it doesn't aid the understanding in my opinion.
10.3.5: "Due to its use of integer or fixed-point math pipelines, this form of quantization allows computation on the quantized domain without the need to dequantize beforehand." ➝ what does this mean? Also dequantization was never explained before at this point in the text, making it even harder to understand this
10.3.5: "Despite this, uniform quantization continues to be the current de-facto choice due to its simplicity and efficient mapping to hardware." ➝ despite what? Not clear to me from the context
10.3.5 > Non-uniform quantization: "Typically, a rule-based non-uniform quantization uses a logarithmic distribution of exponentially increasing steps and levels as opposed to linearly." ➝ this could cite one or two papers, OR cite the already-used Gholami et al. (2021)
"Another popular branch lies in binary-code-based quantization where real number vectors are quantized into binary vectors with a scaling factor" needs a reference: I cannot find something on the internet about how this method works, and also intuitively I don't understand it.
10.3.5 > Zero Shot Quantization ➝ a note on the limitations of this method is currently missing; the way it is described now, it seems amazing without any downsides
10.3.6 calibration: could show how alpha and beta (the clipping range) relate to the quantization function Q, e.g. something like the sketch below
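For example, in one standard asymmetric formulation (sign conventions vary between papers, so this is only illustrative and not necessarily the book's exact notation), the clipping range [alpha, beta] fully determines the scale and zero point used inside Q:

```python
def make_quantizer(alpha, beta, n_bits=8):
    """Build Q(x) for a clipping range [alpha, beta] (asymmetric uniform quantization)."""
    S = (beta - alpha) / (2 ** n_bits - 1)      # scale: float width of one integer step
    Z = round(-alpha / S)                       # zero point: integer representing 0.0

    def Q(x):
        q = round(x / S) + Z
        return max(0, min(2 ** n_bits - 1, q))  # clip to the representable range

    return Q, S, Z

# Calibration picks alpha and beta (e.g. min/max of observed activations);
# everything else inside Q follows from that choice.
Q, S, Z = make_quantizer(alpha=-0.5, beta=1.5)
print(S, Z, Q(0.0), Q(1.5), Q(-0.7))  # ~0.00784, 64, 64, 255, 0 (last value is clipped)
```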
10.3.6 broken reference: "Q-BERT [Q-BERT: Hessian based ultra low precision quantization of bert] for quantizing Transformer [Attention Is All You Need]"
Chapter 17
17.4 challenges: should probably also include object-centric issues