zackchase / mxnet-the-straight-dope

An interactive book on deep learning. Much easy, so MXNet. Wow. [Straight Dope is growing up] ---> Much of this content has been incorporated into the new Dive into Deep Learning Book available at https://d2l.ai/.

Detailed feedback for Ch 01 #201

Closed: simoncorstonoliver closed this issue 7 years ago

simoncorstonoliver commented 7 years ago

Notes on chapter01_crashcourse/introduction.ipynb

Preface:

• Tone is flippant.
• Vocabulary level too high for ESL readers, e.g. "cognizant", "buffoonery".
• Eliminate aspirational / modest statements.
• The entire preface could be reduced to: "mxnet-the-straight-dope is an educational resource for deep learning that leverages the strengths of Jupyter notebooks to present prose, graphics, equations, and (importantly) code together in one place. The result will be a resource that could be simultaneously a book, course material, a prop for live tutorials, and a resource for plagiarising (with our blessing) useful code."

Learning by doing – who is "I"? The rest of the intro uses the authorial "we".

Introduction:

• Inappropriate vocabulary level: "fabricated", "pedagogical".
• Redundant: "we ourselves are nonetheless capable of performing the cognitive feat ourselves."
• Saying that you turn knobs is usually a reference to hyperparameter tuning, not parameter setting.
• Dysfluent: "Generally, our model is just a machine transforms its input into some output."
• Typo: "English langauge".
• The acronym ML is used without being defined.
• Rephrase "sucks less".
• Rephrase "model is dope".
• Dysfluent: "They're mostly because they are problems where coding we wouldn't be able program their behavior directly in code, but we can program with data".
• "Oftentimes" >> "Often".
• Dysfluent: "To get going with at machine learning".
• Rephrase: "Generally, the more data we have, the easier our job as modelers."
• Structured data: I would not call a Jupyter notebook structured data. It's unstructured but marked up.
• Typos: "ingesting high resolution image deep neural networks".
• "deep neural networks" >> "deep artificial neural networks".
• Models section: the bulleted section beginning "loss functions" appears with no connection to the running text.
• Loss functions: AMZN stock prediction is one example of a loss function.
• Training section: "the latter" – the latter what? There are not two antecedents.
• Trained error: an italicized f is used without introduction.
• Incomplete sentence: "Encouraging but by no means a guarantee."
• Rephrase: "This can be off by quite a bit (statisticians call this overfitting)." The point to make is that the error on test data can be greater than the error on the training data (see the sketch at the end of this comment).
• "one aims to do" – tone difference from the colloquial "you" used throughout.
• Supervised learning: too many terms used without introduction: x, y, targets, inputs.
• Incomplete sentence: "Predict cancer vs not cancer, given a CT image."
• "Perhaps the simplest supervised learning task wrap your head around in regression". I think predicting labels is much simpler.
• The term "vector" should have been introduced much earlier.
• Typo: "whacky".
• What purpose is served by introducing notation?
• "Lots of practical problems are well described regression problems." >> "Lots of practical problems can be formulated as regression problems."
• Dysfluent: "Imagine, for example assume".
• Eliminate the discussion of L1 loss – way too much detail for the place where we're describing the kinds of learning algorithms.
• Fix: "In classification, we want to look at a feature vector and then say which among a set of categories categories (formally called classes) an example blongs to."
• Paragraph starting "more formally": mangled text, unnecessary math symbols and terminology.
• Death cap example: eliminate the math.
• Extensive spelling errors.
• Dysfluent: "But not matter accurate".
• "This problem emerges in the biomedical literature where correctly taggin articles is important because it allows researchers to do exhaustive reviews of the literature." It doesn't emerge there. Applies there, perhaps?
• "A possible solution to this problem is to score every element in the set of possible sets with a relevance score and then retrieve the top-rated elements." >> "A possible solution to this problem is to score every element in the set of possible sets with a relevance score and then display the top-rated elements."
• Recommender systems: "Generally, such systems strive to…" Eliminate the math symbols, or at least fix the funky rendering – it looks like a superscript u for user.
• "So far we've looked at problems where we have some fixed number of inputs and produce a fixed number of outputs. Take some features of a home (square footage, number of bedrooms, number of bathrooms, walking time to downtown), and predict its value. Take an image (of fixed dimension) and produce a vector of probabilities (for a fixed number of classes). Take a user ID and an product ID and predict a star rating. And once we feed our fixed-length input into the model to generate an output, the model immediately forgets what it just saw."
  o A common idiom in the preceding text is "Take X for example", so I initially garden-pathed on these examples. One example is sufficient, preceded by "for example".
  o The preceding text did not stipulate that the input vector is fixed length. Nor did it stipulate that the labels are a fixed set.
• Automatic speech recognition: "In other words, this is a seq2seq problem where the output is much shorter than the input." That is a very peculiar way to describe it: you're comparing length (in ms) to length (in chars), which is not mathematically valid. Ditto for the TTS discussion.
• Machine Translation: "Unlike in the previous cases where the order of the inputs was preserved, in machine translation, order inversion can be vital." Which previous examples?
  o Speech recognition doesn't preserve order, even in English, e.g. "$10" is pronounced "ten dollars".
  o "obnoxious tendency" – this is offensive and English-centric. Remove.
  o Reordering is one problem with MT. A bigger problem is the many-to-many mappings of words across languages, e.g. several words in one language may map to one word in another.
• Unsupervised learning: rephrase "extremely anal boss".
• Rephrase "pretty lame".
• Why do the examples of unsupervised learning only get bullet points and not sub-sections? They're just as important, and with work in autoencoders etc. they are a huge research area.
• Environment: dysfluent: "So far we didn't discuss at all yet,".
• "Monikers" >> "terms".
• "there is a large area of situations where" >> "There are many situations where".
• "Needless to say," – then don't say it. Or use a different discourse connective.
• "However there are many cases…" – but then the text doesn't explicitly connect to the images that follow.
• Conclusion: does not summarize the section. Total non-sequitur. Says the chain rule is easy, but there is no mention of the chain rule on that page or on the page linked to.
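To make the overfitting point above concrete, here is a minimal sketch (mine, not from the book, and not tied to any of its code): an over-flexible model fit to a handful of noisy points typically shows a much larger error on held-out test data than on the training data.

```python
import numpy as np

rng = np.random.RandomState(0)

def make_data(n):
    # True relationship is linear; the noise is what an over-flexible model memorizes.
    x = rng.uniform(-1, 1, n)
    y = 2 * x + 0.3 * rng.randn(n)
    return x, y

x_train, y_train = make_data(12)
x_test, y_test = make_data(200)

# Deliberately over-flexible: a degree-9 polynomial fit to 12 noisy points.
coeffs = np.polyfit(x_train, y_train, deg=9)
train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

print(f"training error: {train_err:.4f}")
print(f"test error:     {test_err:.4f}")  # usually far larger than the training error
```

The exact numbers don't matter; the point is simply that training error can be small while test error is large.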

zackchase commented 7 years ago

Going through many of these now that the preface and Introduction are in better shape. I don't aspire to have the book be as dry as you might have it. That sounds boring, and while it would communicate the technical material, it would be personality-less and would communicate nothing of the actual environment in which this research is being done. In short, I'm happy with the aspirational statements. It's a preface, and this is a project with aspirations. And it's one around which we're forming an open-source community. In the same vein, I'm also fine with being flippant.

Many of the language catches are excellent, and I'll update the ones that weren't already caught in my rewrite of the introduction.

Re "obnoxious tendency of Germans", this should perhaps be clipped. But you're mistaken about it being English-centric: it was written by the German co-author.

simoncorstonoliver commented 7 years ago

Sounds good. Your style is growing on me :)

Re German: by English-centric I mean that German word order only seems unusual if viewed through the prism of English. English is really the odd one out in the Germanic family. Many a paper has been written on formal and functional analyses of why English doesn't do it the natural Germanic way any longer.

Anyhooo, looking again at the discussion of German: the text says that the verb goes at the end, then gives the following example: Haben Sie sich schon dieses grossartige Lehrwerk angeschaut?

The last word in that example is a participle. The main verb of the main clause is the first word, "Haben". It occurs in first position because of subject-aux inversion, but the general rule would be that the verb goes in second position in German:

http://www.dartmouth.edu/~deutsch/Grammatik/WordOrder/WordOrder.html

The example still nicely illustrates the problem of word alignment in machine translation, so without having to give readers a crash course in part-of-speech tagging in German we could just say:

Consider the following illustrative example of the tendency in German to place a participle at the end of the sentence, resulting in a very different order from English.
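For illustration only, here is a hand-written word alignment for that sentence pair (my own sketch, not anything from the book), showing the sentence-final participle "angeschaut" lining up with "looked" near the front of the English translation:

```python
# Hand-written (assumed) alignment for the example sentence pair; "sich" has
# no direct English counterpart, so it is simply left unaligned here.
german = "Haben Sie sich schon dieses grossartige Lehrwerk angeschaut ?".split()
english = "Have you already looked at this great textbook ?".split()

# (German position, English position) pairs.
alignment = [(0, 0), (1, 1), (3, 2), (4, 5), (5, 6), (6, 7), (7, 3), (8, 8)]

for g, e in alignment:
    print(f"{german[g]:>12} -> {english[e]}")
```

The crossing pair (7, 3) is the point: the participle at the end of the German sentence aligns with a verb near the front of the English one, which is exactly the word-order problem the example is meant to illustrate.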

zackchase commented 7 years ago

Implemented most of the changes that I agree with and have bandwidth for. Closing now. Thanks for bearing with me through ICLR/FAT conference + NIPS Workshop hell month :)