zackchase / mxnet-the-straight-dope

An interactive book on deep learning. Much easy, so MXNet. Wow. [Straight Dope is growing up] ---> Much of this content has been incorporated into the new Dive into Deep Learning Book available at https://d2l.ai/.

Plumbing Chapter Contents #212

Open smolix opened 6 years ago

smolix commented 6 years ago

We need a chapter on plumbing basics. This is not strictly ML stuff but more related to all the details of how to build, serve, store, monitor and log models. Lots of tedious details that users need to know for real experiments. In particular:

SumNeuron commented 6 years ago

This chapter exists but oh my is it confusing.

MXNet is great, but with all due respect please take a look at Wolfram's Mathematica for naming conventions and documentation.

For example (with regard to naming), in your linear algebra chapter you show the function asscalar, but in your introduction sections you have functions like as_in_context and asnumpy. The inconsistent snake case here - whether or not an underscore follows as - is bewildering. (This is a seemingly harmless example, but it emphasizes an overarching point.)
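To make this concrete, here is a minimal snippet (the values are arbitrary) showing the three spellings side by side:

import mxnet as mx
from mxnet import nd

x = nd.array([1.0])
x.asnumpy()                 # no underscore after "as"
x.asscalar()                # no underscore after "as"
x.as_in_context(mx.cpu())   # underscore after "as"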

With regard to documentation, I have to say that even with an interactive book, having gone through the Straight Dope I am even more bewildered about how to use MXNet.

I think a lot of the confusion right now stems from the unclear separation between NDArray, Symbol and Gluon. To my understanding, Gluon is the "new" way (and hopefully the only way in the future) of defining models in MXNet. This is why your plumbing chapter is confusing when a multilayer perceptron block is defined as:

from mxnet import nd
from mxnet.gluon import nn, Block

class MLP(Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = nn.Dense(128)
            self.dense1 = nn.Dense(64)
            self.dense2 = nn.Dense(10)

    def forward(self, x):
        x = nd.relu(self.dense0(x))
        x = nd.relu(self.dense1(x))
        return self.dense2(x)

In what was supposed to be a simple example, we can see many points of confusion. For me in particular: why is nd.relu() being used? How is that different from nn.Activation(activation="relu"), and why wouldn't that be used instead? Also, why are FullyConnected layers from Symbol renamed Dense here? (See the previous comment on naming consistency.) Why is only forward defined and not backward?
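As far as I can tell, the two spellings compute the same thing, which makes it unclear when to prefer one over the other (a minimal check, values arbitrary):

from mxnet import nd
from mxnet.gluon import nn

x = nd.array([[-1.0, 2.0]])
y1 = nd.relu(x)                            # functional form used in the chapter
y2 = nn.Activation(activation='relu')(x)   # layer form; no parameters, so no initialization needed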

Why is Convolution from Symbol split up into Conv1D, Conv2D, etc in Gluon?
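If I am reading the docs correctly, the dimensionality that Symbol infers from the kernel tuple is now spelled out in the class name, e.g.:

import mxnet as mx
from mxnet.gluon import nn

# Symbol API: dimensionality implied by the kernel tuple
data = mx.sym.Variable('data')
conv_sym = mx.sym.Convolution(data=data, num_filter=32, kernel=(3, 3))

# Gluon: dimensionality spelled out in the class name
conv_gluon = nn.Conv2D(channels=32, kernel_size=(3, 3))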

My point is that after reading that page I can't figure out how to properly translate the model definition for a common convolution structure:

import mxnet as mx

def cnn(data, filters, kernel, stride=(1, 1), padding=(0, 0), prefix='', name='name', suffix=''):
    # kernel is a tuple, so only its first element goes into the name string
    conv = mx.sym.Convolution(data=data, num_filter=filters, kernel=kernel, stride=stride, pad=padding,
                              name='%s_%s_%s_conv_%d_%d' % (prefix, name, suffix, filters, kernel[0]))
    bn = mx.sym.BatchNorm(data=conv, name='%s_%s_%s_bn' % (prefix, name, suffix))
    ramp = mx.sym.Activation(data=bn, act_type='relu',
                             name='%s_%s_%s_ramp' % (prefix, name, suffix))
    return ramp

into a gluon.Block (I have no idea if what follows is correct):

import mxnet as mx

class CNN1(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(CNN1, self).__init__(**kwargs)
        with self.name_scope():
            self.cnn = mx.gluon.nn.Conv1D(10, 1)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        # is nd.relu needed here, or is self.ramp enough, or both?
        x = mx.nd.relu(self.cnn(x))
        x = mx.nd.relu(self.bn(x))
        x = mx.nd.relu(self.ramp(x))
        return x
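For comparison, here is my best guess at the intended translation, assuming the activation should only be applied once, after the batch norm (the name CNN2 is just for illustration):

import mxnet as mx
from mxnet.gluon import nn

class CNN2(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(CNN2, self).__init__(**kwargs)
        with self.name_scope():
            self.cnn = nn.Conv1D(10, 1)
            self.bn = nn.BatchNorm()
            self.ramp = nn.Activation(activation='relu')

    def forward(self, x):
        # conv -> batch norm -> relu, each applied exactly once
        return self.ramp(self.bn(self.cnn(x)))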

I think MXNet is cool and has a lot of potential, so I hope you do not take this comment as mean. I want to help you focus on providing clearer examples so users can get the most out of MXNet.

That said, I am of the opinion that the "everything from scratch" chapters should all be structured to mirror the final class myNet(Block): nets should be defined in a class from the start, and then, section by section, the user-defined functions (from the "from-scratch" sections) should be replaced with Gluon's built-in functionality. Basically, have users write a self-standing, fully functional block, piece by piece.
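A rough sketch of what I mean (names and shapes are just placeholders): a from-scratch section could start with something like the block below, and later sections would swap my_relu for nd.relu and the manual affine transform for nn.Dense.

from mxnet import nd
from mxnet.gluon import Block

def my_relu(x):
    # hand-written activation from the "from scratch" section
    return nd.maximum(x, 0)

class MyNet(Block):
    def __init__(self, **kwargs):
        super(MyNet, self).__init__(**kwargs)
        with self.name_scope():
            self.w = self.params.get('w', shape=(784, 10))
            self.b = self.params.get('b', shape=(10,))

    def forward(self, x):
        # later sections replace this with nn.Dense and nd.relu
        return my_relu(nd.dot(x, self.w.data()) + self.b.data())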

In addition, it is stated that this block structure will help us write non-sequential networks (e.g. a residual structure), but this isn't touched on again later...
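Even a short example would make the point, e.g. my sketch of a residual block below (assuming the input and output shapes match so the addition is valid):

from mxnet import gluon, nd
from mxnet.gluon import nn

class Residual(gluon.Block):
    def __init__(self, channels, **kwargs):
        super(Residual, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = nn.Conv2D(channels, kernel_size=3, padding=1)
            self.bn = nn.BatchNorm()

    def forward(self, x):
        # the skip connection is exactly what a Sequential container cannot express
        return nd.relu(x + self.bn(self.conv(x)))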

As for data iterators... I already commented somewhere about how CSVIter has seemingly redundant parameters, while lacking an easily implementable (and more versatile) parameter such as delim.

Honestly, I think you should try to go for a function Import_to_Iter which would work something like this:

my_file.txt

f1    f2    f3    f4    f5    f6
10    23    12    32    01    42
01    40    02    83    11    22
10    23    12    32    01    42

and then have someone call

data_iter = Import_to_Iter('my_file.txt', header=True, input=['f1', 'f2', 'f4'], output=['f6'], delim='\t')

which would parse the file so that our goal is to take X=[f1, f2, f4] and predict y=[f6] (if a header is provided; otherwise integers would denote which fields belong to the input and output).
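To be clear, Import_to_Iter does not exist; below is just a rough sketch of how I imagine such a helper could be built on top of numpy and mx.io.NDArrayIter (only the header=True case, with dtypes and error handling glossed over):

import numpy as np
import mxnet as mx

def import_to_iter(path, input=None, output=None, delim='\t', batch_size=32):
    # hypothetical helper: read a delimited text file with a header row,
    # select the named input/output columns, and wrap them in an iterator
    raw = np.genfromtxt(path, delimiter=delim, names=True)
    x = np.column_stack([raw[name] for name in input])
    y = np.column_stack([raw[name] for name in output])
    return mx.io.NDArrayIter(data=x, label=y, batch_size=batch_size)

data_iter = import_to_iter('my_file.txt', input=['f1', 'f2', 'f4'], output=['f6'])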

Note: I emphasize me and my throughout this to show that it is my own opinion and I might be the only one who thinks this.

zackchase commented 6 years ago

Hi @SumNeuron - thanks for the feedback. This feedback is solid and we'll work to rewrite the chapter.

We are drafting the book entirely in the open. One advantage of this, as this comment shows, is that we get real-time feedback, so we don't ship a book to the press and find out after the fact that something is super confusing for people.

The disadvantage, however, is that the live book is in various states of completion. That means that in some cases, like this one, what you see is literally the very first sketch of a draft for that section.

Looking forward to cleaning up this section (and many, many others) soon.

Thanks again for the unfiltered feedback! (CC @smolix)