Closed davidweichiang closed 3 years ago
I'm a bit mixed on this notation. I like that it scales to higher dimensions, but I am not sure whether the added abstraction makes things clearer.
One test case: you could also add a stride argument and make pool a special case of unroll. Would that make it more clear or less clear?
Minor: I don't love the underscript and square brackets on the pool. Could you do that with an intermediary var?
Another small thing is that underscripts on pool are much lower than on max. I wonder if there is a way to keep a fixed baseline?
> Minor: I don't love the underscript and square brackets on the pool. Could you do that with an intermediary var?
Yes, I think it could be hidden by writing |pool|=k in another equation.
> Another small thing is that underscripts on pool are much lower than on max. I wonder if there is a way to keep a fixed baseline?
It's because of the descender on the p. Unfortunately, if you put \struts in to line everything up, it looks much worse. The only solution I can think of is to choose a function name that doesn't have a descender in it!
How does that spacing look?
Sorry, this still looks the same. Was it a different version?
Oh, it probably didn't rebuild the pdf. I'll do that now.
My feeling is that if unroll can be defined clearly, then the abstraction is a win because, in Conv2d, two unrolls aren't much harder to understand than one, whereas the 2D version of U might require the reader to re-understand more.
I also feel that unroll would be more clear if defined using slice notation, which we've avoided up to now:
unroll_{seq(i)} = X_{seq(i : i+|kernel|-1)} (assuming slices are inclusive at both ends)
Similarly
pool_{seq(i)} = X_{seq(ik : ik+k-1)}
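To make the two slice definitions concrete, here is a minimal Python sketch on a plain list. The function names `unroll` and `pool` mirror the notation, but the 0-based indexing, the list representation, and the parameter names `kernel_size` and `k` are assumptions for illustration, not part of the proposed notation. Python slices exclude the end index, so the inclusive slice i : i+|kernel|-1 becomes `x[i:i + kernel_size]`.

```python
def unroll(x, kernel_size):
    """Window i holds x[i], ..., x[i + kernel_size - 1] (inclusive slice)."""
    n_windows = len(x) - kernel_size + 1
    return [x[i:i + kernel_size] for i in range(n_windows)]


def pool(x, k):
    """Window i holds x[i*k], ..., x[i*k + k - 1]; windows do not overlap."""
    n_windows = len(x) // k
    return [x[i * k:i * k + k] for i in range(n_windows)]


x = [1, 2, 3, 4, 5, 6]
print(unroll(x, 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(pool(x, 2))    # [[1, 2], [3, 4], [5, 6]]
```

Each inner list plays the role of the new axis (kernel or pool) that the notation introduces for position i along seq.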
That's interesting, I like the slice here. Although I would have to think more about how it plays with other things like derivatives.
I guess my question was why not go all the way to:
unroll_{seq(i)} = X_{seq(i \times stride : i \times stride + |kernel| - 1)}
Which works for both conv and pool.
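As a rough check that one strided definition covers both cases, here is a small Python sketch. The name `unroll_strided` and its parameters are hypothetical, and 0-based indexing on a plain list is an assumption. With `stride=1` it yields the convolution windows; with `stride` equal to the window size it yields the non-overlapping pooling windows.

```python
def unroll_strided(x, kernel_size, stride=1):
    """Window i holds x[i*stride], ..., x[i*stride + kernel_size - 1]."""
    n_windows = (len(x) - kernel_size) // stride + 1
    return [x[i * stride:i * stride + kernel_size] for i in range(n_windows)]


x = [1, 2, 3, 4, 5, 6]

# stride 1: overlapping windows, as used by a convolution
conv_windows = unroll_strided(x, kernel_size=3, stride=1)

# stride equal to the window size: disjoint windows, as used by pooling
pool_windows = unroll_strided(x, kernel_size=2, stride=2)

# max pooling is then just a max over the new window axis
max_pooled = [max(w) for w in pool_windows]

print(conv_windows)
print(pool_windows)
print(max_pooled)
```

The cost shows up in the signature: the function now takes three things instead of two, which is the same trade-off the notation would face.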
Sorry for the mistake, I should have written the LHS as [unroll X]_{seq(i)}.
The only problem with a stride parameter is that it would increase clutter; unroll would have two axis names under it, a stride, and an argument tensor.
I will create a separate issue for slices.
Maybe I am missing something, but this notation is what is bugging me:
Any downside to just having it be two lines with an intermediate Y variable?
Also Unroll takes X with no parens and pool takes no args. I think we should just be consistent and always write these as standard functions.
I don't love the intermediate variables but agree that it's less complicated to read. The lack of argument to pool was a mistake, now corrected!
Both Conv and MaxPool defined a U tensor that had to be redefined separately for 1d, 2d, 3d, etc. and got increasingly complicated with more dimensions. This tries out a different way that generalizes more easily (I think) to higher dimensions. However, I'm unsure whether this still captures the spirit of the original definition by @srush.
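One way to see how this generalizes is to apply the same 1-D unroll once per spatial axis. The sketch below is only an illustration under assumptions: images are nested Python lists, indexing is 0-based, and the helper names `unroll`, `unroll2d`, and `conv2d` are hypothetical rather than taken from the PR.

```python
def unroll(x, kernel_size):
    """1-D unroll: window i holds x[i], ..., x[i + kernel_size - 1]."""
    return [x[i:i + kernel_size] for i in range(len(x) - kernel_size + 1)]


def unroll2d(image, kh, kw):
    """Apply unroll along height, then along width.

    Returns patches such that patches[i][j] is the kh-by-kw block of
    image whose top-left corner is at row i, column j.
    """
    patches = []
    for rows in unroll(image, kh):                  # kh consecutive rows
        per_row = [unroll(row, kw) for row in rows]  # width windows per row
        n_cols = len(per_row[0])
        patches.append(
            [[per_row[r][j] for r in range(kh)] for j in range(n_cols)]
        )
    return patches


def conv2d(image, kernel):
    """Elementwise product of each patch with the kernel, summed."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(p[r][c] * kernel[r][c] for r in range(kh) for c in range(kw))
         for p in row_of_patches]
        for row_of_patches in unroll2d(image, kh, kw)
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(unroll2d(image, 2, 2)[0][0])  # [[1, 2], [4, 5]]
print(conv2d(image, kernel))        # [[6, 8], [12, 14]]
```

The 2-D case reuses the 1-D helper unchanged; only the regrouping of windows into patches is new.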