Closed — srush closed this issue 3 years ago
I want to keep it clear that "axis under operator" means "do the operator to the axis" (not necessarily "do the operator, then sum over the axis"). So if the two operations are merged, I think the symbol chosen should not have a strong association with elementwise product. Maybe \odot is weird enough to not have that association? I'm not sure.
From @mjpost, which turns out to be relevant: https://twitter.com/mjpost/status/1340153497107505152
Does "min X" with no subscript mean "take the min over all elements of X" or "take the trivial elementwise min of X and nothing"?
If min with no name means "global min over all axes", then that would imply that dot with no name means "global dot over all the axes". So that's the opposite of Hadamard. That might be enough ambiguity to justify the circle-dot operator.
Yeah. It’s not the most logical for “no axes” to mean “all the axes” but maybe it’s the pragmatic choice because NumPy does it that way and because “do the operation on zero axes” is usually just the identity function.
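For reference, NumPy already distinguishes "no axis argument" (reduce over all axes) from an explicitly empty set of axes (reduce over none, i.e. the identity). A small illustration using stock NumPy:

```python
import numpy as np

A = np.array([[3, 1],
              [4, 1]])

# No axis argument: NumPy's convention is "reduce over ALL axes".
assert A.min() == 1

# Explicit empty tuple of axes: reduce over NO axes -- the identity.
assert np.array_equal(A.min(axis=()), A)
```

So "do the operation on zero axes" really is the identity in NumPy, and the bare call is the global reduction.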
In that case, I think A \cdot B should contract all the axes that A and B have in common (what @ctongfei calls “natural contraction”) but not the “dangling” axes. But I’m not sure about that.
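A minimal sketch of what that "natural contraction" could mean operationally: contract every axis name the two operands share, keep the dangling axes. The helper name and interface here are hypothetical, just `np.einsum` under the hood:

```python
import numpy as np

def natural_contract(a, a_names, b, b_names):
    """Hypothetical 'natural contraction': sum over every axis name
    shared by the two operands; dangling axes survive."""
    shared = [n for n in a_names if n in b_names]
    out = [n for n in a_names if n not in shared] + \
          [n for n in b_names if n not in shared]
    # assign each distinct axis name an einsum letter
    letters = {n: chr(ord('a') + i)
               for i, n in enumerate(dict.fromkeys(a_names + b_names))}
    spec = (''.join(letters[n] for n in a_names) + ',' +
            ''.join(letters[n] for n in b_names) + '->' +
            ''.join(letters[n] for n in out))
    return np.einsum(spec, a, b), out

A = np.random.rand(2, 3)   # axes ("i", "j")
B = np.random.rand(3, 4)   # axes ("j", "k")
M, names = natural_contract(A, ["i", "j"], B, ["j", "k"])
assert names == ["i", "k"]
assert np.allclose(M, A @ B)  # shared axis "j" contracted away
```

With no shared names this degenerates to a pure tensor product, which is exactly the corner case being debated.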
I think that is correct, if we keep the dot notation, but it should only be used as a corner case. Many of our examples could be written this way (natural contraction / type inference). However, documenting the contraction explicitly is a benefit of our system.
Not to complicate it further, but there are some examples of other cases. I think I actually like these better. Maybe we should ban "min"? I don't love its type being independent of dimension. You should have to rename down first.
Tensordot, for instance, would do a tensor product (0 contractions) given an empty axes tuple () here:
https://numpy.org/doc/stable/reference/generated/numpy.tensordot.html
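Concretely, `np.tensordot` with `axes=0` contracts nothing and returns the outer (tensor) product, with the ranks adding:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(4)

# axes=0: contract zero axes -- the result is the outer (tensor) product
outer = np.tensordot(a, b, axes=0)
assert outer.shape == (2, 3, 4)
assert np.allclose(outer, a[:, :, None] * b[None, None, :])
```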
Sympy separates these operations entirely.
https://docs.sympy.org/latest/modules/tensor/array.html
Their style is more circle dot (though they write it as circle times), followed by an explicit contract.
So, if we went in the other direction, we would say that \min A, \max A, \sum A, where A is a tensor, do nothing (\min A = \max A = \sum A = A). I do kind of like the fact that under this interpretation, these expressions can be read in two different ways (either as elementwise min/max/sum or as reduction min/max/sum over zero axes) and it turns out the same either way.
OK, I just made two competing PRs for this.
Let me clarify:
If A \cdot B means "contract all of the shared axes", then many of the examples (for instance, attention) could be written this way without an explicit subscript. Do we want that?
For most of our operations, the operator's input type can only contract a fixed number of axes, and only the ones named in the type. If we have a \min A
that means R^{...} -> R, then that worries me a bit. I don't understand what that means in our system. Does it contract only the known axes, or also others that were not specified (e.g. "batch")?
That's a good point. I retract that comment (I guess I meant broadcasting, but I agree that is different).
OK, I see, and agree that operators that can work on all axes would make it more difficult to write equations that work correctly with things like minibatching.
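The minibatching hazard is easy to show: an "over all axes" reduction silently collapses a batch axis that the equation's author never mentioned, while naming the axis keeps the equation batch-safe. A small NumPy illustration:

```python
import numpy as np

# shape (batch, feature)
x = np.array([[2., 5.],
              [7., 1.]])

# "min over all axes" contracts the batch axis too
global_min = x.min()
assert global_min == 1.0

# naming the reduced axis leaves the batch axis intact
per_example = x.min(axis=1)          # shape (batch,)
assert np.array_equal(per_example, np.array([2., 1.]))
```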
https://twitter.com/yeewhye/status/1340031802212311041
"Can’t we do say A B for what you call circle dot, and A j B for what you use the dot (over j) for? Since one can contract multiple dimensions at same time, say A *{jk} B, one can also contact 0 dimensions (A * B)?"
From @ywteh