Closed srush closed 3 years ago
Nice! Will take a look (I do need to brush up on my non-existent Haskell :) )
I was also trying to think about this in terms of code, and wrote up my initial thoughts at https://hackmd.io/@boazbk/HyUg4D9iw
Neat, I'll take a look.
The Dex style is interesting. They really do treat indexing fully by record types, similar to the v1 proposal. There is no named dimension type (DID), simply a mapping from a name to a standard finite dimension type.
So the default would be:
    for w in range(W):
        for h in range(H):
            print(A[{width:w, height:h}])
Alternatively, you can do:
    for index in indexset({width: W, height: H}):
        print(A[index])
But as far as I can tell there is nothing in between, i.e. the following would not work without more explicit transformations:
    for index in indexset({width: W}):
        print(A[index])
Although if that's the style we arrive at, I'm sure we could make it work.
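For what it's worth, here is a rough Python sketch of how the in-between case could behave if a partial record simply returned a smaller record-indexed array. This is not Dex and not anything from the proposal: `NamedArray` and this `indexset` are hypothetical helpers written just for illustration, and dimension names are plain strings.

```python
from itertools import product

def indexset(shape):
    """Yield every record index for a dict of name -> size."""
    names = list(shape)
    for values in product(*(range(shape[n]) for n in names)):
        yield dict(zip(names, values))

class NamedArray:
    """Toy array indexed by records (dicts of name -> position). Hypothetical."""

    def __init__(self, shape, fn):
        self.shape = dict(shape)
        # Cells are keyed by a frozenset so the order of names never matters.
        self.cells = {frozenset(ix.items()): fn(ix) for ix in indexset(shape)}

    def __getitem__(self, index):
        unknown = set(index) - set(self.shape)
        if unknown:
            raise KeyError(f"unknown dimension(s): {sorted(unknown)}")
        rest = {n: s for n, s in self.shape.items() if n not in index}
        if not rest:
            # Full record: return a single element.
            return self.cells[frozenset(index.items())]
        # Partial record: return the sub-array over the remaining names.
        return NamedArray(
            rest, lambda ix: self.cells[frozenset({**index, **ix}.items())]
        )

W, H = 2, 3
A = NamedArray({"width": W, "height": H},
               lambda ix: 10 * ix["width"] + ix["height"])

# The "in-between" loop: index by width only, get a height-indexed row back.
for index in indexset({"width": W}):
    row = A[index]
    print(index, [row[{"height": h}] for h in range(H)])
```

The design choice here is that the result type depends on which names are left over, which is roughly the "more explicit transformation" that a typed language would have to spell out.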
I am going to close this as I think the type system of Dex is different enough from what we are building that it would be hard to bridge the gap. Dex is neat, Named Tensors is neat, but they are different beasts.
Have you seen the idea of Existential Dimensions? I got this idea second-hand via @jekbradbury from @dougalm, and I don't know its current status or whether it can be done in Dex (see https://github.com/invenia/NamedDims.jl/issues/61), but I think it would be excellent to be able to do.
It solves the problem that various operations that should return a named dimension don't know what that name should be. For example, a multiply between an unnamed tensor and a named tensor yields one dimension with an unknown name; let's call that an existential name. Another example is the latent dimension from a matrix factorization, which gives you two existential names, on different arrays, that must be equal to each other. But if you add two tensors, one fully named (call it publicly named) and one with an existential name, then we know that the existential name must be equal to that public name. So you can do a kind of type inference to propagate that name to every other existential name that has to be the same as this one. If, while doing this, you end up trying to assign two different public names to the same existential name, you throw an error, as someone has done something invalid.
And then there is a fun extension for doing this with namespaces, so you can have one public name per namespace, which I think, if done right, can let you deal with the fact that one library might call observations :obs and another might call them :times.
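To make the propagation step concrete, here is a minimal Python sketch of the inference described above, using a union-find over dimension names. It is only an illustration under my own assumptions: `DimNames`, `NameConflict`, and the `?n` spelling of existential names are made up here, it is not how NamedDims.jl or Dex does anything, and it ignores the namespace extension.

```python
class NameConflict(Exception):
    """Raised when two different public names are forced to be equal."""

class DimNames:
    """Union-find over dimension names; each class has at most one public name."""

    def __init__(self):
        self.parent = {}   # name -> parent name
        self.public = {}   # root -> public name, if one is known
        self.counter = 0

    def fresh(self):
        """Create a new existential (not yet known) name."""
        self.counter += 1
        name = f"?{self.counter}"
        self.parent[name] = name
        return name

    def named(self, public_name):
        """Create (or reuse) the node for a public name."""
        if public_name not in self.parent:
            self.parent[public_name] = public_name
            self.public[public_name] = public_name
        return public_name

    def find(self, name):
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def unify(self, a, b):
        """Record that two dimension names must be the same."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        pa, pb = self.public.get(ra), self.public.get(rb)
        if pa is not None and pb is not None and pa != pb:
            raise NameConflict(f"cannot unify {pa!r} with {pb!r}")
        self.parent[rb] = ra
        if pa is None and pb is not None:
            self.public[ra] = pb

    def resolve(self, name):
        """Public name if one is known, else a representative existential name."""
        root = self.find(name)
        return self.public.get(root, root)

names = DimNames()
# Matrix factorization: two existential latent names that must agree.
k1, k2 = names.fresh(), names.fresh()
names.unify(k1, k2)
# Adding a fully named tensor pins the latent dimension to a public name.
names.unify(k1, names.named("latent"))
print(names.resolve(k2))
# A second, different public name for the same class is an error.
try:
    names.unify(k2, names.named("time"))
except NameConflict as err:
    print("error:", err)
```

A per-namespace variant would presumably keep one `public` table per namespace instead of a single one, but I have not tried to sketch that.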
@apaszke convinced me that Dex can do named tensors even in its current form and provide type checking. It's pretty close. Here's a prototype that implements the current attention formulation using named tensors:
The Dex formulation views names as record index types. It automatically generates functions of the form #seq that act as lenses for accessing these forms.
Their record syntax also lets you do things in roughly the same syntax we have been using. If I want to sum out heads:
or alternatively
The only thing I am stuck on (maybe @apaszke knows the answer?) is whether this can do broadcasting. My current implementation manually expands extra dimensions through an ndim argument in order to line up the record types. Is there a nice way to get the union of two records automatically? (In particular, is there a version of ndot below where the a's can be different types?)
Here's my full implementation if you are interested. My implementation is very similar to @davidweichiang's named and numbered style: pop pulls out a vector dim and push puts it back.
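To pin down what I mean by broadcasting over the union of two records, here is a small dynamically checked Python sketch. It is not Dex and not the implementation linked above; `make`, `broadcast_mul`, `sum_out`, and `named_dot` are hypothetical helpers, and a "tensor" is just a shape dict plus a table of cells keyed by record index. The open question is whether the same union can be expressed at the type level.

```python
from itertools import product

def indexset(shape):
    """Yield every record index for a dict of name -> size."""
    names = list(shape)
    for values in product(*(range(shape[n]) for n in names)):
        yield dict(zip(names, values))

def key(index, names):
    """Restrict a record index to `names` and make it hashable."""
    return frozenset((n, index[n]) for n in names)

def make(shape, fn):
    """Build a toy named tensor as a (shape, cells) pair."""
    return dict(shape), {key(ix, shape): fn(ix) for ix in indexset(shape)}

def broadcast_mul(x, y):
    """Elementwise product over the union of the two records' names."""
    (xs, xc), (ys, yc) = x, y
    for n in xs.keys() & ys.keys():
        if xs[n] != ys[n]:
            raise ValueError(f"size mismatch on {n!r}: {xs[n]} vs {ys[n]}")
    out_shape = {**xs, **ys}  # union of the two records
    cells = {key(ix, out_shape): xc[key(ix, xs)] * yc[key(ix, ys)]
             for ix in indexset(out_shape)}
    return out_shape, cells

def sum_out(x, name):
    """Sum a tensor over one named dimension."""
    shape, cells = x
    if name not in shape:
        raise KeyError(name)
    rest = {n: s for n, s in shape.items() if n != name}
    out = {}
    for ix in indexset(shape):
        k = key(ix, rest)
        out[k] = out.get(k, 0) + cells[key(ix, shape)]
    return rest, out

def named_dot(x, y, name):
    """Dot product over `name`; all other names are broadcast."""
    return sum_out(broadcast_mul(x, y), name)

# The two operands carry different records; only "key" is shared.
q = make({"seq": 2, "key": 3}, lambda ix: ix["seq"] + 1)
k = make({"heads": 2, "key": 3}, lambda ix: ix["key"])
scores_shape, scores = named_dot(q, k, "key")
print(scores_shape)
```

Here the two arguments have different records and the result is indexed by the names left in their union after the contracted one is removed, with no manual expansion step.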