invenia / NamedDims.jl

For working with dimensions of arrays by name
https://github.com/JuliaCollections/AxisArraysFuture/issues/1#issuecomment-482077891
MIT License

Existential Dimensions #61

Open oxinabox opened 5 years ago

oxinabox commented 5 years ago

Have been talking to @jekbradbury at JuliaCon about existential dimensions. These are to replace our current wildcard dimensions. This will probably take place in a branch that may never merge, and it might become another package, because the idea is pretty wild.

Some facts.

This means maintaining a global collection of these equivalence classes of ExDims and their canonical public names. It also means that if the same method instance (input types + function) is called in very different parts of the code (by chance), problems can occur. Possibly some namespacing stuff may be needed to also add in the calling module somehow, which would likely need a compiler pass. But that is fine because of the next bit:

I think we can make this happen at compile time, because it is fully determined by the method instance. However, maintaining global mutable state during compilation is not entirely allowed by Julia. But I think the "run backedges backwards" thing that was added to solve the #265 issue for Cassette/IRTools can be used for this too, by triggering appropriate recompilations whenever that mutable state is changed. Though the ideal way to do this would be to solve it fully in a type-inference style.
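To make the "global collection of equivalence classes" a bit more concrete, here is a minimal runtime sketch using a union-find structure. Everything in it (`ExDim`, `ExDimRegistry`, `unify!`, `assert_name!`, `public_name`) is a hypothetical name for illustration, not part of NamedDims.jl, and it ignores the compile-time and namespacing questions entirely.

```julia
# Hypothetical sketch: none of these names exist in NamedDims.jl.
struct ExDim
    id::Int  # identity of one anonymous (existential) dimension
end

struct ExDimRegistry
    parent::Dict{ExDim,ExDim}      # union-find parent pointers
    canonical::Dict{ExDim,Symbol}  # canonical public name, keyed by class root
end
ExDimRegistry() = ExDimRegistry(Dict{ExDim,ExDim}(), Dict{ExDim,Symbol}())

# Find the representative of d's equivalence class (with path compression).
function find_root!(reg::ExDimRegistry, d::ExDim)
    p = get!(reg.parent, d, d)  # unseen dims start as their own class
    p == d && return d
    root = find_root!(reg, p)
    reg.parent[d] = root
    return root
end

# Merge the classes of a and b; their public names must not conflict.
function unify!(reg::ExDimRegistry, a::ExDim, b::ExDim)
    ra, rb = find_root!(reg, a), find_root!(reg, b)
    ra == rb && return ra
    na = get(reg.canonical, ra, nothing)
    nb = get(reg.canonical, rb, nothing)
    if na !== nothing && nb !== nothing && na != nb
        throw(DimensionMismatch("cannot unify dimensions named $na and $nb"))
    end
    reg.parent[rb] = ra
    delete!(reg.canonical, rb)
    name = na === nothing ? nb : na
    name === nothing || (reg.canonical[ra] = name)
    return ra
end

# Give d's whole class a public name, or check it against the existing one.
function assert_name!(reg::ExDimRegistry, d::ExDim, name::Symbol)
    existing = get!(reg.canonical, find_root!(reg, d), name)
    existing == name ||
        throw(DimensionMismatch("dimension already named $existing, not $name"))
    return name
end

# Classes with no public name yet fall back to the wildcard.
public_name(reg::ExDimRegistry, d::ExDim) = get(reg.canonical, find_root!(reg, d), :_)
```

With this, naming any one member of a class names all of them: after `unify!(reg, a, b)` and `assert_name!(reg, b, :variates)`, `public_name(reg, a)` is `:variates`.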

oxinabox commented 5 years ago

> Possibly some namespacing stuff may be needed to also add in the calling module somehow.

Thinking about this a little more: I think only public names need to be namespaced. And I think the namespaces can work as follows: each ExDim equivalence class can have one canonical public name per namespace.

The ExDims in that equivalence class can be matched against any of their canonical public names. But the different canonical names cannot match against each other if they are from different namespaces.
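A rough sketch of that matching rule, with hypothetical names throughout (`PublicName`, `ExDimClass`, `set_name!`, `matches`). Namespaces are plain `Symbol`s here for simplicity. It also assumes that two public names from different namespaces never match each other, even if they happen to be spelled the same, which is one possible reading of the rule.

```julia
# Hypothetical sketch: these types and functions are illustrative only.
struct PublicName
    namespace::Symbol  # e.g. the name of the calling module
    name::Symbol
end

struct ExDimClass
    names::Dict{Symbol,Symbol}  # namespace => canonical public name (at most one each)
end
ExDimClass() = ExDimClass(Dict{Symbol,Symbol}())

# Register the canonical public name of a class within one namespace.
function set_name!(cls::ExDimClass, pn::PublicName)
    existing = get!(cls.names, pn.namespace, pn.name)
    existing == pn.name ||
        error("class is already named $existing in namespace $(pn.namespace)")
    return cls
end

# A class matches any of its own canonical public names.
matches(cls::ExDimClass, pn::PublicName) =
    get(cls.names, pn.namespace, nothing) == pn.name

# Assumption: public names only ever match within a single namespace.
matches(a::PublicName, b::PublicName) =
    a.namespace == b.namespace && a.name == b.name

cls = ExDimClass()
set_name!(cls, PublicName(:ModelA, :variates))
set_name!(cls, PublicName(:ModelB, :targets))
```

Here `cls` matches both `PublicName(:ModelA, :variates)` and `PublicName(:ModelB, :targets)`, but those two public names do not match each other.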

oxinabox commented 5 years ago

The cool thing about this would be for a neural network (or similar) that internally has unnamed dimensions and so has a partially named output.

Right now, if one has:

names(train_input) == (:obs, :covariates)
names(train_output) == (:obs, :variates)

mdl_output = f(train_input)
names(mdl_output) == (:obs, :_)

cur_loss = norm(train_output .- mdl_output)

Now, since calculating cur_loss involved a name-asserting operation between mdl_output and train_output, that assigns the :_ from mdl_output the public name :variates (from train_output).

And that would be remembered even when running on real-world data (and thus not computing the loss).

This I think is a textbook motivation for existential dimensions.
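As a toy illustration of the "remembered" part, the sketch below keeps a runtime table keyed on the function and its input names, standing in for the method instance. `LEARNED_NAMES`, `unify_names`, `learn!`, and `output_names` are all hypothetical names, and a real version would presumably do this at compile time rather than with a global `Dict`.

```julia
# Hypothetical sketch: a runtime stand-in for per-method-instance memory.
const LEARNED_NAMES = Dict{Any,Any}()  # (function, input names) => output names

# Name-asserting unification: `:_` adopts the other side's name,
# and two real names must agree.
function unify_names(a::NTuple{N,Symbol}, b::NTuple{N,Symbol}) where {N}
    return map(a, b) do x, y
        x === :_ && return y
        (y === :_ || x === y) && return x
        throw(DimensionMismatch("names $a and $b do not match"))
    end
end

# Record what a name-asserting operation revealed about f's output.
function learn!(f, input_names, partial_names, asserted_names)
    LEARNED_NAMES[(f, input_names)] = unify_names(partial_names, asserted_names)
end

# Later calls see the learned names; otherwise they get the partial ones.
output_names(f, input_names, partial_names) =
    get(LEARNED_NAMES, (f, input_names), partial_names)

f(x) = x  # stand-in for the model

# Training: the loss compares mdl_output (:obs, :_) with train_output (:obs, :variates).
learn!(f, (:obs, :covariates), (:obs, :_), (:obs, :variates))

# Inference: no loss is computed, but the name is still known.
output_names(f, (:obs, :covariates), (:obs, :_))  # (:obs, :variates)
```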

nickrobinson251 commented 5 years ago

I do not understand this yet, but I appreciate the excitement around the idea. Can we talk it through in person? Then maybe I can post my understanding back here / we can work on it?

oxinabox commented 5 years ago

Yeah sure. To be clear, it is a kinda insane idea that requires some deep compiler hackery to make work. And it will probably mean forking this package long term, without plans to merge. It requires at least Julia 1.3 to be performant, and possibly some things the compiler doesn't have yet.