Hi @colehawkins, it looks great!
I think it's worth adding a from_layer method -- TensorLy should be able to decompose the weights even if they're quite big (as long as they fit in memory). We can add an option to specify which CP algorithm to use (users can already specify which SVD function to use, e.g. randomized).
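A rough sketch of the from_layer idea using plain TensorLy; the helper name cp_from_embedding and the reshaping convention are illustrative assumptions, not the actual implementation:

```python
import torch
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend('pytorch')

def cp_from_embedding(embedding_layer, rank, tensorized_shape):
    """Reshape a dense embedding weight into a higher-order tensor and CP-decompose it."""
    weight = embedding_layer.weight.data
    # e.g. a (10000, 256) weight reshaped to (10, 10, 10, 10, 4, 4, 4, 4)
    weight_tensor = weight.reshape(tensorized_shape)
    # A random init (or a randomized SVD, where available) keeps this tractable
    # for large weight matrices.
    return parafac(weight_tensor, rank=rank, init='random')

dense = torch.nn.Embedding(10000, 256)
cp_weight = cp_from_embedding(dense, rank=16,
                              tensorized_shape=(10, 10, 10, 10, 4, 4, 4, 4))
```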
The code for tensorizing the shape is great - I was actually also working on something like this (but for factorized linear layers :) ). I think we should put it in a separate file, e.g. in utils, and have a nice API (e.g. get_tensorized_shape) that can be used for all block tensor decompositions. What do you think? I had already written some of the functions, but if sympy provides all we need we can consider adding it as a dependency, though I'm always reluctant to add new dependencies unless it's absolutely necessary.
For the API, we could accept a parameter, e.g. tensorized_shape, that could be either (tensorized_num_embeddings, tensorized_embedding_dim) or "auto" to specify everything with a single parameter? For the rank, we could pass "same" or 0.5 to respectively keep the number of parameters the same or have half as many.
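A quick sketch of how that API could look in use; FactorizedEmbedding and the argument names here are placeholders for the proposal, not a confirmed interface:

```python
import torch

class FactorizedEmbedding(torch.nn.Module):
    """Placeholder sketching the proposed constructor arguments only."""
    def __init__(self, num_embeddings, embedding_dim,
                 tensorized_shape='auto',  # or an explicit pair of tuples
                 rank='same'):             # 'same' matches the dense parameter count, 0.5 halves it
        super().__init__()
        ...

# Explicit tensorization of both dimensions, with roughly half the parameters:
emb = FactorizedEmbedding(10000, 256,
                          tensorized_shape=((10, 10, 10, 10), (4, 4, 4, 4)),
                          rank=0.5)

# Everything chosen automatically, keeping roughly the same number of parameters:
emb_auto = FactorizedEmbedding(10000, 256, tensorized_shape='auto', rank='same')
```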
Finally, should we write the forward pass ourselves, efficiently, using the factorizations' __getitem__?
Thanks for the feedback!
The from_layer is in, and I've moved over the reshaping code.

I've had some trouble understanding what the most natural way to use __getitem__ is. When I just use __getitem__(input), I get a tensor back (as expected), but I don't know how to write __getitem__([input, :]), which is roughly what I want, since all of the second dimension should come back.
You can directly index the tensorized tensor as an array: fact_tensor[indices, :]. The only caveat is that, by default, this will not work if indices is a nested list.

I guess we normally select the same number of rows R for each sample, right? If so, for N samples, we can have indices be a flat list of size R*N and then just reshape the output to regroup the R rows of each of the N samples -- we can compare with a naive version that uses a for loop (see the sketch below).
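A minimal sketch of that lookup strategy, demonstrated on a plain torch tensor; the assumption is that indexing the tensorized weight as weight[flat_indices, :] likewise returns a dense (N*R, embedding_dim) matrix:

```python
import torch

def lookup(weight, input_ids):
    """input_ids: (N, R) integer ids -> (N, R, embedding_dim) embeddings."""
    N, R = input_ids.shape
    flat = input_ids.reshape(-1).tolist()  # one flat index list of size N*R
    rows = weight[flat, :]                 # single indexing call: (N*R, embedding_dim)
    return rows.reshape(N, R, -1)          # regroup the R rows of each sample

def lookup_naive(weight, input_ids):
    """Reference version with an explicit for loop over samples, for comparison."""
    return torch.stack([weight[row.tolist(), :] for row in input_ids])

weight = torch.randn(10000, 256)
ids = torch.randint(0, 10000, (32, 5))
assert torch.equal(lookup(weight, ids), lookup_naive(weight, ids))
```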
Thanks for the pointer on indexing.
The challenge I ran into is that this indexing style only works out of the box for BlockTT, since CPTensorized returns a CPTensorized when indexing and TuckerTensorized returns a tensor rather than a matrix. Is this the desired behavior?

I've added some slightly clunky code to catch those two cases, but hopefully this covers the standard use cases.
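For reference, a sketch of the kind of case-handling this describes; the conversion call (to_tensor()) is an assumption and may differ across versions:

```python
import torch

def as_embedding_matrix(indexed, embedding_dim):
    """Normalize the result of indexing a tensorized weight to (n_rows, embedding_dim)."""
    if not torch.is_tensor(indexed):
        # CPTensorized case: indexing returns another factorized object,
        # so materialize it as a dense tensor first (assumed conversion call).
        indexed = indexed.to_tensor()
    # TuckerTensorized case: the result may be a higher-order tensor whose
    # trailing modes still hold the tensorized embedding dimension; flatten them.
    return indexed.reshape(-1, embedding_dim)
```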
If the embedding layer code looks good, I'm happy to move on to discussing the path forward for utils.py, sympy vs. our own factorization, and reshaping.
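For the sympy route, the relevant piece is essentially integer factorization of each dimension; a small illustration of the idea (not the actual utils code):

```python
import sympy

def prime_factors(n):
    """Prime factorization of n as a flat list, e.g. 10000 -> [2, 2, 2, 2, 5, 5, 5, 5]."""
    factors = []
    for prime, multiplicity in sorted(sympy.factorint(n).items()):
        factors.extend([prime] * multiplicity)
    return factors

print(prime_factors(10000))  # [2, 2, 2, 2, 5, 5, 5, 5]
print(prime_factors(256))    # [2, 2, 2, 2, 2, 2, 2, 2]
```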
Very similar to the previous version, except that it uses the new reshaping from utils.tensorize_shape. I made a minor edit to utils.tensorize_shape to correct a typo that returned the input dimension tensorization twice.
Looks good to me, merging. Thanks @colehawkins!
@JeanKossaifi comments welcome!
I didn't include a from_layer initialization method because that use case seemed strange to me, and it may lead users to factorize a very large tensor that requires some special treatment (randomized SVD, randomized PARAFAC). I'm happy to add that method for completeness.