tensorly / torch

TensorLy-Torch: Deep Tensor Learning with TensorLy and PyTorch
http://tensorly.org/torch/
BSD 3-Clause "New" or "Revised" License

Add factorized embedding layer and tests #10

Closed colehawkins closed 2 years ago

colehawkins commented 2 years ago

@JeanKossaifi comments welcome!

I didn't include a from_layer initialization method because that use case seemed strange to me, and may lead to users factorizing a very large tensor that requires some special treatment (randomized SVD, randomized PARAFAC).

I'm happy to add that method for completeness.

JeanKossaifi commented 2 years ago

Hi @colehawkins, it looks great!

I think it's worth adding a from_layer -- TensorLy should be able to decompose them even if they're quite big (as long as they fit in memory). We can add an option to specify which CP algorithm to use (users can already specify which SVD function to use, e.g. randomized).

The code for tensorizing the shape is great -- I was actually working on something similar (but for factorized linear layers :) ). I think we should put it in a separate file, e.g. in utils, and have a nice API (e.g. get_tensorized_shape) that can be used for all Block tensor decompositions -- what do you think? I had written some of the functions, but if sympy provides all we need, we can consider adding it as a dependency, though I'm always reluctant to add new dependencies unless it's absolutely necessary.
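As a rough sketch (hypothetical function name and a naive balancing strategy, not a final API), sympy's factorint gives us enough to split a dimension into roughly balanced integer factors:

import sympy

def get_tensorized_shape(dim, order=3):
    # Split `dim` into `order` integer factors whose product is `dim`,
    # keeping the factors as balanced as possible.
    primes = []
    for p, multiplicity in sympy.factorint(dim).items():
        primes.extend([p] * multiplicity)
    primes.sort(reverse=True)
    factors = [1] * order
    for p in primes:
        # Greedily multiply the currently smallest factor.
        factors[factors.index(min(factors))] *= p
    return tuple(sorted(factors))

# e.g. get_tensorized_shape(1000, order=3) -> (10, 10, 10)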

For the API, we could accept a parameter, e.g. tensorized_shape, that could be either (tensorized_num_embeddings, tensorized_embedding_dim) or "auto", to specify everything with a single argument? For rank, we could pass "same" or 0.5, to keep the number of parameters the same or halve it, respectively.
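For example (a hypothetical helper, just to pin down the semantics of "same" vs. a float), the rank argument could resolve to a parameter budget like this:

import math

def resolve_param_budget(num_embeddings, embedding_dim, rank):
    # Total parameters of the dense embedding table.
    full_params = num_embeddings * embedding_dim
    if rank == 'same':
        # Keep roughly the same number of parameters as the dense layer.
        return full_params
    if isinstance(rank, float):
        # e.g. 0.5 -> half the parameters of the dense layer.
        return int(math.ceil(rank * full_params))
    raise ValueError(f"Unsupported rank specification: {rank!r}")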

Finally, should we write the forward pass ourselves, efficiently, using the factorizations' getitem?

colehawkins commented 2 years ago

Thanks for the feedback!

The from_layer is in, and I've moved over the reshaping code.

I've had some trouble understanding what the most natural way to use __getitem__ is. When I just use

__getitem__(input)

I get a tensor back (as expected), but I don't know how to write

__getitem__([input,:])

which is roughly what I want, since all of the second dimension should come back.
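For reference, here is the dense version of what I'm after (plain PyTorch, just to illustrate the shape I expect back):

import torch

weight = torch.randn(1000, 32)        # (num_embeddings, embedding_dim)
input = torch.tensor([3, 17, 256])    # token indices
rows = weight[input, :]               # shape (3, 32): the full second dimension comes back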

JeanKossaifi commented 2 years ago

You can directly index the tensorized tensor as an array: fact_tensor[indices, :]. The only caveat is that, by default, this will not work if indices is a nested list. I guess we normally select the same number of rows R for each sample, right? If so, for N samples, we can have indices be a flat list of size R*N and then just reshape the output to (N, R) -- we can compare against a naive version with a for-loop.
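Here is the dense analogue of what I mean (a sketch; in practice fact_tensor would stand in for the weight and support the same indexing):

import torch

weight = torch.randn(1000, 32)              # stand-in for the factorized embedding table
indices = torch.tensor([[3, 17], [4, 9]])   # N=2 samples, R=2 rows each
flat = indices.reshape(-1)                  # flat list of N*R indices
rows = weight[flat, :]                      # (N*R, embedding_dim)
rows = rows.reshape(*indices.shape, -1)     # back to (N, R, embedding_dim)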

colehawkins commented 2 years ago

Thanks for the pointer on indexing.

The challenge I ran into is that this indexing style only works out of the box for BlockTT, since CPTensorized returns a CPTensorized when indexed and TuckerTensorized returns a tensor, not a matrix. Is this the desired behavior?

I've added some slightly clunky code to catch those two cases, but hopefully this covers the standard use cases.
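Roughly, the special-casing looks like this (a sketch; whether the indexed result exposes a to_matrix conversion, and what shape it comes back with, is exactly the behavior I'm unsure about):

def lookup(fact_tensor, flat_indices):
    # Index the factorized embedding table with a flat list of row indices.
    rows = fact_tensor[flat_indices, :]
    if hasattr(rows, 'to_matrix'):
        # Still a factorized object (the CPTensorized case): densify it.
        rows = rows.to_matrix()
    if rows.ndim > 2:
        # Came back tensorized rather than as a matrix (the Tucker case): flatten trailing dims.
        rows = rows.reshape(rows.shape[0], -1)
    return rows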

If the embedding layer code looks good, I'm happy to move on to discussing the path forward for utils.py, sympy vs. our own factorization, and the reshaping.

colehawkins commented 2 years ago

This is very similar to the previous version, except that it uses the new reshaping from utils.tensorize_shape. I also made a minor edit to utils.tensorize_shape to correct a typo that returned the input dimension tensorization twice.

JeanKossaifi commented 2 years ago

Looks good to me, merging. Thanks @colehawkins!