lucidrains / x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers
MIT License

`layer_mem` is unbound (when called from `ContinuousTransformerWrapper`) #237

Closed · amitkparekh closed this 5 months ago

amitkparekh commented 5 months ago
  File "/opt/conda/lib/python3.11/site-packages/x_transformers/continuous.py", line 129, in forward
    x, intermediates = self.attn_layers(x, mask = mask, mems = mems, mem_masks = mem_masks, return_hiddens = True, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/x_transformers/x_transformers.py", line 1330, in forward
    if exists(layer_mem):
              ^^^^^^^^^
UnboundLocalError: cannot access local variable 'layer_mem' where it is not associated with a value

I got this error when trying to run a transformer with a custom ordering of `attn_layers` through the `ContinuousTransformerWrapper`.
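A stripped-down version of the kind of setup that hits this looks roughly like the following (the dims and the exact `custom_layers` tuple here are illustrative, not my actual config):

```python
import torch
from x_transformers import ContinuousTransformerWrapper, Encoder

# continuous wrapper whose attn_layers use a custom ordering of
# attention ('a') and feedforward ('f') blocks
model = ContinuousTransformerWrapper(
    dim_in = 32,
    dim_out = 32,
    max_seq_len = 256,
    attn_layers = Encoder(
        dim = 512,
        depth = 3,
        heads = 8,
        custom_layers = ('a', 'a', 'f', 'a', 'a', 'f')
    )
)

x = torch.randn(1, 256, 32)
mask = torch.ones(1, 256).bool()

out = model(x, mask = mask)  # this call raised the UnboundLocalError above
```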

I'm not using `layer_mem`, so for my case I just commented out L1330-1331 in x_transformers.py. Since I'm not entirely sure what it's used for, I didn't want to submit a PR for that change alone; I presume you have better insight and know of a better fix.
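For context, the error is just the usual Python footgun of a local variable only being assigned inside one branch and then read unconditionally afterwards. A sketch of the shape of the problem and the guard I'd expect (the names mirror the traceback, but this is not the actual library code):

```python
from typing import Optional

def forward_sketch(layer_type: str, mems: Optional[list]):
    # buggy shape: `layer_mem` is only bound inside this branch, so the later
    # read raises UnboundLocalError whenever the branch is skipped
    # if layer_type == 'a':
    #     layer_mem = mems.pop(0) if mems else None

    # guarded shape: bind the name up front so the later check is always safe
    layer_mem = None
    if layer_type == 'a' and mems:
        layer_mem = mems.pop(0)

    if layer_mem is not None:  # the library checks this via exists(layer_mem)
        ...  # attend over the per-layer memories
```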

lucidrains commented 5 months ago

@amitkparekh hey Amit! yes you are right and i put in a fix

that's really awesome that you are using an undocumented feature! are you doing a weight-tied transformer?

amitkparekh commented 5 months ago

Thanks for the quick fix!

I'll be honest: I implemented this part of my research a while back, and I'm struggling to remember/find my notes on why I'm using the `ContinuousTransformerWrapper` over one of the other wrappers. I implemented it before 1.23.5.

I think it's primarily because I do the embedding outside the wrapper and needed to pass several different inputs through to the transformer, which is instantiated with `custom_layers` within the `attn_layers`.
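Very roughly, the setup looks something like this (a simplified sketch; the embedders, dims, and layer ordering here are all made up):

```python
import torch
import torch.nn as nn
from x_transformers import ContinuousTransformerWrapper, Encoder

# modality-specific embedders live outside the wrapper, and their outputs are
# concatenated into one continuous sequence before going through the transformer
text_embed = nn.Embedding(1000, 256)
image_proj = nn.Linear(512, 256)

backbone = ContinuousTransformerWrapper(
    dim_in = 256,
    dim_out = 256,
    max_seq_len = 512,
    attn_layers = Encoder(
        dim = 512,
        depth = 4,
        heads = 8,
        custom_layers = ('a', 'f') * 4
    )
)

tokens = torch.randint(0, 1000, (1, 64))
features = torch.randn(1, 32, 512)

x = torch.cat([text_embed(tokens), image_proj(features)], dim = 1)  # (1, 96, 256)
out = backbone(x)                                                   # (1, 96, 256)
```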

lucidrains commented 5 months ago

@amitkparekh no problem!

oh i see, you are using a custom ordering of the layers, not the weight tying feature

did you hack the code so that it accepts custom modules from outside the repo? i was about to get around to that

amitkparekh commented 5 months ago

Only a little, for my own needs (through wrappers and such). But being able to create modules and stacks, and to combine the various features into some Frankenstein model through composition, would be brilliant, and it's something I have personally wished for! I get that refactoring this repo to support that would probably be a massive undertaking, though.

If you're ever open to going that far, I'd be happy to help contribute!

lucidrains commented 5 months ago

@amitkparekh no not too hard, i've been thinking about it for a while, and it may take less than 100 loc to execute for the 80% use case

will keep you updated!

amitkparekh commented 5 months ago

Let me know if you need a second pair of hands! I feel like this repo is like allennlp but for transformer architectures, so anything that makes it that much easier to build custom but robust models would be amazing 🤩