lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License

How to use this package correctly? #178

Closed avocardio closed 1 year ago

avocardio commented 1 year ago

Sorry for writing this as an issue; it might just be a skill issue on my part, but maybe it's also a general question.

How do you use this package for something other than the obvious language modelling and/or image sequencing tasks? For the past few days I have been trying to use the encoder / decoder elements on their own, but I run into OOM every time, even on a 48 GB card with the "simplest" model configuration:

    from x_transformers import TransformerWrapper, Encoder

    encoder_base = TransformerWrapper(
        num_tokens = 64,
        max_seq_len = int(64 * 36),   # 2304-token context
        attn_layers = Encoder(
            dim = 32,
            depth = 2,
            heads = 2,
            dynamic_pos_bias = True,
        )
    )
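
For context, here is roughly the kind of standalone forward pass I'm attempting (the batch size and random token IDs below are placeholders, not my exact setup):

    import torch

    # placeholder batch of random token IDs at full context length;
    # batch_size here is illustrative only
    batch_size = 8
    x = torch.randint(0, 64, (batch_size, 64 * 36)).cuda()

    encoder_base = encoder_base.cuda()

    # return_embeddings = True returns the per-token encoder outputs
    # of shape (batch, seq_len, dim) instead of logits over the vocabulary
    embeddings = encoder_base(x, return_embeddings = True)

A forward pass along these lines is what triggers the error: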
  File "/opt/conda/lib/python3.10/site-packages/x_transformers/attend.py", line 289, in forward                                                       
    post_softmax_attn = attn.clone()                                                                                                                  
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.09 GiB (GPU 0; 47.54 GiB total capacity; 41.63 GiB already allocated; 5.14 GiB f
ree; 42.01 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See docu
mentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF 
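
If I'm reading attend.py correctly, the line in the traceback clones the full post-softmax attention matrix of shape (batch, heads, seq_len, seq_len), so memory grows quadratically with sequence length. A rough back-of-the-envelope check with the config above (my numbers may well be off):

    # approximate fp32 footprint of one full attention matrix,
    # per sample and per layer
    heads = 2
    seq_len = 64 * 36                      # 2304
    bytes_per_sample = heads * seq_len * seq_len * 4
    print(bytes_per_sample / 1024 ** 2)    # ~40.5 MiB

    # a few hundred samples in a batch is already on the order of the
    # ~10 GiB allocation reported in the error above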

Also a few things on the side:

I appreciate any suggestions. Thanks.