lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License

How to use this package correctly? #178

Closed avocardio closed 1 year ago

avocardio commented 1 year ago

Sorry for writing this as an issue; it might just be a skill issue on my part, but maybe it's also a general question.

How do you use this package for something other than the obvious language modelling and/or image sequencing tasks? For the past few days I have been trying to use the encoder / decoder elements on their own, but I run into OOM every time, even on a 48 GB card with the "simplest" model configuration:

    from x_transformers import TransformerWrapper, Encoder

    encoder_base = TransformerWrapper(
        num_tokens = 64,
        max_seq_len = int(64 * 36),   # 2304-token context
        attn_layers = Encoder(
            dim = 32,
            depth = 2,
            heads = 2,
            dynamic_pos_bias = True,
        )
    )
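
For context, here is roughly the kind of standalone forward pass I'm attempting (the batch size and random token IDs below are placeholders, not my exact setup):

    import torch

    # placeholder batch of random token IDs at full context length;
    # batch_size here is illustrative only
    batch_size = 8
    x = torch.randint(0, 64, (batch_size, 64 * 36)).cuda()

    encoder_base = encoder_base.cuda()

    # return_embeddings = True returns the per-token encoder outputs
    # of shape (batch, seq_len, dim) instead of logits over the vocabulary
    embeddings = encoder_base(x, return_embeddings = True)

A forward pass along these lines is what triggers the error: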
  File "/opt/conda/lib/python3.10/site-packages/x_transformers/attend.py", line 289, in forward                                                       
    post_softmax_attn = attn.clone()                                                                                                                  
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.09 GiB (GPU 0; 47.54 GiB total capacity; 41.63 GiB already allocated; 5.14 GiB f
ree; 42.01 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See docu
mentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF 
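
If I'm reading attend.py correctly, the line in the traceback clones the full post-softmax attention matrix of shape (batch, heads, seq_len, seq_len), so memory grows quadratically with sequence length. A rough back-of-the-envelope check with the config above (my numbers may well be off):

    # approximate fp32 footprint of one full attention matrix,
    # per sample and per layer
    heads = 2
    seq_len = 64 * 36                      # 2304
    bytes_per_sample = heads * seq_len * seq_len * 4
    print(bytes_per_sample / 1024 ** 2)    # ~40.5 MiB

    # a few hundred samples in a batch is already on the order of the
    # ~10 GiB allocation reported in the error above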

Also a few things on the side:

I appreciate any suggestions. Thanks.