lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License

Problem with cache and memory #255

Closed · Baran-phys closed this 3 months ago

Baran-phys commented 4 months ago

When I use x-transformers (continuous models), cached memory tends to surge during training. The model also uses 600-800% CPU. Is this due to cloning or some other reason?
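
For anyone hitting the same symptoms, a minimal diagnostic sketch (assuming the `ContinuousTransformerWrapper` API from the README and a CUDA machine; the thread count and model sizes here are illustrative, not a recommendation):

```python
import torch
from x_transformers import ContinuousTransformerWrapper, Encoder

# PyTorch's intra-op parallelism defaults to all visible cores, which
# often shows up as 600-800% CPU in `top`; capping the thread count is
# the usual first check for runaway CPU usage.
torch.set_num_threads(2)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = ContinuousTransformerWrapper(
    dim_in = 32,
    dim_out = 32,
    max_seq_len = 1024,
    attn_layers = Encoder(dim = 512, depth = 6, heads = 8)
).to(device)

x = torch.randn(1, 1024, 32, device = device)
out = model(x)  # (1, 1024, 32)

if device == 'cuda':
    # "Cached" memory that keeps growing is often the CUDA caching
    # allocator holding freed blocks rather than a leak: compare
    # allocated (live tensors) against reserved (allocator pool).
    print(f'allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB')
    print(f'reserved:  {torch.cuda.memory_reserved() / 1e6:.1f} MB')
```

If `reserved` climbs while `allocated` stays flat across steps, the growth is allocator caching; if `allocated` itself grows, something is retaining tensors between steps.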