The main constraint is that current models use a short sequence length for performance reasons. This limitation will become increasingly important as we optimize performance and work toward much larger context lengths. See https://github.com/huggingface/swift-transformers/issues/9 for a promising direction.
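To make the limitation concrete, here is a minimal, self-contained Swift sketch (not part of the swift-transformers API; `maxSequenceLength` and the token values are hypothetical) of what a short, fixed window implies: prompts longer than the window have to be clamped before inference.

```swift
// Hypothetical fixed limit imposed by the converted model.
let maxSequenceLength = 128

// Stand-in for tokenizer output; a real prompt could easily exceed the window.
let promptTokens = Array(1...300)

// Keep only the most recent tokens so the input fits the model's window,
// dropping earlier context.
let inputTokens = Array(promptTokens.suffix(maxSequenceLength))
print("Feeding \(inputTokens.count) of \(promptTokens.count) tokens to the model")
```

A larger supported context length would remove the need for this kind of truncation, which is why the issue linked above is a promising direction.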