Open ZhiyuanChen opened 3 months ago
FYI @fxmarty
@ZhiyuanChen Yes, that could be nice. Last time I tried, building the KV cache with nested tensors without using loops was non-trivial, but we could revisit that.
Sure, may I ask what the difficulties are? I think I may be able to help!
I did use a for-loop to create the padded and mask tensors; the functions are available at https://github.com/ZhiyuanChen/DanLing/blob/master/danling/tensors/functional.py.
But since they are stored in an lru_cache of NestedTensor, I think the overhead should be marginal.
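For context, here is a minimal sketch of what such a loop-based helper might look like; the function name and signature are illustrative, not the actual code in the linked file:

```python
from typing import List, Tuple

import torch
from torch import Tensor


def pad_and_mask(tensors: List[Tensor], padding_value: float = 0.0) -> Tuple[Tensor, Tensor]:
    # Pad a list of variable-length tensors (first dim varies) to a common
    # length and build a boolean mask that is True at the valid positions.
    lengths = [t.size(0) for t in tensors]
    max_len = max(lengths)
    padded = tensors[0].new_full((len(tensors), max_len, *tensors[0].shape[1:]), padding_value)
    mask = torch.zeros(len(tensors), max_len, dtype=torch.bool)
    for i, t in enumerate(tensors):
        padded[i, : t.size(0)] = t
        mask[i, : t.size(0)] = True
    return padded, mask
```

Since the padded tensor and mask only need to be built once per batch and can be cached (as the lru_cache mentioned above does), the cost of the Python loop should indeed be marginal.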
Feature request
Support for NestedTensor.
Conventional Tensor is dense.
For natural language tasks, inputs have varying lengths, so we need to pad each input tensor to the length of the longest one in the batch and use a mask tensor to indicate the padding positions. This workaround does work, but it introduces unnecessary computation and extra memory costs, especially in the FFN, where padding is not needed at all.
NestedTensor stores the tensors in a list, so it requires no padding.
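To illustrate the FFN point: because a feed-forward block acts on each position independently, one can run it only on the real tokens and skip the padding entirely. The sketch below assumes a padded (batch, seq_len, hidden) input plus a boolean mask; a nested representation would make this the natural default rather than a manual optimization.

```python
import torch
from torch import nn


def ffn_on_valid_tokens(ffn: nn.Module, padded: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # padded: (batch, seq_len, hidden); mask: (batch, seq_len), True at real tokens.
    # The FFN is position-wise, so padded positions can simply be skipped.
    out = torch.zeros_like(padded)
    valid_tokens = padded[mask]        # (num_valid_tokens, hidden)
    out[mask] = ffn(valid_tokens)      # compute only on real tokens
    return out


# Example: half of the second sequence is padding, so a quarter of the
# batch's FFN compute would be wasted by a dense (padded) forward pass.
ffn = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
padded = torch.randn(2, 8, 16)
mask = torch.tensor([[True] * 8, [True] * 4 + [False] * 4])
output = ffn_on_valid_tokens(ffn, padded, mask)
```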
Motivation
Reduce memory costs.
Speed up processing in the FFN and attention (with FlashAttention).
Alternatives
PyTorch has its own nested_tensor implementation. However, its progress is slow, and it is unlikely to be usable in the coming months.
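For reference, the PyTorch prototype API looks roughly like the sketch below (still subject to change, and operator coverage remains limited):

```python
import torch

# Two "sequences" of different lengths, stored without padding.
a = torch.randn(3, 16)
b = torch.randn(5, 16)
nt = torch.nested.nested_tensor([a, b])

print(nt.is_nested)                     # True
print([t.shape for t in nt.unbind()])   # [torch.Size([3, 16]), torch.Size([5, 16])]

# Convert back to the conventional padded layout when an op requires it.
padded = nt.to_padded_tensor(0.0)       # shape (2, 5, 16), zeros at padded positions
```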
Your contribution