idiap/fast-transformers
PyTorch library for fast transformer implementations
1.65k stars · 179 forks
Issues (sorted newest first)
#132 · TypeError: canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero' · opened 2 months ago by luispintoc · 0 comments
#131 · Full Attention does not sum to 1 · closed 4 months ago by yourj4m · 1 comment
#130 · Speed of linear attention slower than the attention implemented in pytorch · opened 5 months ago by yzeng58 · 0 comments
#129 · [WinError 2] The system cannot find the file specified: build_ext · opened 8 months ago by cliffordkleinsr · 1 comment
#128 · ERROR: Could not build wheels for pytorch-fast-transformers, which is required to install pyproject.toml-based projects · opened 11 months ago by ouusan · 12 comments
#127 · ImportError · opened 1 year ago by PaulaTeeuwen · 1 comment
#126 · Provenance of algorithms · opened 1 year ago by taibai123abc · 0 comments
#125 · `.causal_product_cuda` missing in pip installed version on linux · closed 1 year ago by Jackl-o-o-l · 4 comments
#124 · Error about `causal_product_cpu.cpython-38-darwin.so` on Mac · opened 1 year ago by XiaoqZhang · 2 comments
#123 · Got different result for the same batch · closed 1 year ago by gaoshan2006 · 1 comment
#122 · Fix #106 and #121 · closed 1 year ago by wl2776 · 1 comment
#121 · Windows installation - building wheel error · closed 1 year ago by BenoitDalFerro · 27 comments
#120 · Detailed implementation of `clustered_sparse_dot_product` · opened 1 year ago by HanielF · 0 comments
#119 · Understanding how to define key, query and value for the cross attention calculation · opened 1 year ago by neuronphysics · 0 comments
#118 · Example for NLP · opened 1 year ago by Bachstelze · 0 comments
#117 · Cuda version · opened 1 year ago by jiaji-huang · 1 comment
#116 · Speed of recurrent model · closed 2 years ago by mads-oestergaard · 2 comments
#115 · can offer built code for linus? · closed 2 years ago by li-car-fei · 0 comments
#114 · Can't officially save Linear Attention model · opened 2 years ago by maulberto3 · 2 comments
#113 · Any decoder example? · opened 2 years ago by ahmedraza1996 · 1 comment
#112 · Installing error on linux · closed 2 years ago by xxmlala · 4 comments
#111 · Casual attention is cheating by looking in the future · closed 2 years ago by jogardi · 1 comment
#110 · Runtime error on causal_product_cpu on GCC/G++ 11 · opened 2 years ago by lsisoft · 3 comments
#109 · how causal mask constructed in training batch model with linear causal attention? · opened 2 years ago by Howuhh · 0 comments
#108 · Parallel complexity of Linear Attention is O(N)? · opened 3 years ago by haozheji · 1 comment
#107 · Training Language Model · closed 3 years ago by lucasnfe · 2 comments
#106 · Windows installation - Building wheel · closed 1 year ago by MaximeHoude · 3 comments
#105 · causal-linear do not use attn_mask ? · opened 3 years ago by davidliujiafeng · 1 comment
#104 · CUDA error: CUBLAS_STATUS_INVALID_VALUE · closed 3 years ago by huu4ontocord · 1 comment
#103 · layernorm eps is not copied properly when cloning HF_Bert · opened 3 years ago by huu4ontocord · 2 comments
#102 · For recurrent models, are positional embeddings required? · closed 3 years ago by rongcuid · 5 comments
#101 · Mask and QK not of the same shape ? · closed 3 years ago by Baldwin-disso · 1 comment
#100 · Huggingface Bert vs. Fast Transformer full attention · closed 3 years ago by lipmem · 9 comments
#99 · Quick start raise a ModuleNotFoundError · closed 3 years ago by CaoYiqingT · 2 comments
#98 · local_dot_product_cuda fails when queries and keys have different lengths · opened 3 years ago by tridao · 0 comments
#97 · Installation failed on Windows · closed 2 years ago by WRKULOL · 2 comments
#96 · Can't import causal_product_cuda · closed 3 years ago by 15805383399 · 1 comment
#95 · support of cluster attention · closed 3 years ago by TianhaoFu · 1 comment
#94 · TypeError: forward() missing 3 required positional arguments: 'attn_mask', 'query_lengths', and 'key_lengths' · closed 3 years ago by TianhaoFu · 1 comment
#93 · ModuleNotFoundError: No module named 'aggregate.aggregate_cpu' · closed 3 years ago by TianhaoFu · 2 comments
#92 · installation error · closed 3 years ago by davidliujiafeng · 6 comments
#91 · Question over cuda implementation of causal product (forward) · closed 3 years ago by thomasw21 · 1 comment
#90 · Simplify and 3x Speedup CausalDotProduct CUDA Kernel for Larger Hidden Sizes · closed 3 years ago by qibinc · 1 comment
#89 · pip install and c++ compilation error, then name 'compute_hashes_cuda' is not defined · closed 3 years ago by nikjetchev · 3 comments
#88 · Make fast-transformers JIT Compilable · opened 3 years ago by AndriyMulyar · 1 comment
#87 · [FAVOR & friends] Orthogonal random matrix not uniformly drawn · closed 3 years ago by blefaudeux · 3 comments
#86 · allow to specify activation in transformer · closed 3 years ago by ZhiyuanChen · 2 comments
#85 · Implementing clustering function of clustered_attention with Python. · closed 3 years ago by mHsuann · 2 comments
#84 · Tips and tricks for training linear_att · closed 3 years ago by gaceladri · 7 comments
#83 · CUDA version and CausalDotProduct time · closed 3 years ago by caffeinetoomuch · 4 comments