Music Transformer #10 (Open)
tatsuropfgt opened this issue 1 year ago
tatsuropfgt commented 1 year ago
Music Transformer [Huang+, ICLR19]
Reference
Understanding Music Transformer / Hao Hao Tan
Abstract
Generates music (~60 s) that exhibits long-term structure using a Transformer
Uses relative self-attention to represent structures specific to music
Improves the implementation of relative attention in the Transformer by reducing its memory requirements
Method
Data representation
Relative positional self-attention
Relative positional self-attention tells the model how far apart two positions in a sequence are [#9]
The straightforward formulation needs to store an $O(L^2 D)$ intermediate tensor (one $D$-dimensional relative embedding for every pair of positions)
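A minimal sketch of that straightforward formulation, to make the $O(L^2 D)$ cost concrete. PyTorch and the names (`relative_attention_naive`, `er`) are assumptions for illustration, not the paper's implementation; `er[r]` holds the embedding for relative distance $r - (L - 1)$.

```python
import torch

def relative_attention_naive(q, k, v, er):
    """q, k, v: (batch, heads, L, d); er: (L, d) relative position embeddings."""
    L, d = er.shape
    idx = torch.arange(L)
    # R[i, j] = er[j - i + L - 1]: one embedding per pair of positions,
    # i.e. an (L, L, d) tensor -- this is the O(L^2 D) intermediate
    rel = (idx[None, :] - idx[:, None] + L - 1).clamp(0, L - 1)
    R = er[rel]                                               # (L, L, d)
    s_rel = torch.einsum('bhid,ijd->bhij', q, R)              # (b, h, L, L)
    logits = (q @ k.transpose(-2, -1) + s_rel) / d ** 0.5
    causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
    logits = logits.masked_fill(causal, float('-inf'))
    return torch.softmax(logits, dim=-1) @ v
```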
Memory-efficient implementation of relative position-based attention
reduce the memory requirement from $O(L^2 D)$ to $O(LD)$
avoid materializing $R$ (the tensor in the below figure) by "skewing"
"skew" consists of "pad" and "reshape" (followed by slicing off the dummy first row); a sketch follows below
(Figure: relative global attention)
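A sketch of the skewing trick under the same assumed layout as above: `qe` is $Q E_r^\top$ of shape (batch, heads, L, L), where column $r$ holds the logit for relative distance $r - (L - 1)$; the helper name `skew` is illustrative.

```python
import torch.nn.functional as F

def skew(qe):
    """Rearrange QEr^T (indexed by relative distance) into S_rel (indexed by
    absolute positions) without ever building the (L, L, d) tensor R."""
    b, h, L, _ = qe.shape
    padded = F.pad(qe, (1, 0))                 # "pad": dummy column on the left -> (b, h, L, L+1)
    reshaped = padded.reshape(b, h, L + 1, L)  # "reshape": shifts each row by one position
    return reshaped[:, :, 1:, :]               # slice off the dummy first row -> (b, h, L, L)
```

`skew(q @ er.transpose(0, 1))` then replaces the einsum over $R$ in the naive sketch above; the entries above the diagonal are meaningless, but the causal mask removes them anyway.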
Relative local attention
the model attends only to nearby tokens at each time step
the input sequence is divided into several non-overlapping blocks
within-block attention (block1 ⇔ block1, block2 ⇔ block2) is computed by the same process as relative global attention
cross-block attention (block1 ⇔ block2) is computed by the process in the below figure; a naive sketch of the block-local scheme follows
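A naive sketch of that block-local scheme, again assuming PyTorch and illustrative names (`relative_local_attention`, `er`, `block`). It materialises the relative embeddings per block pair rather than using the paper's memory-efficient cross-block skewing, and the exact embedding sharing in the paper may differ.

```python
import torch

def relative_local_attention(q, k, v, er, block):
    """Each query block attends to itself and to the previous block.
    q, k, v: (batch, heads, L, d) with L divisible by block;
    er: (2 * block, d), er[r] is the embedding for relative distance r - (2 * block - 1)."""
    b, h, L, d = q.shape
    out = torch.zeros_like(q)
    for s in range(0, L, block):                   # s = start of the query block
        lo = max(0, s - block)                     # key window = previous block + current block
        qi = torch.arange(s, s + block)            # absolute query positions
        kj = torch.arange(lo, s + block)           # absolute key positions
        rel = (kj[None, :] - qi[:, None]).clamp(min=-(2 * block - 1), max=0)
        R = er[rel + 2 * block - 1]                # (block, window, d)
        q_blk = q[:, :, s:s + block]
        k_win, v_win = k[:, :, lo:s + block], v[:, :, lo:s + block]
        s_rel = torch.einsum('bhid,ijd->bhij', q_blk, R)
        logits = (q_blk @ k_win.transpose(-2, -1) + s_rel) / d ** 0.5
        logits = logits.masked_fill(kj[None, :] > qi[:, None], float('-inf'))
        out[:, :, s:s + block] = torch.softmax(logits, dim=-1) @ v_win
    return out
```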
Memo
the relative embedding between the 1st and 3rd positions is the same as the relative embedding between the 2nd and 4th positions, since it depends only on the distance between positions
tatsuropfgt commented 1 year ago
Experiment
J.S. Bach Chorales
extends relative attention to capture pairwise distances in both timing and pitch
Piano-e-Competition
give the model an initial motif as a primer and let it generate the continuation
the model can also generate music conditioned on a melody
the encoder is given the conditioning melody, and the decoder generates an accompaniment to it