lucidrains / routing-transformer

Fully featured implementation of Routing Transformer
MIT License

What does autoregressive mean? #8

Closed by matthew-jurewicz 4 years ago

matthew-jurewicz commented 4 years ago

What does autoregressive mean in this context? Is the causal flag related to the autoregressive wrapper?

lucidrains commented 4 years ago

@matthew-jurewicz yup! causal means autoregressive! It's a fancy word for predicting the future given the past. (The cat sat on the ___ --> you make the network predict the blank.) You do this in an attention network by using a mask so that each position can only attend to itself and earlier positions; the past never gets to see the future. All modern text generators (GPT family) are autoregressive flavors of attention networks.
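As a rough illustration of the masking idea, here is a minimal NumPy sketch of causal attention weights. It is not code from this repository: the function name `causal_attention_weights` and the random scores are made up for the example, and a real model would compute the scores from queries and keys.

```python
import numpy as np


def causal_attention_weights(scores: np.ndarray) -> np.ndarray:
    """Softmax over attention scores with future positions masked out.

    scores has shape (seq_len, seq_len); scores[i, j] is how strongly
    position i would like to attend to position j.
    (Hypothetical helper, for illustration only.)
    """
    seq_len = scores.shape[-1]
    # True where j <= i: position i may look at itself and at the past.
    allowed = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Future positions get -inf, so they end up with zero weight.
    masked = np.where(allowed, scores, -np.inf)
    # Subtract the row maximum for numerical stability before exponentiating.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
weights = causal_attention_weights(rng.normal(size=(4, 4)))
print(np.round(weights, 2))  # everything above the diagonal is 0
```

Each row still sums to 1, but row `i` only spreads its weight over columns `0..i`, which is what lets the model be trained to predict the next element without peeking at it.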

matthew-jurewicz commented 4 years ago

Awesome! Quick question: should the Transformer work with time series data, and can I skip the embedding layer since time series aren't sparse?

lucidrains commented 4 years ago

Yes, absolutely. Not well explored, but I'm sure it will work. Look at what I came across today https://twitter.com/St4ck_Overflow/status/1283782615011663875?s=20

lucidrains commented 4 years ago

Brainwaves are time series data

lucidrains commented 4 years ago

Re: embedding layer, I think it depends on the range and quantization of your time series signal, but you could simply restrict the values to some given range, bucket them, and map each bucket to an embedding. Say your signal ranges from 0 to 100: you could use 200 embeddings, mapping 0-0.5 -> 1st token, 0.5-1 -> 2nd token, and so on.
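The bucketing scheme with a 0 to 100 signal and 200 tokens could look roughly like the sketch below. The `quantize` helper, the clipping of out-of-range values, and the embedding width of 16 are assumptions made for the example; token ids are 0-indexed, so the "1st token" is id 0.

```python
import numpy as np


def quantize(signal, low=0.0, high=100.0, num_tokens=200):
    """Map real values in [low, high] to integer token ids in [0, num_tokens).

    Hypothetical helper: values outside the range are clipped (an assumption;
    you may prefer to rescale or handle outliers differently).
    """
    values = np.clip(np.asarray(signal, dtype=np.float64), low, high)
    bin_width = (high - low) / num_tokens  # 0.5 for the defaults
    ids = np.floor((values - low) / bin_width).astype(np.int64)
    # A value exactly equal to `high` would land in bin `num_tokens`,
    # so fold it into the last valid bin.
    return np.minimum(ids, num_tokens - 1)


token_ids = quantize([0.0, 0.49, 0.5, 99.9, 100.0])
print(token_ids.tolist())  # -> [0, 0, 1, 199, 199]

# The ids then index an embedding table, as they would for text tokens.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(200, 16))  # (num_tokens, dim), dim is arbitrary
embedded = embedding_table[token_ids]         # shape (5, 16)
```

The trade-off is resolution versus vocabulary size: finer bins preserve more of the signal but leave each embedding with fewer training examples.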

tomweingarten commented 4 years ago

You can also replace the embedding layer with a linear projection from your scalar values into the embedding space of the model; this has worked well for me. If you do this, you'll also need to replace the last layer of the model with a regression head. A single hidden layer with ReLU or GELU followed by a linear projection has worked for me.
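A minimal sketch of that setup, written in plain NumPy to show the shapes involved. All names and sizes here (`project_in`, `regression_head`, `dim = 32`, `hidden = 64`) are illustrative assumptions and not part of this repository; in PyTorch the same pieces would typically be `nn.Linear` layers, and the weights would be learned rather than random.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden = 32, 64  # model width and head width, chosen arbitrarily

# Input projection: one scalar per time step -> a `dim`-sized vector.
w_in, b_in = rng.normal(scale=0.02, size=(1, dim)), np.zeros(dim)

# Regression head: hidden layer with ReLU, then a projection back to a scalar.
w1, b1 = rng.normal(scale=0.02, size=(dim, hidden)), np.zeros(hidden)
w2, b2 = rng.normal(scale=0.02, size=(hidden, 1)), np.zeros(1)


def project_in(series: np.ndarray) -> np.ndarray:
    """(batch, seq_len) scalars -> (batch, seq_len, dim) model inputs."""
    return series[..., None] @ w_in + b_in


def regression_head(h: np.ndarray) -> np.ndarray:
    """(batch, seq_len, dim) model outputs -> (batch, seq_len) predictions."""
    z = np.maximum(h @ w1 + b1, 0.0)  # ReLU
    return (z @ w2 + b2)[..., 0]


series = rng.normal(size=(2, 10))  # 2 series, 10 time steps each
x = project_in(series)             # shape (2, 10, 32)
# The causal transformer body would transform `x` here; it is skipped in
# this sketch, so the head is applied to the projected inputs directly.
prediction = regression_head(x)    # shape (2, 10)
```

With a causal model, `prediction[:, t]` would usually be trained against `series[:, t + 1]` using a regression loss such as mean squared error, in place of the cross-entropy loss used for tokens.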

matthew-jurewicz commented 4 years ago

I haven't seen much on the application of Transformers to time series data either. Strange, seems like an obvious application.

lucidrains commented 4 years ago

yes, you should try it and report the results, if they are positive! the machine learning community will thank you!