With GPT-2 you simply have to limit the input sequence to a fixed length; as far as I understood, the original GPT-2 paper does not introduce any clearer way of handling longer contexts. There is a great explanation of GPT-2 here by jalammar.
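For instance, here is a minimal truncation sketch, assuming the HuggingFace `transformers` package and its `GPT2Tokenizer` (the tokenizer choice and the 1024-token limit of the base checkpoint are my assumptions, not something specified in this thread):

```python
# Minimal sketch: truncate a long context to GPT-2's fixed maximum length.
# Assumes the HuggingFace `transformers` package (pip install transformers).
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
MAX_LEN = 1024  # maximum context length (n_positions) of the base GPT-2 model

long_text = "some very long document ..."  # placeholder for a context longer than 1024 tokens

# Encode and keep only the last MAX_LEN tokens. Keeping the most recent
# context is one common choice; keeping the first MAX_LEN tokens is another.
token_ids = tokenizer.encode(long_text)
token_ids = token_ids[-MAX_LEN:]

print(len(token_ids))  # <= 1024, safe to feed into GPT2LMHeadModel
```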
This is where Transformer-XL comes in: it introduces the new concepts of "Segment-level Recurrence" and "Relative Positional Encodings". You can check the link here for more details.
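As a rough illustration of segment-level recurrence, here is a sketch assuming the HuggingFace `transformers` implementation (`TransfoXLModel`), which exposes the recurrence memory through its `mems` argument; the checkpoint name and segment size below are just for illustration:

```python
# Sketch of segment-level recurrence with Transformer-XL: feed a long sequence
# in chunks while carrying the cached hidden states (`mems`) across segments.
# Assumes the HuggingFace `transformers` package and the pretrained wt103 checkpoint.
import torch
from transformers import TransfoXLTokenizer, TransfoXLModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLModel.from_pretrained("transfo-xl-wt103")
model.eval()

long_text = "some very long document ..."  # placeholder
token_ids = tokenizer.encode(long_text)

SEGMENT_LEN = 128  # illustrative segment size
mems = None        # recurrence memory, updated after every segment

with torch.no_grad():
    for start in range(0, len(token_ids), SEGMENT_LEN):
        segment = torch.tensor([token_ids[start:start + SEGMENT_LEN]])
        outputs = model(segment, mems=mems)
        hidden = outputs.last_hidden_state  # hidden states for the current segment
        mems = outputs.mems                 # cached states reused by the next segment
```

Because `mems` is passed forward at each step, every segment can attend to context from previous segments without the whole sequence ever being fed to the model at once.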
How can we handle the case where the length of the context is greater than 1024, which is the maximum input length of GPT-2?