openai / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"
https://openai.com/blog/better-language-models/

BPE using a sequence of bytes: how? #227

Open bayethiernodiop opened 4 years ago

bayethiernodiop commented 4 years ago

Hello, I read the GPT-2 paper: it says that BPE is applied to a sequence of bytes, and that this only requires a base vocabulary of size 256. I searched the internet but didn't find any explanation of how BPE on a sequence of bytes works, or why the base vocabulary is 256. I'm confused because I don't see how this differs from applying BPE to regular characters, and what the motivation is, given that the paper also says character/byte-level LMs don't perform well. How is this different? Thanks.
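For anyone landing here with the same question, here is a minimal sketch of BPE over raw bytes, written from the paper's description rather than from this repository's code (the actual encoder lives in src/encoder.py and additionally maps bytes to printable unicode characters before merging). The function names and the toy corpus below are illustrative only. The key point is that the base vocabulary is just the 256 possible byte values, so no character can ever be out of vocabulary, and the learned merges build larger tokens on top of that.

```python
from collections import Counter

def to_byte_symbols(text):
    # Every string maps to UTF-8 bytes, so the base vocabulary is just
    # the 256 possible byte values -- no "unknown character" is possible.
    return [bytes([b]) for b in text.encode("utf-8")]

def most_frequent_pair(sequences):
    # Count adjacent symbol pairs across the corpus and return the most common one.
    pairs = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair):
    # Replace every occurrence of the pair with a single merged symbol.
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

def learn_bpe(corpus, num_merges):
    # Start from raw bytes (base vocab of 256) and greedily add merge rules,
    # exactly as character-level BPE would, just over bytes instead of characters.
    sequences = [to_byte_symbols(text) for text in corpus]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return merges, sequences

if __name__ == "__main__":
    merges, encoded = learn_bpe(["low lower lowest", "new newer newest"], num_merges=10)
    print(merges)   # learned byte-pair merges, e.g. (b'l', b'o'), (b'lo', b'w'), ...
    print(encoded)  # the toy corpus re-encoded with merged byte tokens
```

The difference from a plain byte-level LM is that the model does not have to predict one byte at a time: after the merges are learned, frequent byte sequences become single tokens, so most English text tokenizes into word-sized pieces while rare or non-Latin text still falls back to raw bytes.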

ghost commented 4 years ago

What's a BPE?