Open wangkuiyi opened 1 year ago
Thank you for this project! It is very helpful for me to understand how GPT2 synthesizes text.
I also noticed that GPT2/encoder.py does not implement recognition of special tokens, as the HuggingFace tokenizer does.
The relevant source code in HuggingFace's repo is at https://github.com/huggingface/transformers/blob/c836f77266be9ace47bff472f63caf71c0d11333/src/transformers/tokenization_utils.py#L516-L520
I understand this is not critical, because only one special token, <|endoftext|>, is in use: https://github.com/wangkuiyi/huggingface-tokenizer-in-cxx/issues/11
So, just saying.
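For reference, a minimal sketch of what the missing behavior could look like: pre-split the input on special tokens before running BPE, similar in spirit to HuggingFace's `split_on_tokens`. The function name `encode_with_special_tokens` and the `bpe_encode` callable are hypothetical stand-ins, not part of GPT2/encoder.py.

```python
import re

# Hypothetical: the only special token GPT2 actually uses.
SPECIAL_TOKENS = ["<|endoftext|>"]

def encode_with_special_tokens(text, bpe_encode, special_tokens=SPECIAL_TOKENS):
    """Split `text` on special tokens, BPE-encode only the ordinary spans.

    `bpe_encode` stands in for the real Encoder.encode in GPT2/encoder.py.
    """
    # Capturing group so re.split keeps the matched special tokens.
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    token_ids = []
    for chunk in re.split(pattern, text):
        if not chunk:
            continue
        if chunk in special_tokens:
            # In a real encoder this would map to the token's reserved id
            # (50256 for <|endoftext|> in GPT2's vocabulary).
            token_ids.append(chunk)
        else:
            token_ids.extend(bpe_encode(chunk))
    return token_ids

# Toy stand-in for the real BPE encoder: one "token" per character.
print(encode_with_special_tokens("ab<|endoftext|>cd", lambda s: list(s)))
# → ['a', 'b', '<|endoftext|>', 'c', 'd']
```

Without this pre-splitting step, the BPE regex in encoder.py would break `<|endoftext|>` apart into ordinary sub-tokens instead of emitting its single reserved id.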