huggingface / transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning
MIT License

Output of tokenizer.encode has None for space #75

Open ziweiji opened 4 years ago

ziweiji commented 4 years ago

I downloaded the pretrained, fine-tuned model from https://s3.amazonaws.com/models.huggingface.co/transfer-learning-chatbot/finetuned_chatbot_gpt.tar.gz and loaded its tokenizer as follows:

tokenizer_class, model_class = GPT2Tokenizer, GPT2DoubleHeadsModel
tokenizer = tokenizer_class.from_pretrained(args.model_checkpoint)
tokenizer.encode('good morning')

The output is [3454, None, 1054, 40164]. The space between the two words is encoded as None instead of a token id.
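To narrow down where a None like this comes from, it can help to look at the tokens and their ids side by side. The snippet below is only a minimal sketch under stated assumptions: it assumes the ids come from a plain dictionary lookup that yields None for a token missing from the vocabulary, and the helper names (`encode_with_lookup`, `find_unmapped`), the toy vocabulary, and the toy tokens are all hypothetical, not taken from the checkpoint.

```python
from typing import Dict, List, Optional, Tuple


def encode_with_lookup(tokens: List[str], vocab: Dict[str, int]) -> List[Optional[int]]:
    """Map each token to its id; a token missing from vocab becomes None."""
    return [vocab.get(token) for token in tokens]


def find_unmapped(tokens: List[str], ids: List[Optional[int]]) -> List[Tuple[int, str]]:
    """Return (position, token) pairs for every token whose id is None."""
    return [(i, tok) for i, (tok, tok_id) in enumerate(zip(tokens, ids)) if tok_id is None]


# Hypothetical toy data: the vocabulary has entries for the words,
# but none for the space marker token produced by the tokenizer.
toy_vocab = {"good": 10, "morning": 11}
toy_tokens = ["good", "Ġ", "morning"]

ids = encode_with_lookup(toy_tokens, toy_vocab)
print(ids)                             # ids with None where the lookup failed
print(find_unmapped(toy_tokens, ids))  # which tokens had no vocabulary entry
```

With a real tokenizer, the same comparison could be made by passing the results of `tokenizer.tokenize(text)` and `tokenizer.convert_tokens_to_ids(tokens)` to `find_unmapped`. If the unmapped token turns out to be a space marker, that would suggest the tokenizer class and the checkpoint's vocabulary do not match.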

ziweiji commented 4 years ago

I think I am using the wrong tokenizer_class and model_class. Which classes should be used for this fine-tuned model?