Hi, @enijkamp
I am trying to create TFRecords for fine-tuning on my own dataset, but I am confused about which tokenizer to use.
I would like to use the tokenizer you released, but 4_create_tf_records.py uses either the GPT-2 tokenizer or a custom tokenizer trained with 3_train_tokenizer.py.
If I want to use your official tokenizer, is it correct to change this line to 'Salesforce/codegen-xxB-xxx'?