mengzixing closed 2 years ago
Is there a demo?
Sorry for the missing information. We use the tools developed by NVIDIA to store the pre-training data (specifically, https://github.com/NVIDIA/Megatron-LM/blob/main/tools/preprocess_data.py). The textual data must first be converted into token ids.
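To make the "text to token ids" step concrete, here is a rough sketch, not the actual pipeline used in this repo. It writes raw documents as one JSON object per line under a `text` key, which as far as I know is the default input layout that `preprocess_data.py` reads, and then shows what converting text into ids means using a toy whitespace vocabulary. The file name `train.jsonl` and the helpers `write_jsonl`, `build_vocab` and `encode` are hypothetical; the real tool uses its own tokenizer and writes binary index files instead.

```python
import json
import os
import tempfile


def write_jsonl(docs, path):
    """Write one JSON object per line, with the raw text under the "text" key."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps({"text": doc}, ensure_ascii=False) + "\n")


def build_vocab(docs):
    """Toy whitespace vocabulary (hypothetical stand-in for a real tokenizer)."""
    vocab = {}
    for doc in docs:
        for tok in doc.split():
            # Assign the next free id the first time a token is seen.
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(doc, vocab):
    """Map each whitespace-separated token to its integer id."""
    return [vocab[tok] for tok in doc.split()]


docs = ["hello world", "hello again world"]

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(docs, path)

vocab = build_vocab(docs)
ids = [encode(d, vocab) for d in docs]
print(ids)
```

The JSONL file is what you would hand to the preprocessing script; check the script's `--help` for the exact flags in your Megatron-LM version, since they have changed over time.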
The code does not include any training data. Is there demo data for train_files/valid_files, or a description of the data format?