liguodongiot / llm-action

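This project shares the technical principles behind large models, along with hands-on experience in LLM engineering and real-world LLM application deployment.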
https://www.zhihu.com/column/c_1456193767213043713
Apache License 2.0

GPT-2 model: data cleaning fails with a "no megatron module" error #23

Open WZT666-dev opened 3 months ago

WZT666-dev commented 3 months ago

root@gpu03:~/Megatron-LM/tools/openwebtext# python3 cleanup_dataset.py /workspace/data/merged_output.json /workspace/data/merged_cleand.json
Traceback (most recent call last):
  File "cleanup_dataset.py", line 12, in <module>
    from tokenizer import Tokenizer
  File "/root/Megatron-LM/tools/openwebtext/tokenizer.py", line 13, in <module>
    from megatron.core.datasets.megatron_tokenizer import MegatronTokenizer
ModuleNotFoundError: No module named 'megatron'
root@gpu03:~/Megatron-LM/tools/openwebtext#

How can I fix this?
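The traceback is consistent with a module search path problem. When a script is started as `python3 cleanup_dataset.py`, Python puts the script's own directory (`tools/openwebtext`) at the front of `sys.path`, but the `megatron` package that `tokenizer.py` imports lives at the repository root (`~/Megatron-LM`), which is not on the path unless Megatron-LM is installed or `PYTHONPATH` points at it. From the shell, running `PYTHONPATH=/root/Megatron-LM python3 cleanup_dataset.py /workspace/data/merged_output.json /workspace/data/merged_cleand.json` should have the same effect as the sketch below without editing any files. The sketch assumes the standard layout where `megatron/` is a top-level folder of the repo; `find_repo_root` and `ensure_importable` are hypothetical helpers, not part of Megatron-LM.

```python
import importlib
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def find_repo_root(script_path, package: str = "megatron") -> Optional[Path]:
    """Walk up from the script's directory and return the first ancestor
    that contains a `package` directory, or None if there is none.

    Hypothetical helper: assumes the package folder sits at the repo root.
    """
    script_path = Path(script_path).resolve()
    for parent in script_path.parents:
        if (parent / package).is_dir():
            return parent
    return None


def ensure_importable(script_path, package: str = "megatron") -> bool:
    """Prepend the repo root to sys.path if `package` is not importable yet.

    Returns True if `package` can be found afterwards, False otherwise.
    """
    if importlib.util.find_spec(package) is not None:
        return True  # already importable (installed, or on PYTHONPATH)
    root = find_repo_root(script_path, package)
    if root is None:
        return False  # no ancestor directory contains the package
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()  # make the finders notice the new entry
    return importlib.util.find_spec(package) is not None
```

If used, the call would go near the top of `cleanup_dataset.py`, e.g. `ensure_importable(Path(__file__))` placed before `from tokenizer import Tokenizer`, since that import is what pulls in `megatron`.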