modelscope / modelscope

ModelScope: bring the notion of Model-as-a-Service to life.
https://www.modelscope.cn/
Apache License 2.0

After multi-GPU finetuning of GPT-3 1.3B, how do I run inference on a single GPU? #206

Closed TccccD closed 1 year ago

TccccD commented 1 year ago

After finetuning GPT-3 1.3B with multi-GPU tensor parallelism, how can I run inference on a single GPU? Single-GPU inference fails with the error below, but multi-GPU inference works fine.

```
RuntimeError: DistributedGPT3Pipeline: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1169, invalid usage, NCCL version 21.0.3
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
```

@huangshenno1 @AndersonBY @liuyhwangyh @TTCoding @Firmament-cyou
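
One quick way to confirm what is going on is to count the checkpoint shards left by the tensor-parallel run; more than one shard means a single process cannot load the checkpoint as-is. This is a sketch under Megatron-LM naming conventions (`mp_rank_*`), not a documented ModelScope layout, and the directory path is illustrative:

```python
# Hedged sketch: count tensor-parallel shards in the finetuning output directory.
# The mp_rank_* naming follows Megatron-LM conventions and is an assumption here.
from pathlib import Path

ckpt_dir = Path("./gpt3_dureader/output")  # illustrative path
shards = sorted(ckpt_dir.glob("**/mp_rank_*"))
print(f"{len(shards)} tensor-parallel shard(s):")
for shard in shards:
    print(" ", shard)
# More than one shard means the checkpoint cannot be loaded by a single-GPU
# pipeline as-is, which is consistent with the NCCL error above.
```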

TccccD commented 1 year ago

Also, after finetuning gpt3 with 4 GPUs and then running inference with 4 GPUs, it hangs at:

```
Loading extension module scaled_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
```

@Firmament-cyou

Firmament-cyou commented 1 year ago

> Also, after finetuning gpt3 with 4 GPUs and then running inference with 4 GPUs, it hangs at: `Loading extension module scaled_softmax_cuda...` [...] `ninja: no work to do.` @Firmament-cyou

Hi, the number of checkpoint shards saved by multi-GPU parallel training equals the parallel degree. The current version does not yet provide a checkpoint-merging feature; it will be added in a future release.
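
For readers who need single-GPU inference before the official merge tool lands, a Megatron-style merge concatenates each tensor-parallel shard along its partitioned dimension (column-parallel weights along dim 0, row-parallel weights along dim 1). The shard filenames (`mp_rank_0X/model_optim_rng.pt`) and the key-to-dimension mapping below follow Megatron-LM conventions and are assumptions, not a ModelScope API:

```python
import torch

# Hypothetical sketch: merge Megatron-style tensor-parallel shards into one
# checkpoint. Shard layout and parameter-name patterns are assumptions.
NUM_SHARDS = 4  # tensor-model-parallel size used during training

shards = [
    torch.load(f"output/mp_rank_0{r}/model_optim_rng.pt", map_location="cpu")["model"]
    for r in range(NUM_SHARDS)
]

merged = {}
for name, param in shards[0].items():
    parts = [s[name] for s in shards]
    if "attention.query_key_value" in name or "mlp.dense_h_to_4h" in name:
        merged[name] = torch.cat(parts, dim=0)   # column-parallel: split on dim 0
    elif "attention.dense" in name or "mlp.dense_4h_to_h" in name:
        # a row-parallel layer splits only its weight; the bias is replicated
        merged[name] = torch.cat(parts, dim=1) if name.endswith("weight") else param
    elif "word_embeddings" in name:
        merged[name] = torch.cat(parts, dim=0)   # vocabulary is split across ranks
    else:
        merged[name] = param                      # replicated (LayerNorm, biases, ...)

torch.save({"model": merged}, "output/merged/model_optim_rng.pt")
```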

TccccD commented 1 year ago

> Also, after finetuning gpt3 with 4 GPUs and then running inference with 4 GPUs, it hangs at: `Loading extension module scaled_softmax_cuda...` [...] `ninja: no work to do.` @Firmament-cyou

> Hi, the number of checkpoint shards saved by multi-GPU parallel training equals the parallel degree. The current version does not yet provide a checkpoint-merging feature; it will be added in a future release.

1. So a model trained on multiple GPUs still has to run inference on multiple GPUs? When is the checkpoint-merging feature expected? Even though training used multiple GPUs, a 1.3B model can be deployed on a single GPU, and a single GPU is enough for inference.
2. Could you also look into why 4-GPU inference hangs?

Mylszd commented 1 year ago

@TccccD Are you using A100 GPUs? How much GPU memory is required to train GPT3? Thanks!

TccccD commented 1 year ago

> How much GPU memory is required to train GPT3?

GPT3 1.3B only needs a V100 32G; a 16G card can also train it, but the batch size has to be very small.
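
A rough back-of-the-envelope estimate explains these numbers. Assuming mixed-precision Adam training in the Megatron style (fp16 weights and gradients, plus fp32 master weights and two fp32 optimizer moments, about 16 bytes per parameter; this is a common rule of thumb, not a measured ModelScope figure):

```python
# Hedged estimate of training memory for GPT-3 1.3B with mixed-precision Adam.
# Activations are not included and scale with batch size and sequence length.
params = 1.3e9
bytes_per_param = 2 + 2 + 4 + 4 + 4  # fp16 weights + fp16 grads + fp32 master + Adam m, v
print(f"model + optimizer states: {params * bytes_per_param / 2**30:.1f} GiB")
# -> ~19.4 GiB: comfortable headroom for activations on a 32 GB V100,
#    but almost none on a 16 GB card unless the batch size is very small.
```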

Mylszd commented 1 year ago

> GPT3 1.3B only needs a V100 32G; a 16G card can also train it, but the batch size has to be very small.

OK, thanks!

Mylszd commented 1 year ago

@Firmament-cyou Hi, after training gpt3 on 8 A30 GPUs, inference hangs here:

```
using world size: 8, data-parallel-size: 2, tensor-model-parallel size: 4, pipeline-model-parallel size: 1
using torch.float32 for parameters ...
initializing torch distributed ...
```

The inference code is as follows:

```python
from modelscope.hub.snapshot_download import snapshot_download
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope.utils.test_utils import test_level


def main():
    input = 'xxx'
    model_id_1_3B = './gpt3_dureader/output'
    pipe = pipeline(Tasks.text_generation, model=model_id_1_3B)
    print(pipe(input, top_p=0.9, temperature=0.9, max_length=32))


if __name__ == "__main__":
    main()
```

Could you take a look at what the problem is?
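
One thing worth checking (an assumption, not a confirmed diagnosis): a tensor-parallel checkpoint generally has to be loaded with the same parallel layout it was trained with, one process per GPU, started through a distributed launcher such as `torchrun --nproc_per_node=8 infer.py` (the script name is hypothetical). Each rank can verify that the launcher actually populated the distributed environment before building the pipeline:

```python
# Hedged diagnostic: print the rendezvous variables a distributed launcher
# is expected to set for each rank.
import os

for var in ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT"):
    print(f"{var} = {os.environ.get(var, '<unset>')}")
# If some of the 8 ranks never start, or these values differ across processes,
# torch.distributed initialization waits for the missing peers, which matches
# a hang right after "initializing torch distributed ...".
```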

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 5 days with no activity.