modelscope / modelscope

ModelScope: bring the notion of Model-as-a-Service to life.
https://www.modelscope.cn/
Apache License 2.0

After multi-GPU finetuning of GPT-3 1.3B, how do I run inference on a single GPU? #206

Closed TccccD closed 1 year ago

TccccD commented 1 year ago

After finetuning GPT-3 1.3B with multi-GPU tensor parallelism, how can I run inference on a single GPU? Single-GPU inference fails with the error below, but multi-GPU inference works fine.

```
RuntimeError: DistributedGPT3Pipeline: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1169, invalid usage, NCCL version 21.0.3
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
```

@huangshenno1 @AndersonBY @liuyhwangyh @TTCoding @Firmament-cyou
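
One quick way to confirm what is going on is to count the checkpoint shards left by the tensor-parallel run; more than one shard means a single process cannot load the checkpoint as-is. This is a sketch under Megatron-LM naming conventions (`mp_rank_*`), not a documented ModelScope layout, and the directory path is illustrative:

```python
# Hedged sketch: count tensor-parallel shards in the finetuning output directory.
# The mp_rank_* naming follows Megatron-LM conventions and is an assumption here.
from pathlib import Path

ckpt_dir = Path("./gpt3_dureader/output")  # illustrative path
shards = sorted(ckpt_dir.glob("**/mp_rank_*"))
print(f"{len(shards)} tensor-parallel shard(s):")
for shard in shards:
    print(" ", shard)
# More than one shard means the checkpoint cannot be loaded by a single-GPU
# pipeline as-is, which is consistent with the NCCL error above.
```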

TccccD commented 1 year ago

Also, after finetuning gpt3 with 4 GPUs and then running inference with 4 GPUs, it hangs at:

```
Loading extension module scaled_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
```

@Firmament-cyou

Firmament-cyou commented 1 year ago

> Also, after finetuning gpt3 with 4 GPUs and then running inference with 4 GPUs, it hangs at: `Loading extension module scaled_softmax_cuda...` [...] `ninja: no work to do.` @Firmament-cyou

Hi, the number of checkpoint shards saved by multi-GPU parallel training equals the parallel degree. The current version does not yet provide a checkpoint-merging feature; it will be added in a future release.
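
For readers who need single-GPU inference before the official merge tool lands, a Megatron-style merge concatenates each tensor-parallel shard along its partitioned dimension (column-parallel weights along dim 0, row-parallel weights along dim 1). The shard filenames (`mp_rank_0X/model_optim_rng.pt`) and the key-to-dimension mapping below follow Megatron-LM conventions and are assumptions, not a ModelScope API:

```python
import torch

# Hypothetical sketch: merge Megatron-style tensor-parallel shards into one
# checkpoint. Shard layout and parameter-name patterns are assumptions.
NUM_SHARDS = 4  # tensor-model-parallel size used during training

shards = [
    torch.load(f"output/mp_rank_0{r}/model_optim_rng.pt", map_location="cpu")["model"]
    for r in range(NUM_SHARDS)
]

merged = {}
for name, param in shards[0].items():
    parts = [s[name] for s in shards]
    if "attention.query_key_value" in name or "mlp.dense_h_to_4h" in name:
        merged[name] = torch.cat(parts, dim=0)   # column-parallel: split on dim 0
    elif "attention.dense" in name or "mlp.dense_4h_to_h" in name:
        # a row-parallel layer splits only its weight; the bias is replicated
        merged[name] = torch.cat(parts, dim=1) if name.endswith("weight") else param
    elif "word_embeddings" in name:
        merged[name] = torch.cat(parts, dim=0)   # vocabulary is split across ranks
    else:
        merged[name] = param                      # replicated (LayerNorm, biases, ...)

torch.save({"model": merged}, "output/merged/model_optim_rng.pt")
```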

TccccD commented 1 year ago

> Also, after finetuning gpt3 with 4 GPUs and then running inference with 4 GPUs, it hangs at: `Loading extension module scaled_softmax_cuda...` [...] `ninja: no work to do.` @Firmament-cyou

> Hi, the number of checkpoint shards saved by multi-GPU parallel training equals the parallel degree. The current version does not yet provide a checkpoint-merging feature; it will be added in a future release.

1. So a model trained on multiple GPUs still has to run inference on multiple GPUs? When is the checkpoint-merging feature expected? Even though training used multiple GPUs, a 1.3B model can be deployed on a single GPU, and a single GPU is enough for inference.
2. Could you also look into why 4-GPU inference hangs?

Mylszd commented 1 year ago

@TccccD Are you using A100 GPUs? How much GPU memory is required to train GPT3? Thanks!

TccccD commented 1 year ago

> How much GPU memory is required to train GPT3?

GPT3 1.3B only needs a V100 32G; a 16G card can also train it, but the batch size has to be very small.
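
A rough back-of-the-envelope estimate explains these numbers. Assuming mixed-precision Adam training in the Megatron style (fp16 weights and gradients, plus fp32 master weights and two fp32 optimizer moments, about 16 bytes per parameter; this is a common rule of thumb, not a measured ModelScope figure):

```python
# Hedged estimate of training memory for GPT-3 1.3B with mixed-precision Adam.
# Activations are not included and scale with batch size and sequence length.
params = 1.3e9
bytes_per_param = 2 + 2 + 4 + 4 + 4  # fp16 weights + fp16 grads + fp32 master + Adam m, v
print(f"model + optimizer states: {params * bytes_per_param / 2**30:.1f} GiB")
# -> ~19.4 GiB: comfortable headroom for activations on a 32 GB V100,
#    but almost none on a 16 GB card unless the batch size is very small.
```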

Mylszd commented 1 year ago

> GPT3 1.3B only needs a V100 32G; a 16G card can also train it, but the batch size has to be very small.

OK, thanks!

Mylszd commented 1 year ago

@Firmament-cyou Hi, after training gpt3 on 8 A30 GPUs, inference hangs here:

```
using world size: 8, data-parallel-size: 2, tensor-model-parallel size: 4, pipeline-model-parallel size: 1
using torch.float32 for parameters ...
initializing torch distributed ...
```

The inference code is as follows:

```python
from modelscope.hub.snapshot_download import snapshot_download
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope.utils.test_utils import test_level


def main():
    input = 'xxx'
    model_id_1_3B = './gpt3_dureader/output'
    pipe = pipeline(Tasks.text_generation, model=model_id_1_3B)
    print(pipe(input, top_p=0.9, temperature=0.9, max_length=32))


if __name__ == "__main__":
    main()
```

Could you take a look at what the problem is?
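
One thing worth checking (an assumption, not a confirmed diagnosis): a tensor-parallel checkpoint generally has to be loaded with the same parallel layout it was trained with, one process per GPU, started through a distributed launcher such as `torchrun --nproc_per_node=8 infer.py` (the script name is hypothetical). Each rank can verify that the launcher actually populated the distributed environment before building the pipeline:

```python
# Hedged diagnostic: print the rendezvous variables a distributed launcher
# is expected to set for each rank.
import os

for var in ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT"):
    print(f"{var} = {os.environ.get(var, '<unset>')}")
# If some of the 8 ranks never start, or these values differ across processes,
# torch.distributed initialization waits for the missing peers, which matches
# a hang right after "initializing torch distributed ...".
```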

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 5 days with no activity.