modelscope / ms-swift

Use PEFT or Full-parameter to finetune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

SWIFT 2.4 TO DO LIST #1617

Open · tastelikefeet opened 1 month ago

tastelikefeet commented 1 month ago

Dataset

  1. Refactor the self-cognition dataset to support multilingual QAs.

Megatron PreTrain

  1. Support more Megatron models
  2. Support dataset split

Fine-tuning

  1. RAG LLM training investigation

RLHF

  1. PPO training investigation

Multi-modal

  1. GPTQ/AWQ quantization
  2. vLLM inference

Inference & Deployment

  1. PyTorch batch inference
  2. DeepSpeed-Zero inference investigation
  3. Output logits

WEB-UI

  1. Video/Audio chatbot

WSC741606 commented 1 month ago

Hoping for Megatron support for 01.AI's Yi-1.5 series, thanks!

WSC741606 commented 1 month ago

There is also a dataset-loading issue for multi-node, multi-GPU training: network jitter on the NFS mount prevents the local cache from loading. For now I patched `def _msdataset_ddp_load(*args, **kwargs):` in swift/llm/utils/utils.py to:

    def _msdataset_ddp_load(*args, **kwargs):
        # Keep retrying until the dataset loads; transient NFS errors break the cache read.
        dataset = None
        while dataset is None:
            try:
                with safe_ddp_context():
                    dataset = _old_msdataset_load(*args, **kwargs)
                return dataset
            except Exception:
                dataset = None

Hoping there is a more elegant solution.
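A possibly cleaner variant of the same workaround, only a sketch reusing the `safe_ddp_context` and `_old_msdataset_load` helpers from that file (the retry count and backoff values below are made up), would bound the retries and back off between attempts so a persistent failure still surfaces:

    import time

    def _msdataset_ddp_load(*args, **kwargs):
        # Retry a few times with exponential backoff instead of looping forever.
        max_retries, backoff = 5, 2.0  # arbitrary values for illustration
        for attempt in range(max_retries):
            try:
                with safe_ddp_context():
                    return _old_msdataset_load(*args, **kwargs)
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt so real errors stay visible
                time.sleep(backoff * (2 ** attempt))

Re-raising on the final attempt keeps genuine errors (bad paths, corrupt caches) visible instead of spinning forever.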

WSC741606 commented 1 month ago

Also, it would be nice if a dataset could be given a tag on the command line and the loss were then computed separately per tag, e.g. general-data loss, code-data loss, domain-specific-data loss, so each can be tracked in TensorBoard. I saw a reference code idea:

    # Reference sketch of per-channel loss tracking inside an existing DDP training
    # loop; to_device, print_rank_0, writer, args, world_size, etc. come from the
    # surrounding training script.
    import torch.distributed as dist

    channel_loss = {}
    for step, batch in enumerate(train_dataloader):
        batch = to_device(batch, device)
        channel = batch['channel'][0]

        del batch['channel']
        outputs = model(**batch)
        loss = outputs.loss

        # Update channel loss
        if channel in channel_loss:
            channel_loss[channel][0] += loss.item()
            channel_loss[channel][1] += 1
        else:
            channel_loss[channel] = [loss.item(), 1]

        all_channel_loss = [None for _ in range(world_size)]
        dist.all_gather_object(all_channel_loss, channel_loss)

        merged_channel_loss = {}
        for lst in all_channel_loss:
            for k, v in lst.items():
                if k in merged_channel_loss:
                    merged_channel_loss[k][0] += v[0]
                    merged_channel_loss[k][1] += v[1]
                else:
                    merged_channel_loss[k] = [v[0], v[1]]

        for k, v in merged_channel_loss.items():
            avg_loss = v[0] / v[1] if v[1] != 0 else 0.0
            print_rank_0("The Channel {} loss is {}".format(k, avg_loss), args.global_rank)

            # Log channel loss to TensorBoard
            if dist.get_rank() == 0:
                writer.add_scalar(f'Loss/channel_{k}', avg_loss, epoch * num_batches + step)

        channel_loss = {}
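As a complement to the snippet above, here is a rough sketch (not an existing ms-swift option; it assumes the Hugging Face datasets library, and the file names and tag values are illustrative only) of how each source dataset could be stamped with a channel column before mixing, so every batch carries the `channel` field that the loop reads:

    from datasets import load_dataset, concatenate_datasets

    def tag_channel(ds, channel):
        # Add a constant `channel` column so the training loop can split losses per tag.
        return ds.map(lambda _: {'channel': channel})

    # File names are made up for this example.
    general = tag_channel(load_dataset('json', data_files='general.jsonl', split='train'), 'general')
    code = tag_channel(load_dataset('json', data_files='code.jsonl', split='train'), 'code')
    mixed = concatenate_datasets([general, code]).shuffle(seed=42)

The data collator would then have to carry the `channel` value into each batch, and the loop above deletes it again before calling the model.
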
WSC741606 commented 1 month ago

There is also the long-standing DDP+MP issue. I notice the logs report MP; is there any chance this evolves into PP? With naive MP the pipeline bubble is just too long. I have never managed to run it successfully on my side, so I am not sure whether it has already been optimized.

Jintao-Huang commented 1 month ago

> There is also the long-standing DDP+MP issue. I notice the logs report MP; is there any chance this evolves into PP? With naive MP the pipeline bubble is just too long. I have never managed to run it successfully on my side, so I am not sure whether it has already been optimized.

The device_map is mainly there to save GPU memory. If you want PP, you can use DeepSpeed; if you want TP, you will probably have to wait for the Megatron support.
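For context, the device_map mentioned here is the standard Transformers mechanism; a minimal sketch (the model id is only an example) that spreads layers across the visible GPUs to fit memory, which is naive model parallelism rather than pipeline parallelism:

    import torch
    from transformers import AutoModelForCausalLM

    # device_map='auto' lets accelerate place layers across available GPUs so a
    # model too large for one card still loads; layers run sequentially, which
    # is where the bubbles mentioned above come from.
    model = AutoModelForCausalLM.from_pretrained(
        'Qwen/Qwen2-7B-Instruct',  # example model id
        device_map='auto',
        torch_dtype=torch.bfloat16,
    )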

WSC741606 commented 1 month ago

> The device_map is mainly there to save GPU memory. If you want PP, you can use DeepSpeed; if you want TP, you will probably have to wait for the Megatron support.

Got it, thanks!

beamind commented 3 weeks ago

Please add support for training RM (reward model) models.

WSC741606 commented 3 weeks ago

> There is also a dataset-loading issue for multi-node, multi-GPU training: network jitter on the NFS mount prevents the local cache from loading. [...]

Solved.

PancakeAwesome commented 2 days ago

Please support qwenvl2 and internvl2 multi-image and video inference with vLLM, thanks.

ljqnb commented 2 days ago

Please support PPO! Thanks