microsoft Megatron-DeepSpeed issues

microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Other

1.9k stars, 345 forks

issues


Alcf update readme

#402 saforem2 closed 5 months ago, 1 comment
Fix ParallelMLP and enable accelerator test

#401 xinyu-intel closed 5 months ago, 1 comment
Fix test_deallocate_output_tensor

#400 xinyu-intel closed 5 months ago, 1 comment
Fix NaN loss in RoPE long-context training

#399 inkcherry opened 5 months ago, 1 comment
MOE TFLOPS calculation

#398 yingzhao27 opened 5 months ago, 0 comments
Why can't MoE use ZeRO-3?

#397 kuangdao opened 5 months ago, 0 comments
Add Zero Bubble Pipeline Parallelism H1 Schedule

#396 nvmdava closed 4 months ago, 6 comments
update universal_checkpointing/README.md

#395 inkcherry closed 4 months ago, 2 comments
convert mds checkpoint to Hf Llama model

#394 vksastry opened 5 months ago, 1 comment
Convert to iteration based training supported by pretraining scripts

#393 zainsarwar865 closed 5 months ago, 0 comments
ds-sequence-parallel(ulysses) for rope.

#392 inkcherry opened 5 months ago, 0 comments
Update/add GPT/Llama universal checkpointing scripts

#391 lekurile closed 3 months ago, 1 comment
Fix trace output path

#390 saforem2 closed 6 months ago, 1 comment
Inquiry on Sequence Parallel Support for VocabParallelEmbedding

#389 qinxiangyujiayou opened 6 months ago, 0 comments
add HFTokenizer option for preprocess_data

#388 Jianhong-Zhang opened 6 months ago, 0 comments
about the optimizer param group

#387 L-hongbin opened 6 months ago, 0 comments
DeepSpeed's mountain of spaghetti code

#386 ControllableGeneration opened 6 months ago, 3 comments
Sequence Parallel is incompatible with Rotary Positional Embedding

#385 anogkongda opened 6 months ago, 4 comments
Spurious all gather performance drop.

#384 etiennemlb opened 6 months ago, 0 comments
Add steps and results for running ZeRO stage 3 with universal checkpoint

#383 xylian86 closed 4 months ago, 1 comment
Merge `alcf-tests` into `main`

#382 saforem2 closed 7 months ago, 1 comment
Call for Conversion from Huggingface to Megads with MoE

#381 ControllableGeneration opened 7 months ago, 0 comments
Expert deepcopy raises PickleError

#380 sxontheway opened 7 months ago, 0 comments
AttributeError: 'Namespace' object has no attribute 'deepspeed_config_dict'. Did you mean: 'deepspeed_config'? && batch = next(self.data_iterator)

#379 hi20240217 opened 7 months ago, 2 comments
Add layer norm weight plus 1

#378 Yejing-Lai opened 7 months ago, 1 comment
Assertion failure when there are more than 255 tokenized data files (assert num_datasets < 255 in blendable_dataset.py)

#377 Jeronymous opened 7 months ago, 0 comments
Fix ConstantGradScaler and loss-scale argument not match

#376 BeingGod opened 7 months ago, 1 comment
Support Llama2Tokenizer

#375 jinyouzhi opened 7 months ago, 0 comments
get distributed backend name via accelerator and check loss_scale before writing to tb

#374 polisettyvarma closed 6 months ago, 0 comments
Support MoE for GPTModelPipe

#373 mosheisland closed 7 months ago, 5 comments
remove contiguous copy for flash-attn opbuilder

#372 YizhouZ closed 7 months ago, 7 comments
fix TFLOPs calculation

#371 polisettyvarma closed 3 months ago, 4 comments
collect grad_norm for non pipeline path

#370 inkcherry opened 8 months ago, 0 comments
Pipeline parallelism + CPU offload?

#369 webber26232 opened 8 months ago, 0 comments
Fix the error issue for DP on Megatron-DeepSpeed

#368 ys950902 closed 7 months ago, 2 comments
[BUG] Problems with Mixture-of-Experts (MoE)

#367 nikit-srivastava opened 8 months ago, 1 comment
[REQUEST] Could you add a new release version tag to Megatron-DeepSpeed? Thanks

#366 hijeffwu closed 8 months ago, 2 comments
Mistral

#365 Kosei1227 closed 8 months ago, 0 comments
Bugs in GPT2 Inference Example

#364 JianzheXiao opened 8 months ago, 3 comments
Add Parallel Attention mechanism of Mistral

#363 Kosei1227 closed 8 months ago, 3 comments
MOE: Support disable top2 2nd expert sampling

#362 mosheisland closed 8 months ago, 0 comments
Support universal checkpoint for GPTModel

#361 mosheisland closed 8 months ago, 0 comments
Fine-tune llama2 with sequence parallelism

#360 AnirudhVIyer opened 8 months ago, 3 comments
Problem in hf2megads_weight_converter.py

#359 noob-ctrl opened 8 months ago, 0 comments
Loss is increasing when fine-tuning from a Megatron-Deepspeed pretrained checkpoint.

#358 SefaZeng opened 8 months ago, 0 comments
Unreasonably low throughput on HGX-H100s

#357 GuanhuaWang opened 8 months ago, 0 comments
FileNotFoundError: [Errno 2] No such file or directory: 'dataset/index-cache/xxx_doc_idx.npy'

#356 GuanhuaWang opened 8 months ago, 6 comments
fix a bug in `pretrain_bert.py`

#355 lzzmm closed 8 months ago, 0 comments
Print total number of params when loading model

#354 nightingal3 closed 9 months ago, 1 comment
Updates in `megatron/data/{blendable_dataset.py, gpt_dataset.py, indexed_dataset.py}`

#353 saforem2 closed 9 months ago, 1 comment
