hpcaitech / ColossalAI
Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0 · 38.28k stars · 4.3k forks
Issues (sorted newest first)
[Hotfix] Enable pp + sp for llama · #5868 · Edenzzzz · opened 23 hours ago · 0 comments
[misc] fix typo, remove redundant file · #5867 · Hz188 · closed 1 day ago · 0 comments
[FEATURE]: Support SP+PP in Llama etc. · #5866 · GuangyaoZhang · opened 1 day ago · 1 comment
[BUG]: ColossalChat train sft is skipped with opt-1.3b model · #5865 · smash1999 · opened 1 day ago · 1 comment
[release] update version · #5864 · ver217 · closed 1 day ago · 0 comments
[Feature] remove useless code, modify the pg mesh implementation · #5863 · Hz188 · closed 1 day ago · 0 comments
[zero] remove redundant memebr init · #5862 · botbw · closed 2 days ago · 0 comments
[BUG]: Colossal AI failed to load ChatGLM2 · #5861 · hiprince · opened 2 days ago · 1 comment
[zero] use bucket during allgather · #5860 · ver217 · closed 1 day ago · 0 comments
[shardformer]delete xformers · #5859 · flybird11111 · closed 1 day ago · 0 comments
Dev/zero offload · #5858 · wangbluo · closed 2 days ago · 0 comments
[test] fix test · #5857 · botbw · closed 3 days ago · 0 comments
fix shardformer modeling_llama · #5856 · wangbluo · closed 3 days ago · 0 comments
[BUG]: loading OPT 66B model - CPU runs out of memory · #5855 · PurvangL · opened 3 days ago · 2 comments
[ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM · #5854 · GuangyaoZhang · opened 3 days ago · 0 comments
[FEATURE]: Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM · #5853 · GuangyaoZhang · opened 3 days ago · 0 comments
[doc] Update llama + sp compatibility; fix dist optim table · #5852 · Edenzzzz · opened 4 days ago · 0 comments
[doc] add GPU cloud playground · #5851 · binmakeswell · closed 4 days ago · 0 comments
[Chat] Rlhf support SimPO · #5850 · YeAnbang · opened 5 days ago · 0 comments
[Chat] add SimPO · #5849 · YeAnbang · closed 5 days ago · 1 comment
[doc] fix open sora model weight link · #5848 · binmakeswell · closed 1 week ago · 0 comments
[gemini] fixes for benchmarking · #5847 · botbw · closed 2 days ago · 0 comments
[doc] opensora v1.2 news · #5846 · binmakeswell · closed 1 week ago · 0 comments
[gemini] fix missing return · #5845 · botbw · closed 1 week ago · 0 comments
Update Qwen2 model · #5844 · wangbluo · closed 1 week ago · 0 comments
[zero] modify api · #5843 · botbw · closed 1 week ago · 0 comments
update llama model · #5842 · wangbluo · closed 1 week ago · 0 comments
[FEATURE]: Support T5ForTokenClassification · #5841 · GuangyaoZhang · closed 1 day ago · 0 comments
[zero] comments and naming · #5840 · botbw · closed 1 week ago · 0 comments
[zero] add low level optimizer back · #5839 · botbw · closed 1 week ago · 0 comments
[Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support · #5838 · LRY89757 · opened 1 week ago · 0 comments
[Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers · #5837 · yuanheng-zhao · closed 1 week ago · 0 comments
[Fix] Remove building on PR when edited to avoid skip issue · #5836 · GuangyaoZhang · closed 1 week ago · 0 comments
[shardformer] Change atol in test command-r weight-check to pass 8GPU pytest · #5835 · GuangyaoZhang · closed 1 week ago · 0 comments
[BUG]: Command-R 8 GPU Pytest failure · #5834 · GuangyaoZhang · closed 1 week ago · 0 comments
[release] update version · #5833 · ver217 · closed 1 week ago · 0 comments
[release] update version · #5832 · ver217 · closed 1 week ago · 0 comments
[FEATURE]: Support Command-R model · #5831 · GuangyaoZhang · closed 1 week ago · 0 comments
Using the Gemini plugin and LowLevelZero to run llama2_7b: in the Gemini plugin the placement policy is set to static, with shard_param_frac, offload_optim_frac, and offload_param_frac all set to 0.0, which should make Gemini equivalent to ZeRO-2; in LowLevelZero, stage is set to 2. Training in bf16 and comparing the two plugins, we found that Gemini's GPU memory usage is higher than LowLevelZero's. Why is this? In principle, Gemini should save more GPU memory. · #5830 · JJGSBGQ · opened 1 week ago · 2 comments
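
For context on #5830, here is a minimal sketch of the two plugin configurations the reporter describes. This is not the reporter's actual script; the argument names follow recent ColossalAI releases and may differ by version, and the model/optimizer setup is only indicated in comments as hypothetical helpers.

```python
# Hedged sketch of the two setups compared in #5830 (illustrative, not the reporter's code).
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin

# Assumes the script is started with torchrun; older versions use launch_from_torch(config={}).
colossalai.launch_from_torch()

# Gemini with static placement and no parameter sharding or offloading,
# which the reporter expects to behave like ZeRO-2.
gemini_plugin = GeminiPlugin(
    placement_policy="static",
    shard_param_frac=0.0,
    offload_optim_frac=0.0,
    offload_param_frac=0.0,
    precision="bf16",
)

# Plain ZeRO-2 via the low-level zero plugin.
zero2_plugin = LowLevelZeroPlugin(stage=2, precision="bf16")

# Pick one plugin and boost the model/optimizer before training.
plugin = gemini_plugin  # or zero2_plugin
booster = Booster(plugin=plugin)
# model, optimizer = build_llama2_7b(), torch.optim.AdamW(model.parameters())  # hypothetical helpers
# model, optimizer, *_ = booster.boost(model, optimizer)
```
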
[MoE] Resolve .github conflict · #5829 · Hz188 · closed 1 week ago · 0 comments
[zero] fix param · #5828 · botbw · closed 1 week ago · 0 comments
[MoE/ZeRO] fix .github conflict with main branch. · #5827 · Hz188 · closed 1 week ago · 0 comments
[FEATURE]: LoRA with sharded model · #5826 · KaiLv69 · opened 1 week ago · 0 comments
Feature/moe · #5825 · Hz188 · closed 1 week ago · 0 comments
[zero] fix missing hook removal · #5824 · botbw · closed 1 week ago · 0 comments
[hotfix]Solve the compatibility issue of zero refactor · #5823 · Hz188 · closed 1 week ago · 0 comments
[launch] Support IPv4 host initialization · #5822 · KaiLv69 · closed 1 week ago · 0 comments
[MoE/ZeRO] Moe refactor with zero refactor · #5821 · Hz188 · closed 1 day ago · 0 comments
[hotfix] Fix object_to_tensor usage when torch>=2.3.0 · #5820 · kurisusnowdeng · opened 2 weeks ago · 0 comments
[Moe/Zero] Update MoeHybridParallelPlugin with refactored ZeRO and Fix Zero bug · #5819 · Hz188 · closed 2 weeks ago · 0 comments