pytorch/torchtitan
A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License
2.65k
stars
206
forks
issues
Is `autocast` needed with FSDP2?
#700
garrett361
opened
27 minutes ago
0
W&B wandb support
#699
msaroufim
opened
2 hours ago
0
fix float8 delayed scaling configuration
#698
vkuzo
closed
34 minutes ago
0
some cleanups
#697
tianyu-l
opened
1 day ago
0
[question] Need clarification on the purpose and performance benefits of GarbageCollection class
#696
qsh-zh
opened
2 days ago
5
recommended practices for loss converging tests
#695
tianyu-l
opened
2 days ago
0
Vote on new features in Discussions
#694
tianyu-l
opened
2 days ago
2
Move RNG seed stuff out of 'set_determinism'
#691
wconstab
closed
2 days ago
1
Control 'deterministic' mode separately from rng seed
#690
wconstab
closed
2 days ago
1
Configure RNGs appropriately for Pipeline + SPMD
#689
wconstab
opened
3 days ago
0
[rfc] torchtitan release practices
#688
tianyu-l
opened
3 days ago
0
Question about FSDP2 + FP8 all gather
#687
sbhavani
closed
3 days ago
3
necessary changes to unblock Sequence Parallel on odd length sequences
#686
tianyu-l
opened
5 days ago
0
[cp] apply fsdp to model when CP is enabled without DP for correct loss and lower mem usage
#685
XilunWu
opened
5 days ago
0
[cp] add option to choose kv shards rotation method
#684
XilunWu
opened
5 days ago
0
[cp] fix the device mesh access issue when CP is not used with DP
#683
XilunWu
opened
5 days ago
0
[pp] Add support for loading schedule csv
#682
H-Huang
closed
3 days ago
0
torch.compile(sync_float8_amax_and_scale_history) not working with triton latest main
#681
goldhuang
opened
6 days ago
1
[Parallelism] Implement vocabulary parallelism
#680
casper-hansen
opened
1 week ago
1
Question about integration with DeepSpeed-Ulysses
#679
zigzagcai
closed
3 days ago
2
Any suggestion for Llama-3.1-70b (128k seq len) deploy mesh with torchtitan?
#678
medivh-xp
opened
1 week ago
8
Fine-Tuning Llama Model with Large Context and Customized Dataset Using Torchtitan
#677
Amerehei
opened
1 week ago
7
Very low wps with H200 Gpus
#676
aniltrkkn
opened
1 week ago
6
fsdp2
#675
zigzagcai
closed
1 week ago
1
Batchnorm support with FSDP2
#674
vighneshbirodkar
closed
2 weeks ago
5
Implement sft
#673
aniltrkkn
closed
2 weeks ago
1
support 3rd-party backend
#672
qiongerfei
closed
4 days ago
4
FSDP2 mixed precision error
#671
jiagaoxiang
closed
2 weeks ago
4
Equivalence of `sync_module_states` in fsdp2
#670
qsh-zh
closed
3 weeks ago
4
Low Bit Optimizer Support
#669
nighting0le01
closed
2 weeks ago
1
Why use TF32 Tensorcore Peak Flops for MFU calculation?
#668
LeoXinhaoLee
closed
3 weeks ago
5
[BE] replace the extra DeviceMesh _flatten with mesh access
#667
XilunWu
closed
3 weeks ago
0
[BE] replace the extra DeviceMesh _flatten with mesh access
#666
XilunWu
closed
3 weeks ago
1
[BE] remove old pytorch version warning on strided sharding since 2.5 is official released
#665
XilunWu
closed
3 weeks ago
0
Add test for toml-based pp split points
#664
wconstab
closed
3 weeks ago
0
[WIP] Adding OBELICS DataLoader
#663
TJ-Solergibert
opened
3 weeks ago
2
[Config] Make the checkpoint `step` configurable.
#662
casper-hansen
opened
3 weeks ago
3
[not for land] torch.compile individual linears
#661
vkuzo
opened
3 weeks ago
0
`empty_cache` before `barrier`
#660
carmocca
opened
3 weeks ago
1
Fix data_parallel_shard_degree description
#659
carmocca
closed
3 weeks ago
0
Questions about FSDP2 support and memory usage.
#658
tangjiasheng
opened
3 weeks ago
6
add paper citation
#657
tianyu-l
closed
3 weeks ago
0
Port #642's loss changes to estimation.py
#656
carmocca
closed
4 weeks ago
0
Do not destroy if the world did not init
#655
carmocca
closed
4 weeks ago
1
meta device issue with float8 delayed scale
#654
weifengpy
opened
1 month ago
8
When to use enable_fsdp_float8_all_gather?
#653
goldhuang
closed
1 month ago
1
torch.distributed.breakpoint(rank=1) hangs because of --local-ranks-filter 0
#652
weifengpy
opened
1 month ago
0
FP8Linear saves new parameters in ckpt and I cannot load the saved ckpt
#651
goldhuang
closed
4 days ago
6
[Multimodal] Adding OBELICS DataLoader
#650
TJ-Solergibert
opened
1 month ago
8
Fix PP clip_grad_norm
#649
zijian-hu
closed
1 week ago
1