mlcommons/training
Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0
1.62k stars, 561 forks
issues
404 for download link
#778
hzeng2000
opened
1 week ago
0
Shrinking Llama training to suit one GPU
#777
mahmoodn
closed
3 days ago
1
Is Llama model initialized on GPU?
#776
mahmoodn
opened
3 weeks ago
0
Running Llama training on GPU
#775
mahmoodn
closed
3 weeks ago
0
added param count inside benchmark
#774
hiwotadese
opened
1 month ago
1
Parameter count in summary table - draft 1
#773
hiwotadese
opened
1 month ago
1
pytorch_lightning no longer uses add_argparse_args
#772
mahmoodn
opened
1 month ago
0
fix llama2_70b_lora broken link for Accelerate config file in the readme
#771
hiwotadese
closed
1 month ago
1
double free or corruption (!prev)
#770
ltm920716
opened
1 month ago
1
Access to Rclone Download Instructions for llama2_70b_lora
#769
nnasirinvidia
closed
3 weeks ago
15
Access to Rclone Download Instructions for llama2_70b_lora
#768
conde-amd
closed
1 month ago
3
[Dataset] Clarification on Dataset processing
#767
BowenYao18
opened
2 months ago
1
fix llama2_70b_lora broken link for Accelerate config file in the readme
#766
hiwotadese
closed
1 month ago
4
Accelerate config file missing for LoRA
#765
psyhtest
opened
2 months ago
4
Paxml c4 resplit dataset permission issues
#764
gramesh-amd
closed
3 months ago
6
llama2-lora model/dataset download link doesn't exist
#763
Fizzbb
closed
3 months ago
2
Add MLCube implementation for Graph Neural Network
#762
davidjurado
opened
3 months ago
3
Remove gs://mlperf-llm-public2/ dependency and make reproducibility instructions clear
#761
ShriyaPalsamudram
closed
3 months ago
1
TorchRec DLRM Failed to initialize NumPy: _ARRAY_API not found
#760
rvernica
opened
3 months ago
0
TorchRec DLRM No such file or directory: 'sbatch'
#759
rvernica
closed
3 months ago
1
recommendation_v2/torchrec_dlrm Fatal Python error: Segmentation fault
#758
rvernica
closed
3 months ago
1
[SD] install rclone from upstream (fixes issue #751)
#757
ahmadki
closed
3 months ago
2
Move retired benchmarks to separate folder
#756
ShriyaPalsamudram
closed
3 months ago
1
Stable Diffusion RCP request for Global Batch Size = 4096
#755
suexu1025
closed
3 months ago
1
[SD] added normalization bug notice to the readme
#754
ahmadki
closed
4 months ago
1
Update Llama2 Member download instructions
#753
nathanw-mlc
closed
4 months ago
1
[SD] Added rclone to Dockerfile
#752
ahmadki
closed
4 months ago
1
Stable Diffusion Dataset
#751
amasin2111
closed
3 months ago
10
Running unet3d on multiple GPUs
#750
luiceur
closed
3 months ago
1
Add MLCube implementation for llama2
#749
davidjurado
opened
5 months ago
1
Update megatron-lm reference to run on hopper gpus
#748
ShriyaPalsamudram
closed
3 months ago
1
Change submission date from v40 to v41
#747
ShriyaPalsamudram
closed
5 months ago
1
Problem downloading S3 bucket
#746
mahmoodn
closed
5 months ago
0
Add `distribution_strategy` and `all_reduce_alg` flags to TensorFlow BERT pretraining
#745
rapsealk
opened
5 months ago
1
Bus error (core dumped) in graph_neural_network
#744
abrarfuad27
closed
5 months ago
0
Improve TensorFlow compatibility in BERT scripts
#743
rapsealk
opened
5 months ago
4
Scope of ML based benchmarks in MLPerf.
#741
rakshithgb-fujitsu
opened
6 months ago
0
`IndexError` in `cross_device_ops` with `MultiWorkerMirroredStrategy`
#740
rapsealk
closed
3 months ago
1
Invalid `local_replica_id` with `MultiWorkerMirroredStrategy`
#739
rapsealk
closed
3 months ago
1
Updating IGB download paths
#738
akhatua2
opened
6 months ago
4
[GNN] Adds example building dockerfile for H100s.
#737
Elnifio
closed
6 months ago
4
llama3 support?
#736
ifelsefi
closed
3 months ago
1
Hardware Configuration
#735
BhAem
closed
3 months ago
1
GNN: update the docker compose file
#734
LiSu
closed
6 months ago
1
[GNN] Fixes the dockerfile
#733
Elnifio
closed
7 months ago
1
DLRM criteo day23 MD5 verify failed
#732
kkkparty
opened
7 months ago
1
Data download for Stable Diffusion fails
#731
coppock
closed
4 months ago
4
[SD] switched to upstream logging (4.0.0-rc2)
#730
ahmadki
closed
5 months ago
1
Add missing logging keys for GNN
#729
LiSu
closed
7 months ago
1
(1) adding support for evaluation skipping; (2) updating model and data…
#728
itayhubara
closed
7 months ago
1