mlcommons/training
Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0
1.62k stars
561 forks
issues
[SD] Finalized the benchmark
#677
ahmadki
closed
1 year ago
2
Unable to run unit tests of distributed checkpointing in Megatron-LM
#676
MingjiHan99
opened
1 year ago
1
[SSD] Pinned fiftyone package
#675
ahmadki
closed
1 year ago
1
[LLM] Add S3 details to readme
#674
mikolajblaz
closed
1 year ago
3
does not have storage.objects.list access to the Google Cloud Storage bucket
#673
karpenko-p-n
opened
1 year ago
2
Bump semver and react-scripts in /retired_benchmarks/minigo/tensorflow/minigo/oneoffs/joseki
#672
dependabot[bot]
opened
1 year ago
1
[MaskRCNN bug] when MaskRCNN saves checkpoint after training, an error is reported
#671
Xiao-Yamin
closed
3 months ago
1
[MaskRCNN bug] make_data_loader() method should only return data_loaders[0] when training
#670
Xiao-Yamin
closed
3 months ago
1
AccessDeniedException: 403 does not have storage.objects.list access to the Google Cloud Storage bucket.
#669
zwang92
opened
1 year ago
2
Bump tough-cookie and react-scripts in /retired_benchmarks/minigo/tensorflow/minigo/oneoffs/joseki
#668
dependabot[bot]
opened
1 year ago
1
Bump scipy from 1.5.2 to 1.10.0 in /image_segmentation/pytorch
#667
dependabot[bot]
opened
1 year ago
1
Bump scipy from 1.0.1 to 1.10.0 in /retired_benchmarks/transformer/tensorflow
#666
dependabot[bot]
opened
1 year ago
1
Bump grpcio from 1.11.0 to 1.53.0 in /retired_benchmarks/transformer/tensorflow
#665
dependabot[bot]
opened
1 year ago
1
Bump semver and react-scripts in /retired_benchmarks/minigo/tensorflow/minigo/oneoffs/joseki
#664
dependabot[bot]
closed
1 year ago
2
Stable_diffusion: document embedding size from ViT-H into Unet
#663
matthew-frank
closed
1 year ago
2
Table summarizing benchmark suite
#662
TheKanter
closed
4 months ago
2
Added Stable Diffusion (SD) benchmark - Part 2
#661
ahmadki
closed
1 year ago
2
Fix tensorflow v1 compatibility for bert
#660
arjunsuresh
opened
1 year ago
1
【Bert】Unable to achieve accuracy of 0.72.
#659
BiduCui
closed
1 year ago
0
Bump gradio from 3.11 to 3.34.0 in /stable_diffusion
#658
dependabot[bot]
closed
9 months ago
2
Bump transformers from 4.19.2 to 4.30.0 in /stable_diffusion
#657
dependabot[bot]
closed
9 months ago
2
WIP: Add Stable Diffusion benchmark
#656
ahmadki
closed
1 year ago
1
[DLRM v2] How to modify the default training script of DLRM v2 to train the model with limited GPU memory
#655
JJingL
opened
1 year ago
1
SSD: exception during conversion of dataset to COCO format
#654
ukurkure
closed
9 months ago
5
Are gpt tokenizer model open-source?
#653
xyyintel
opened
1 year ago
3
readme updates
#652
anmolgupt
closed
1 year ago
7
Would be nice to have parameters counts for all models
#651
rakshithvasudev
opened
1 year ago
0
Tag v3.0 code release
#650
nv-rborkar
closed
4 months ago
1
Bump requests from 2.18.4 to 2.31.0 in /retired_benchmarks/transformer/tensorflow
#649
dependabot[bot]
opened
1 year ago
1
[DLRM v2] Using the model for the inference reference implementation
#648
pgmpablo157321
opened
1 year ago
6
Update license header
#647
nathanw-mlc
closed
1 year ago
1
Regarding the issue of continuous memory growth during the training process
#646
Daming-wang
closed
3 months ago
1
added train_samples keyword for compliance check
#645
anmolgupt
closed
5 months ago
2
Update README.md
#644
arjunsuresh
closed
1 year ago
1
Image classification reference implementation is failing on Ubuntu 22.04
#643
arjunsuresh
closed
4 months ago
3
Bert pretrain script message "Could not find trained model in model_dir: /tmp/output/"
#642
mahmoodn
closed
3 months ago
7
Language model dataset preparation
#641
mahmoodn
closed
1 year ago
0
Update README.md
#640
anmolgupt
closed
1 year ago
3
Checkpointing DLRMv2
#639
mailvijayasingh
closed
3 months ago
5
Steps for language model
#638
mahmoodn
closed
1 year ago
0
Ask for the access to download the checkpoint of LLM
#637
JJingL
closed
3 months ago
3
[DLRM_DCNv2] Benchmark name in reference implementation
#636
janekl
closed
1 year ago
1
Does DLRM_v2 support H100?
#635
xyyintel
closed
3 months ago
3
The default training script of DLRM v2 does not reach the reported AUC.
#634
Kevin0624
closed
3 months ago
4
[DLRMv2] Update target AUC in README
#633
janekl
closed
1 year ago
2
MLCube integration with Bert
#632
davidjurado
opened
1 year ago
6
Summary table for benchmark suite
#631
TheKanter
closed
3 months ago
2
[GPT3] update megatron-LM reference
#630
ShriyaPalsamudram
closed
1 year ago
5
SSD benchmark with MLCube implementation
#629
davidjurado
closed
1 year ago
3
[DLRMv2_DCNv2] Update Criteo 1TB dataset download link
#628
janekl
closed
1 year ago
1