mlcommons training issues

mlcommons / training

Reference implementations of MLPerf™ training benchmarks

https://mlcommons.org/en/groups/training

Apache License 2.0

1.62k stars, 561 forks

issues

[SD] Finalized the benchmark

#677 ahmadki closed 1 year ago
2
Unable to run unit tests of distributed checkpointing in Megatron-LM

#676 MingjiHan99 opened 1 year ago
1
[SSD] Pinned fiftyone package

#675 ahmadki closed 1 year ago
1
[LLM] Add S3 details to readme

#674 mikolajblaz closed 1 year ago
3
does not have storage.objects.list access to the Google Cloud Storage bucket

#673 karpenko-p-n opened 1 year ago
2
Bump semver and react-scripts in /retired_benchmarks/minigo/tensorflow/minigo/oneoffs/joseki

#672 dependabot[bot] opened 1 year ago
1
[MaskRCNN bug] when MaskRCNN saves checkpoint after training, an error is reported

#671 Xiao-Yamin closed 3 months ago
1
[MaskRCNN bug] make_data_loader() method should only return data_loaders[0] when training

#670 Xiao-Yamin closed 3 months ago
1
AccessDeniedException: 403 does not have storage.objects.list access to the Google Cloud Storage bucket.

#669 zwang92 opened 1 year ago
2
Bump tough-cookie and react-scripts in /retired_benchmarks/minigo/tensorflow/minigo/oneoffs/joseki

#668 dependabot[bot] opened 1 year ago
1
Bump scipy from 1.5.2 to 1.10.0 in /image_segmentation/pytorch

#667 dependabot[bot] opened 1 year ago
1
Bump scipy from 1.0.1 to 1.10.0 in /retired_benchmarks/transformer/tensorflow

#666 dependabot[bot] opened 1 year ago
1
Bump grpcio from 1.11.0 to 1.53.0 in /retired_benchmarks/transformer/tensorflow

#665 dependabot[bot] opened 1 year ago
1
Bump semver and react-scripts in /retired_benchmarks/minigo/tensorflow/minigo/oneoffs/joseki

#664 dependabot[bot] closed 1 year ago
2
Stable_diffusion: document embedding size from ViT-H into Unet

#663 matthew-frank closed 1 year ago
2
Table summarizing benchmark suite

#662 TheKanter closed 4 months ago
2
Added Stable Diffusion (SD) benchmark - Part 2

#661 ahmadki closed 1 year ago
2
Fix tensorflow v1 compatibility for bert

#660 arjunsuresh opened 1 year ago
1
【Bert】Unable to achieve accuracy of 0.72.

#659 BiduCui closed 1 year ago
0
Bump gradio from 3.11 to 3.34.0 in /stable_diffusion

#658 dependabot[bot] closed 9 months ago
2
Bump transformers from 4.19.2 to 4.30.0 in /stable_diffusion

#657 dependabot[bot] closed 9 months ago
2
WIP: Add Stable Diffusion benchmark

#656 ahmadki closed 1 year ago
1
[DLRM v2] How to modify the default training script of DLRM v2 to train the model with limited GPU memory

#655 JJingL opened 1 year ago
1
SSD: exception during conversion of dataset to COCO format

#654 ukurkure closed 9 months ago
5
Are gpt tokenizer model open-source?

#653 xyyintel opened 1 year ago
3
readme updates

#652 anmolgupt closed 1 year ago
7
Would be nice to have parameters counts for all models

#651 rakshithvasudev opened 1 year ago
0
Tag v3.0 code release

#650 nv-rborkar closed 4 months ago
1
Bump requests from 2.18.4 to 2.31.0 in /retired_benchmarks/transformer/tensorflow

#649 dependabot[bot] opened 1 year ago
1
[DLRM v2] Using the model for the inference reference implementation

#648 pgmpablo157321 opened 1 year ago
6
Update license header

#647 nathanw-mlc closed 1 year ago
1
Regarding the issue of continuous memory growth during the training process

#646 Daming-wang closed 3 months ago
1
added train_samples keyword for compliance check

#645 anmolgupt closed 5 months ago
2
Update README.md

#644 arjunsuresh closed 1 year ago
1
Image classification reference implementation is failing on Ubuntu 22.04

#643 arjunsuresh closed 4 months ago
3
Bert pretrain script message "Could not find trained model in model_dir: /tmp/output/"

#642 mahmoodn closed 3 months ago
7
Language model dataset preparation

#641 mahmoodn closed 1 year ago
0
Update README.md

#640 anmolgupt closed 1 year ago
3
Checkpointing DLRMv2

#639 mailvijayasingh closed 3 months ago
5
Steps for language model

#638 mahmoodn closed 1 year ago
0
Ask for the access to download the checkpoint of LLM

#637 JJingL closed 3 months ago
3
[DLRM_DCNv2] Benchmark name in reference implementation

#636 janekl closed 1 year ago
1
Does DLRM_v2 support H100?

#635 xyyintel closed 3 months ago
3
The default training script of DLRM v2 does not reach the reported AUC.

#634 Kevin0624 closed 3 months ago
4
[DLRMv2] Update target AUC in README

#633 janekl closed 1 year ago
2
MLCube integration with Bert

#632 davidjurado opened 1 year ago
6
Summary table for benchmark suite

#631 TheKanter closed 3 months ago
2
[GPT3] update megatron-LM reference

#630 ShriyaPalsamudram closed 1 year ago
5
SSD benchmark with MLCube implementation

#629 davidjurado closed 1 year ago
3
[DLRMv2_DCNv2] Update Criteo 1TB dataset download link

#628 janekl closed 1 year ago
1