mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

How to run dlrm module with criteo_kaggle dataset? #679

Open esharkwang opened 11 months ago

esharkwang commented 11 months ago

I found that dlrm_main.py supports two datasets, and criteo_kaggle is the smaller one:

    parser.add_argument(
        "--dataset_name",
        type=str,
        default="criteo_1t",
        help="dataset for experiment, current support criteo_1tb, criteo_kaggle",
    )

I downloaded the criteo_kaggle dataset from https://www.kaggle.com/datasets/mrkmakr/criteo-dataset, but it only contains two raw files, train.txt and test.txt. I am not sure how to process them so that the dlrm module can run. Could someone give me a hint?
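
For context, here is a minimal sketch of what reading the raw train.txt could look like. It assumes the file follows the Criteo Display Advertising Challenge layout: one tab-separated row per sample with a 0/1 label, 13 integer features, and 26 hexadecimal categorical features, where missing values are empty strings. The function name `parse_criteo_lines` and the choice to fill missing values with 0 are illustrative assumptions, not the preprocessing that dlrm_main.py actually expects.

```python
import numpy as np

NUM_DENSE = 13   # integer features per row (assumed Criteo layout)
NUM_SPARSE = 26  # hex-encoded categorical features per row (assumed Criteo layout)


def parse_criteo_lines(lines):
    """Parse labeled, tab-separated Criteo rows into (labels, dense, sparse) arrays.

    Hypothetical helper for illustration only. Missing fields become 0,
    which is an assumption and not necessarily what the benchmark uses.
    """
    labels, dense, sparse = [], [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 1 + NUM_DENSE + NUM_SPARSE:
            continue  # skip malformed rows (test.txt has no label column)
        labels.append(int(fields[0]))
        dense.append([int(v) if v else 0 for v in fields[1:1 + NUM_DENSE]])
        # Categorical values are hex strings; convert them to integers.
        sparse.append([int(v, 16) if v else 0 for v in fields[1 + NUM_DENSE:]])
    return (
        np.array(labels, dtype=np.int32),
        np.array(dense, dtype=np.int64).reshape(-1, NUM_DENSE),
        np.array(sparse, dtype=np.int64).reshape(-1, NUM_SPARSE),
    )


# Tiny synthetic row standing in for one line of train.txt.
row = ["1"] + [str(i) for i in range(NUM_DENSE)] + ["0000000a"] * NUM_SPARSE
row[3] = ""   # simulate a missing integer feature
row[20] = ""  # simulate a missing categorical feature
labels, dense, sparse = parse_criteo_lines(["\t".join(row) + "\n"])
print(labels.shape, dense.shape, sparse.shape)
```

On the real file, the same function could be fed an open file handle, for example `parse_criteo_lines(open("train.txt"))`, although a file of that size would more likely need to be processed in chunks.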