reczoo / FuxiCTR

A configurable, tunable, and reproducible library for CTR prediction https://fuxictr.github.io
Apache License 2.0

Test NVTabular, Petastorm, and Huggingface Datasets for parquet data loading #88

Open zhujiem opened 6 months ago

zhujiem commented 6 months ago

Huggingface Datasets:

 dataset = load_dataset("parquet", data_files={split: data_blocks}, split=split)
 super().__init__(dataset=dataset, num_workers=8, batch_size=self.batch_size)
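
For anyone who wants to try the fragment above outside the repo, here is a minimal standalone sketch. It is not the FuxiCTR implementation: it uses a plain function rather than a `DataLoader` subclass, and the names `parquet_data_files` and `build_parquet_loader`, the default `batch_size`, and the `with_format("torch")` step are assumptions made for illustration. Only `load_dataset("parquet", ...)` and `num_workers=8` come from the snippet.

```python
from typing import Dict, List


def parquet_data_files(split: str, data_blocks: List[str]) -> Dict[str, List[str]]:
    """Map a split name to its parquet files, the dict shape passed as `data_files`."""
    if not data_blocks:
        raise ValueError("data_blocks must contain at least one parquet path")
    return {split: list(data_blocks)}


def build_parquet_loader(data_blocks: List[str], split: str = "train",
                         batch_size: int = 4096, num_workers: int = 8):
    """Hypothetical helper: parquet files -> torch DataLoader over a Hugging Face dataset."""
    # Imported lazily so parquet_data_files() stays usable without these packages.
    from datasets import load_dataset
    from torch.utils.data import DataLoader

    dataset = load_dataset(
        "parquet",
        data_files=parquet_data_files(split, data_blocks),
        split=split,
    )
    # Assumption: ask the dataset to return torch tensors so default collation works.
    dataset = dataset.with_format("torch")
    return DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
```

Calling `build_parquet_loader(["part-0.parquet", "part-1.parquet"])` (placeholder paths) should yield dict batches keyed by column name, assuming `datasets` and `torch` are installed and every column is numeric.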