xuewyang / Fashion_Captioning

ECCV2020 paper: Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. Code and Data.
Dataset Details Mismatch #11

Open gourango01 opened 3 years ago

gourango01 commented 3 years ago

Is the dataset used in the paper different from the preprocessed dataset provided on Google Drive, or am I missing something?
Preprocessed data from Google Drive: TRAIN: 888,293; VAL: 19,915; TEST: 101,225

From Section 5.1 of the paper: "It contains 993K images and 130K descriptions, and we split the whole dataset, with approximately 794K image-description pairs for training, 99K for validation, and the remaining 100K for test."
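To make the mismatch concrete, here is a small sketch that tallies the per-split difference. The Google Drive counts are the ones listed above. The paper counts are the rounded "approximately" figures from Section 5.1, so the differences are only approximate. The names `drive`, `paper`, and `split_differences` are just for illustration and do not come from the repository.

```python
# Split sizes of the preprocessed data on Google Drive (as reported in this issue).
drive = {"train": 888_293, "val": 19_915, "test": 101_225}

# Approximate split sizes quoted from Section 5.1 of the paper.
# These are rounded to the nearest thousand, so treat them as estimates.
paper = {"train": 794_000, "val": 99_000, "test": 100_000}


def split_differences(actual, reported):
    """Return actual minus reported for each split name."""
    return {name: actual[name] - reported[name] for name in actual}


diff = split_differences(drive, paper)
drive_total = sum(drive.values())
paper_total = sum(paper.values())

for name in drive:
    print(f"{name}: drive={drive[name]:,} paper~{paper[name]:,} diff={diff[name]:+,}")
print(f"total: drive={drive_total:,} paper~{paper_total:,} diff={drive_total - paper_total:+,}")
```

By this tally the totals are fairly close (1,009,433 pairs on Google Drive versus roughly 993K in the paper), and the test split nearly matches. The gap is mostly in how the rest is divided: training is about 94K larger and validation about 79K smaller than the paper states.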