Dataset Details Mismatch

Is the dataset used in the paper different from the preprocessed dataset provided on Google Drive, or am I missing something?
Preprocessed data from Google Drive: TRAIN: 888293, VAL: 19915, TEST: 101225

From Section 5.1 of the paper: "It contains 993K images and 130K descriptions, and we split the whole dataset, with approximately 794K image-description pairs for training, 99K for validation, and the remaining 100K for test."
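To make the gap concrete, here is a quick sketch that sums each set of splits and compares them. The Google Drive counts are the ones quoted above. The paper counts are the approximate figures from Section 5.1, which I am assuming are rounded to the nearest thousand. The names `drive`, `paper`, and `summarize` are just illustrative.

```python
# Split sizes counted in the preprocessed Google Drive data (from this issue).
drive = {"train": 888_293, "val": 19_915, "test": 101_225}

# Approximate split sizes from Section 5.1 of the paper.
# Assumption: "794K", "99K", "100K" are values rounded to thousands.
paper = {"train": 794_000, "val": 99_000, "test": 100_000}


def summarize(splits):
    """Return the total size and each split's share in percent (1 decimal)."""
    total = sum(splits.values())
    shares = {name: round(100 * n / total, 1) for name, n in splits.items()}
    return total, shares


drive_total, drive_shares = summarize(drive)
paper_total, paper_shares = summarize(paper)

# Per-split difference: positive means the Google Drive split is larger.
diff = {name: drive[name] - paper[name] for name in drive}

print("drive:", drive_total, drive_shares)
print("paper:", paper_total, paper_shares)
print("drive - paper:", diff)
```

With these inputs the totals are 1,009,433 (Google Drive) versus roughly 993,000 (paper). The test split is close to the paper's figure, but the training split is about 94K larger and the validation split about 79K smaller, so the difference looks like more than rounding.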

xuewyang / Fashion_Captioning
