DeepLIIF dataset - Githubissues

AndrewTal commented 2 years ago

Hi,

In https://www.nature.com/articles/s42256-022-00471-x, the dataset is:

The images were scaled and co-registered with the fixed IHC images using affine transformations, resulting in 1,667 registered sets of IHC images and the other modalities of size 512 × 512. We randomly selected 363 sets for training, 53 sets for validation and 600 sets for testing the model. As described in the Synthetic data generation section, we synthetically generated 250 sets using our synthetic data generation model and added 212 to training and 38 to validation.

In https://www.biorxiv.org/content/biorxiv/early/2021/10/08/2021.05.01.442219.full.pdf, the dataset is:

These images were scaled and co-registered with the fixed IHC images using affine transformations, resulting in 1667 registered sets of IHC images and the other modalities of size 512×512. We randomly selected 709 sets for training, 358 sets for validation, and 600 sets for testing the model.

The data splits in the two articles are not the same, but their metric results appear to be the same.

AndrewTal commented 2 years ago

I also downloaded the DeepLIIF dataset from https://zenodo.org/record/4751737#.YV379XVKhH4:

DeepLIIF_Training_Set.zip: 575 images
DeepLIIF_Validation_Set.zip: 91 images
DeepLIIF_Testing_Set.zip: 598 images

The amount of data seems to correspond to the Nature article (363 + 212 = 575 for training, 53 + 38 = 91 for validation), but the testing dataset has only 598 images instead of 600.
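For anyone who wants to repeat the comparison, here is a minimal sketch of the arithmetic. The numbers are copied from the Nature excerpt and from the Zenodo archive counts listed above; the dictionary and function names are made up for illustration and are not part of the DeepLIIF code.

```python
# Split sizes quoted from the Nature article (real, registered sets).
nature_real = {"training": 363, "validation": 53, "testing": 600}
# Synthetic sets the article says were added (none were added to testing).
nature_synthetic = {"training": 212, "validation": 38, "testing": 0}
# Image counts found in the Zenodo archives.
zenodo = {"training": 575, "validation": 91, "testing": 598}


def compare_splits(real, synthetic, observed):
    """Return observed minus (real + synthetic) for every split (hypothetical helper)."""
    return {
        name: observed[name] - (real[name] + synthetic[name])
        for name in observed
    }


differences = compare_splits(nature_real, nature_synthetic, zenodo)
for name, diff in differences.items():
    expected = nature_real[name] + nature_synthetic[name]
    print(f"{name}: expected {expected}, found {zenodo[name]}, difference {diff}")
```

A difference of zero means the archive matches the article for that split; a negative value means the archive holds fewer images than the article reports.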

Parmida93 commented 2 years ago

Hi,

Our dataset has been reviewed by different pathologists through several cycles to ensure its quality. After each review, we updated the dataset according to the reviewer's comments. That's why the testing dataset on Zenodo has two fewer images than when we published the paper: our final pathologist ruled out two images.

nadeemlab / DeepLIIF

DeepLIIF dataset #15