Open YuanweiLee1 opened 6 months ago
Here is our story: the public MoNu SEGMENTATION dataset consists of 1000x1000 images cropped from WSI files, as described in "A. Datasets". We located the original WSI files they used and wanted to generate our own negative/positive CLASSIFICATION dataset. We should not re-use the image regions that MoNu had already cropped from these WSIs, and we need a very large number of 224x224 neg/pos images. So our solution was: select some huge box regions and automatically crop all 224x224 images from each region. For example, we only need to move the mouse to draw a box, maybe 22401x2242, and we will automatically get 1000 224x224 images. That is why "The average size of training tiles is 3090 × 2636". We also deleted some bad neg/pos images afterwards, so 26846 became 21246. The "ccrcccrop" dataset is a SEGMENTATION dataset. The CCRCC neg/pos CLASSIFICATION dataset is not released, because it can be replaced by MoNu's neg/pos dataset to train the classifier without hurting performance (the CCRCC neg/pos set is too huge, but we still keep a backup of it).
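For clarity, here is a minimal sketch of that kind of exhaustive tiling (assuming the hand-drawn box has already been read from the WSI into a NumPy array; the function name, shapes, and sizes are illustrative, not our actual preprocessing script):

```python
import numpy as np

def crop_tiles(region: np.ndarray, tile: int = 224):
    """Slide a non-overlapping tile x tile window over a region of shape
    (H, W, C) and return every fully-contained crop."""
    h, w = region.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(region[y:y + tile, x:x + tile])
    return tiles

# Example: a hand-drawn box of roughly 2240 x 2240 pixels yields 10 x 10 = 100 tiles.
# (Placeholder array; in practice the region would come from the WSI reader.)
region = np.zeros((2240, 2240, 3), dtype=np.uint8)
print(len(crop_tiles(region)))  # -> 100
```

So a single larger box (e.g. on the order of 22400 pixels wide) directly yields on the order of a thousand 224x224 tiles without touching the regions MoNu already used.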
Hello! Thank you for your great work. When I tried to replicate the experiment, I ran into some problems. I would appreciate it if you could answer the questions below and release the preprocessing code for each dataset.

First, in the description of MONu in "A. Datasets", there is an expression that confuses me: "The average size of training tiles is 3090 × 2636 ... 26846 of which are positive and 16143 are negative." This is inconsistent with MONu's 1000x1000 image size, and how did you obtain the 198 cropped tiles? Also, in the released processed MyNP dataset, I find only 21246 positive and 12436 negative tiles, which is inconsistent with the description in your paper.

Second, in the description of CCRCC in "A. Datasets", it seems that 278 image-level labels are utilized in your study. However, I can't find any data with only image-level labels in the released processed ccrcccrop dataset.

Best regards.