vanvalenlab / deepcell-tf

Deep Learning Library for Single Cell Analysis
https://deepcell.readthedocs.io

Tissuenet dataset instance segmentation ground truth #687

Closed tsly123 closed 1 year ago

tsly123 commented 1 year ago

Hi,

Thank you for sharing your work.

I'm trying to benchmark my model on the TissueNet dataset, and I'd like to know where to get the TissueNet ground truth instance segmentation.

I downloaded TissueNet 1.1, and the labels have an image-like shape, e.g. (x, y, 2). However, in the paper "Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning", deep watershed is used in postprocessing to generate the instance segmentation; its input is the model output, i.e. [maximas, interiors].

My ultimate goal is to have the Tissuenet instance ground truth in COCO format.

Thank you.
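
For reference, this is roughly the kind of conversion I have in mind once I have an integer-labeled instance mask per image (just a sketch on my side, using pycocotools, which is separate from deepcell; the function name is a placeholder):

import numpy as np
from pycocotools import mask as mask_utils

def instance_mask_to_coco(labels, image_id, category_id=1, start_ann_id=1):
    """Turn one integer-labeled instance mask (H, W) into COCO annotation dicts."""
    annotations = []
    ann_id = start_ann_id
    for inst_id in np.unique(labels):
        if inst_id == 0:  # 0 is background
            continue
        binary = np.asfortranarray((labels == inst_id).astype(np.uint8))
        rle = mask_utils.encode(binary)
        area = float(mask_utils.area(rle))
        bbox = [float(v) for v in mask_utils.toBbox(rle)]  # [x, y, w, h]
        rle['counts'] = rle['counts'].decode('ascii')  # make the RLE JSON-serializable
        annotations.append({
            'id': ann_id,
            'image_id': image_id,
            'category_id': category_id,
            'segmentation': rle,
            'area': area,
            'bbox': bbox,
            'iscrowd': 0,
        })
        ann_id += 1
    return annotations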

ngreenwald commented 1 year ago

Hi @tsly123, the reason the ground truth images have two channels is that one channel contains the whole-cell labels and the other contains the nuclear labels. Each of these represents the instance-level ground truth labels of the dataset.
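
For example (a rough sketch; I'm assuming channel 0 holds the whole-cell labels and channel 1 the nuclear labels, each stored as an integer label image where 0 is background):

import os
import numpy as np

npz_dir = 'path_to_dir_with_NPZs'
test_dict = np.load(os.path.join(npz_dir, 'tissuenet_v1.1_test.npz'))

y = test_dict['y'][0]     # one label image, shape (height, width, 2)
whole_cell = y[..., 0]    # assumed: whole-cell instance labels
nuclear = y[..., 1]       # assumed: nuclear instance labels

# every non-zero integer is one instance, so per-cell masks can be pulled out directly
cell_ids = np.unique(whole_cell)
cell_ids = cell_ids[cell_ids != 0]
masks = [whole_cell == cid for cid in cell_ids]
print(f'{len(masks)} whole-cell instances in this image')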

tsly123 commented 1 year ago

Thank you for your response.

tsly123 commented 1 year ago

Hi,

I'm able to extract the instance masks. However, when I checked the TissueNet 1.1 test set, some samples look broken. I visualized them using the code from the TissueNet README. The broken sample IDs are: test_err = [68, 69, 286, 463, 504, 506, 510, 511, 529, 545, 547, 681, 737, 757, 955, 1017, 1247]

import os

import numpy as np
import skimage.io as io

from deepcell.utils.plot_utils import create_rgb_image
from deepcell.utils.plot_utils import make_outline_overlay

npz_dir = 'path_to_dir_with_NPZs'
test_dict = np.load(os.path.join(npz_dir, 'tissuenet_v1.1_test.npz'))

test_X, test_y = test_dict['X'], test_dict['y']
rgb_images = create_rgb_image(test_X, channel_colors=['green', 'blue'])

save_path = 'input_68.png'  # placeholder output path
io.imsave(save_path, (rgb_images[68] * 255).astype(np.uint8))  # scale float RGB to uint8; 68, see IDs above

[attached example images for samples 68, 529, 681, 1017]

ngreenwald commented 1 year ago

Yes, we included images that are blurry, out of focus, partially cropped, etc. This helps the model generalize better

rossbar commented 1 year ago

Thanks for the prompt and detailed response @ngreenwald !

With the original question addressed, I will go ahead and close this as there's nothing actionable here (aside from maybe documenting the data format in greater detail - doc suggestions welcome!)

anthonyweidai commented 1 year ago

Yes, we included images that are blurry, out of focus, partially cropped, etc. This helps the model generalize better

Can you make the original images (without pre-processing) publicly available? Or could you please send me a private link to download them: davidietop@outlook.com?

tsly123 commented 1 year ago

Hi @anthonyweidai ,

I think what you downloaded in version 1.1 are the raw images.

See this topic https://github.com/vanvalenlab/deepcell-tf/issues/618#issuecomment-1203457143

tsly123 commented 1 year ago

Hi @ngreenwald and @rossbar ,

I printed the metadata of TissueNet 1.1, and it looks like this:

from deepcell.datasets import TissueNet

tissuenet = TissueNet(version='1.1')
X_val, y_val, meta_val = tissuenet.load_data(split='test')

                                               filename                                         experiment  pixel_size  screening_passed  time_step               specimen
0                                              filename                                         experiment  pixel_size  screening_passed  time_step               specimen
1                                              filename                                         experiment  pixel_size  screening_passed  time_step               specimen
2                                              filename                                         experiment  pixel_size  screening_passed  time_step               specimen
3     ../../labels/static/2d/Tissue-Spleen/20200424_...  ../../labels/static/2d/Tissue-Spleen/20200424_...         0.5      Not screened       None                 Spleen
4     ../../labels/static/2d/Tissue-Spleen/20200424_...  ../../labels/static/2d/Tissue-Spleen/20200424_...         0.5      Not screened       None                 Spleen
...                                                 ...                                                ...         ...               ...        ...                    ...
1322  ../../labels/static/2d/Tissue-Lung/20200210_Cy...  ../../labels/static/2d/Tissue-Lung/20200210_Cy...         0.5      Not screened       None  lymph node metastasis
1323  ../../labels/static/2d/Tissue-Lung/20200424_TB...  ../../labels/static/2d/Tissue-Lung/20200424_TB...         0.5      Not screened       None                   Lung
1324  ../../labels/static/2d/Tissue-Lung/20200424_TB...  ../../labels/static/2d/Tissue-Lung/20200424_TB...         0.5      Not screened       None                   Lung
1325  ../../labels/static/2d/Tissue-Lung/20200424_TB...  ../../labels/static/2d/Tissue-Lung/20200424_TB...         0.5      Not screened       None                   Lung
1326  ../../labels/static/2d/Tissue-Lung/20200424_TB...  ../../labels/static/2d/Tissue-Lung/20200424_TB...         0.5      Not screened       None                   Lung

It has 4 header rows, so I assume that each row after those 4 header rows corresponds to an image in X_val and y_val; in other words, the order of the images in X_val and y_val matches the order of the rows after the 4 header rows, i.e. meta[4:]. Is this correct? I'm asking because I want to get the specimen label for the instance segmentation.

In addition, can I take specimen as the class label for each image?

Could you kindly comment on these two questions? Thank you very much.
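
To be concrete, this is roughly what I plan to do (a sketch, assuming meta_val behaves like a pandas DataFrame, as the printout suggests, and that the non-header rows line up one-to-one with X_val and y_val):

from deepcell.datasets import TissueNet

tissuenet = TissueNet(version='1.1')
X_val, y_val, meta_val = tissuenet.load_data(split='test')

# drop the header-like rows (their 'filename' entry just repeats the column name)
meta_clean = meta_val[meta_val['filename'] != 'filename'].reset_index(drop=True)

assert len(meta_clean) == len(X_val)               # sanity-check the alignment
specimen_labels = meta_clean['specimen'].tolist()  # one specimen label per image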

ngreenwald commented 1 year ago

I believe so, but I think some of these additional columns are there to make TissueNet compatible with all the other datasets being generated, so I'm not 100% sure.

anthonyweidai commented 1 year ago

Hi @anthonyweidai ,

I think what you downloaded in version 1.1 are the raw images.

See this topic #618 (comment)

Then who is right? The questioner said there are broken samples, and @ngreenwald said some images were pre-processed. By the way, the questioner said the TissueNet version they downloaded is also 1.1.

anthonyweidai commented 1 year ago

I got it, thanks. These images are not actually "raw", but you can still see how they look.