ValueError: Found input variables with inconsistent numbers of samples:

smith6jt-cop commented 12 months ago

On Ubuntu22.04.3LTS Windows10 (GNU/Linux 5.15.133.1-microsoft-standard-WSL2 x86_64)

After successful run with example images.

Using these images: Mask: https://drive.google.com/file/d/1XGUf9osxak2ZwDSap-eb4vciA4wXa9u7/view?usp=sharing Img: https://drive.google.com/file/d/1ZXgzGkyRCo58knrfE4hl9l4_FW23Q119/view?usp=sharing

I ran the code in the given notebook in Jupyter Lab.

I added some print statements to the code which gave:

Image shape is (1, 28, 1, 2507, 2467) Mask shape is (1, 2, 1, 2507, 2467) Calculating evaluation metrics v1.5 for thymus.ome.tiff Matched mask shape is (2, 2507, 2467) Matched cell mask shape is (2507, 2467) Matched nuclear mask shape is (2507, 2467) Number of channels is 28 Length of cell_coord mask indices is 15489 Length of label list for mask is 10 Length of cell_coord/mask sparse indices is 15489 Feature matrix shape is (15489, 28) Feature matrix z shape is (15489, 28) Length of cell_coord/mask sparse indices is 65512 Feature matrix shape is (16573, 28) Feature matrix z shape is (16573, 28)

Error:

`--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[3], line 11 7 PCA_model = [] 8 # or give a specific model 9 # PCA_model = pickle.load( open( "2D_PCA_model.pickle", "rb" )) ---> 11 seg_metrics_2D = read_and_eval_seg(img_path, mask_path, PCA_model, output_directory) 13 print(seg_metrics_2D)

File ~/CellSegmentationEvaluator/SimpleCSE/read_and_eval_seg.py:57, in read_and_eval_seg(img_path, mask_path, PCA_model, output_directory) 55 except: 56 print('2D PCA model file missing') ---> 57 seg_metrics = single_method_eval(img, mask, PCA_model, output_directory, bz=bestz) 58 else: 59 if PCA_model==[]:

File ~/CellSegmentationEvaluator/SimpleCSE/seg_eval_pkg.py:729, in single_method_eval(img, mask, PCA_model, output_dir, bz, unit, pixelsizex, pixelsizey) 727 img_channels = np.squeeze(img["data"][0, :, bestz, :, :]) 728 # get cell uniformity --> 729 cell_CV, cell_fraction, cell_silhouette = cell_uniformity( 730 current_mask, img_channels, cell_type_labels 731 ) 732 avg_cell_CV = np.average(cell_CV) 733 avg_cell_fraction = np.average(cell_fraction)

File ~/CellSegmentationEvaluator/SimpleCSE/seg_eval_pkg.py:500, in cell_uniformity(mask, channels, label_list) 498 silhouette.append(1) 499 else: --> 500 silhouette.append(silhouette_score(feature_matrix_z, labels)) 501 for i in range(c): 502 cluster_feature_matrix = feature_matrix[np.where(labels == i)[0], :]

File ~/anaconda3/lib/python3.11/site-packages/sklearn/utils/_param_validation.py:214, in validate_params..decorator..wrapper(*args, *kwargs) 208 try: 209 with config_context( 210 skip_parameter_validation=( 211 prefer_skip_nested_validation or global_skip_validation 212 ) 213 ): --> 214 return func(args, **kwargs) 215 except InvalidParameterError as e: 216 # When the function is just a wrapper around an estimator, we allow 217 # the function to delegate validation to the estimator, but we replace 218 # the name of the estimator by the name of the function in the error 219 # message to avoid confusion. 220 msg = re.sub( 221 r"parameter of \w+ must be", 222 f"parameter of {func.qualname} must be", 223 str(e), 224 )

File ~/anaconda3/lib/python3.11/site-packages/sklearn/metrics/cluster/_unsupervised.py:130, in silhouette_score(X, labels, metric, sample_size, random_state, kwds) 128 else: 129 X, labels = X[indices], labels[indices] --> 130 return np.mean(silhouette_samples(X, labels, metric=metric, kwds))

File ~/anaconda3/lib/python3.11/site-packages/sklearn/utils/_param_validation.py:187, in validate_params..decorator..wrapper(*args, kwargs) 185 global_skip_validation = get_config()["skip_parameter_validation"] 186 if global_skip_validation: --> 187 return func(*args, *kwargs) 189 func_sig = signature(func) 191 # Map args/kwargs to the function signature

File ~/anaconda3/lib/python3.11/site-packages/sklearn/metrics/cluster/_unsupervised.py:257, in silhouette_samples(X, labels, metric, *kwds) 187 @validate_params( 188 { 189 "X": ["array-like", "sparse matrix"], (...) 194 ) 195 def silhouette_samples(X, labels, , metric="euclidean", **kwds): 196 """Compute the Silhouette Coefficient for each sample. 197 198 The Silhouette Coefficient is a measure of how well samples are clustered (...) 255 <https://en.wikipedia.org/wiki/Silhouette(clustering)>` 256 """ --> 257 X, labels = check_X_y(X, labels, accept_sparse=["csr"]) 259 # Check for non-zero diagonal entries in precomputed distance matrix 260 if metric == "precomputed":

File ~/anaconda3/lib/python3.11/site-packages/sklearn/utils/validation.py:1164, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator) 1146 X = check_array( 1147 X, 1148 accept_sparse=accept_sparse, (...) 1159 input_name="X", 1160 ) 1162 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator) -> 1164 check_consistent_length(X, y) 1166 return X, y

File ~/anaconda3/lib/python3.11/site-packages/sklearn/utils/validation.py:407, in check_consistent_length(*arrays) 405 uniques = np.unique(lengths) 406 if len(uniques) > 1: --> 407 raise ValueError( 408 "Found input variables with inconsistent numbers of samples: %r" 409 % [int(l) for l in lengths] 410 )

ValueError: Found input variables with inconsistent numbers of samples: [16573, 15489]`

rfmurphy commented 11 months ago

This took a while to debug. The issue was that some of the nuclear masks (7989 of them out of 15490) contained pixels that were outside the corresponding cell mask (which should not happen in properly segmented cells and nuclei). I implemented our "repair" code (see the Chen & Murphy paper) to trim the nuclei down. Everything runs properly after the repair. I'm cleaning up and documenting the changes and will push soon.

smith6jt-cop commented 11 months ago

It is surprising that the score could not be generated without the repair. Nearly all the methods in the paper were improved with the repair procedure yet were still able to have scores generated without it. However, we segmented with Stardist on DAPI nuclei and Stardist was not evaluated in your paper. For what its worth, Akoya now supplies and recommends a standard Stardist model for CODEX/Phenocycler image processing and it appears to work well for nuclei at least (https://view-su2.highspot.com/viewer/644694a9965a147858092f49?source=email.644694a9965a147858092f4a.83&iid=6298e82d78b2cb1e40aa9fdd).

It is also surprising as the cell masks were simply expansions of the nuclear masks yet somehow the nuclear mask pixels were found outside the cell mask? Perhaps you can share how you determined this to be the problem?

The parameters for nuclei and cell segmentation are shown in the following script:

`import qupath.lib.gui.dialogs.Dialogs def pathModel2 = Dialogs.showMessageDialog("Please select your stardist segmentation file","Please select your StarDist cell segmentation file") def pathModel = Dialogs.promptForFile(null) def pathModel3 = pathModel.toString() //def pathModel = Dialogs.showInputDialog("My title", "Please paste the StarDist file path without quotes", 'text') import qupath.ext.stardist.StarDist2D

// if you are always using the same StarDist model you can set the path below instead of selecting via dialog box //def pathModel = "C:\Users\gcarlson\OneDrive - Akoya Biosciences\Desktop\CODEX Analysis Demo\CODEX_cell_seg.pb"

def stardist = StarDist2D.builder(pathModel3) .threshold(0.5) // Probability (detection) threshold .channels(0) // Select detection channel .normalizePercentiles(1, 99) // Percentile normalization .pixelSize(0.37) // Resolution for detection .cellExpansion(5.0) // Approximate cells based upon nucleus expansion .cellConstrainScale(1.5) // Constrain cell expansion using nucleus size .measureShape() // Add shape measurements .measureIntensity() // Add cell measurements (in all compartments) .includeProbability(true) // Add probability as a measurement (enables later filtering) .build()

// Run detection for the selected objects def imageData = getCurrentImageData() def pathObjects = getSelectedObjects() if (pathObjects.isEmpty()) { Dialogs.showErrorMessage("StarDist", "Please select a parent object!") return } stardist.detectObjects(imageData, pathObjects) println 'Done!'`

Also, here is the script used to extract masks from QuPath:

`import qupath.lib.images.servers.LabeledImageServer

def imageData = getCurrentImageData()

// Define output path (relative to project) def name = GeneralTools.stripExtension(imageData.getServer().getMetadata().getName()) def pathOutput = buildFilePath(PROJECT_BASE_DIR, 'tiles2', name) mkdirs(pathOutput)

// Define output resolution double downsample = 1

// Convert to downsample //double downsample = requestedPixelSize / imageData.getServer().getPixelCalibration().getAveragedPixelSize()

// Create an ImageServer where the pixels are derived from annotations def labelServer = new LabeledImageServer.Builder(imageData) .backgroundLabel(0, ColorTools.WHITE) // Specify background label (usually 0 or 255) .downsample(downsample) // Choose server resolution; this should match the resolution at which tiles are exported .useInstanceLabels(true) .shuffleInstanceLabels(false) .useCellNuclei() // Choose output labels (the order matters!) .grayscale(true) .setBoundaryLabel('Anything*', 0) .lineThickness(downsample as float ) // sorry about the need for 'as float' here... .multichannelOutput(false) // If true, each label is a different channel (required for multiclass probability) .build() def roi = getSelectedROI() def requestROI = RegionRequest.createInstance(labelServer.getPath(), 1, roi) writeImageRegion(labelServer, requestROI, pathOutput)`

rfmurphy commented 11 months ago

Thanks for the suggestion. We will consider evaluating StarDist at some point, but many programs work ok for nuclei but don't get very accurate cell boundaries.

The updated code and README have been pushed. Thanks for helping us improve CellSegmentationEvaluator.

murphygroup / CellSegmentationEvaluator

ValueError: Found input variables with inconsistent numbers of samples: #5