Sum over inhomogeneous array for calculating metrics

mborhi commented 7 months ago

The error originates in finish_online_evaluation: the line self.online_eval_tp = np.sum(self.online_eval_tp, 0) raises the following error:

setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (50,) + inhomogeneous part.

The list self.online_eval_tp is accumulated in the run_iteration method of the UniSegTrainer class, which calls run_online_evaluation. That function appends lists of varying length (determined by the number of targets of the output parameter) to self.online_eval_tp (and the other lists). As the error above shows, such a ragged list cannot be summed along axis 0.

I believe this is due to the following:

Each task within MOTS (e.g. kidney, liver, lung) has a different number of targets (3 or 2)
The mapping from tasks to the number of outputs is fixed
The code truncates the output of the model to match the task's (fixed) specified number of channels
This causes a misalignment when calculating the scores (a different number of columns per task)
Because the number of columns is fixed per task, this will always result in an error when using tasks with differing numbers of targets
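
A minimal sketch of the failure and of one possible way around it, using made-up counts rather than the trainer's real data. The helper sum_by_num_targets and the sample values are hypothetical and not part of UniSeg; grouping by vector length is only an assumption about how the counts could be kept rectangular (grouping by task would be stricter).

```python
from collections import defaultdict

import numpy as np

# Hypothetical per-batch true-positive counts: tasks with 2 and 3 targets mixed.
online_eval_tp = [[10, 4], [7, 3, 1], [12, 6]]

try:
    # Ragged input: NumPy 1.24+ refuses to build the array and raises ValueError.
    # Older versions build an object array (with a deprecation warning) instead.
    print(np.sum(online_eval_tp, 0))
except ValueError as err:
    print("ValueError:", err)


def sum_by_num_targets(rows):
    """Sum count vectors separately for each vector length (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[len(row)].append(row)
    # Every group is rectangular, so the column-wise sum is well defined.
    return {n: np.sum(np.asarray(g), axis=0) for n, g in groups.items()}


tp_by_num_targets = sum_by_num_targets(online_eval_tp)
print(tp_by_num_targets)
```

Summing each group separately keeps the per-class counts aligned, at the cost of pooling tasks that happen to share the same number of targets.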

yeerwen commented 7 months ago

Indeed, I ran into the same error after updating the NumPy library. To avoid it, I recommend reverting NumPy to version 1.23.4.

mborhi commented 6 months ago

Thank you for your quick response. I would like to clarify how this operation should then be interpreted, since reverting to version 1.23.4 results in np.sum(*, 0) flattening the input list.

Thus, the Dice score is computed as the average of the Dice scores of each batch i and class c, dice_{batch_i,class_c} (as computed on line 724).

Based on my understanding, given a dataset of 3 batches and 1 class, the global Dice score is computed as Dce = 1/3 (Dce_batch1_cls1 + Dce_batch2_cls1 + Dce_batch3_cls1).

The computation extends in the same way to C classes and N batches.

In contrast, the global Dice score should be computed using the global TP, FN, FP as Dce = 2*TP_global / (2*TP_global + FN_global + FP_global).
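
To illustrate how far the two definitions can diverge, here is a small sketch with invented TP/FP/FN counts for one class over three batches. The numbers and the dice helper are hypothetical and are not taken from the repository.

```python
import numpy as np


def dice(tp, fp, fn):
    """Dice score from true-positive, false-positive and false-negative counts."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts for a single class, one entry per batch.
tp = np.array([90.0, 10.0, 1.0])
fp = np.array([5.0, 5.0, 4.0])
fn = np.array([5.0, 5.0, 5.0])

# Average of the per-batch Dice scores (what the flattened list yields).
mean_of_batch_dice = dice(tp, fp, fn).mean()

# Dice of the pooled counts (the global definition).
global_dice = dice(tp.sum(), fp.sum(), fn.sum())

print(f"mean of per-batch Dice: {mean_of_batch_dice:.3f}")  # about 0.599
print(f"global Dice:            {global_dice:.3f}")         # about 0.874
```

With these counts, batches that contain little foreground pull the per-batch average well below the pooled value, so the two numbers are not interchangeable.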

Hence, the interpretation of the elements of self.all_val_eval_metrics, each calculated as the mean of global_dc_per_class, is unclear.

Is this the intended functionality, and if so, how should the resulting metrics be interpreted?

yeerwen commented 6 months ago

Hi,

I believe it's not necessary to focus too much on the validation values. They are primarily for validation purposes and do not determine which epoch's checkpoint will be used as the final one. The checkpoint from the last epoch is used as the final checkpoint.

Best regards, Yiwen Ye

mborhi commented 6 months ago

I understand, thank you for your responses.

Best, Marcell

yeerwen / UniSeg

Sum over inhomogeneous array for calculating metrics #28