xjgaocs / Trans-SVNet


Provide test scripts #10

Open IsabelFunke opened 1 year ago

IsabelFunke commented 1 year ago

Hello, thank you for the great work! So far, this repository provides the training scripts, and the trained model is additionally available on Google Drive. However, the test scripts seem to be missing, i.e., the code for loading the trained model and obtaining its predictions on the Cholec80 data. Would it be possible to provide this code?

Best regards, Isabel

xjgaocs commented 1 year ago

I output the inference results as txt files and used https://github.com/YuemingJin/TMRNet/tree/main/code/eval/result/matlab-eval for evaluation.
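A minimal sketch of dumping per-frame phase predictions to one txt file per video for an external evaluation script such as the linked matlab-eval. The exact column layout that script expects should be checked against its repository; the "Frame\tPhase" layout, the `save_predictions` helper, and the file naming below are assumptions for illustration.

```python
# Hypothetical sketch: write one prediction txt per video so an external
# evaluation script (e.g., the TMRNet matlab-eval) can consume it.
# The assumed layout is a header line followed by "frame<TAB>phase" rows;
# verify against the evaluation script before use.
import os

def save_predictions(video_id, pred_phases, out_dir="predictions"):
    """pred_phases: iterable of integer phase labels, one per frame."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"video{video_id:02d}-phase.txt")
    with open(path, "w") as f:
        f.write("Frame\tPhase\n")
        for frame_idx, phase in enumerate(pred_phases):
            f.write(f"{frame_idx}\t{phase}\n")
    return path
```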

IsabelFunke commented 1 year ago

Thank you. I have now written the code for obtaining the model predictions myself. It can be found at https://github.com/IsabelFunke/Trans-SVNet, along with scripts for data preprocessing and for evaluating the phase predictions. So basically, my repository (based on your code) implements the complete pipeline to reproduce your results.

Based on the rewritten code, I was able to reproduce the results that were reported in the Trans-SVNet paper. The MATLAB script prints:

================================================
                    Phase|  Jacc|  Prec|   Rec|
================================================
              Preparation| 73.50| 99.32| 74.42|
---------------------------------------------
  CalotTriangleDissection| 87.65| 93.35| 93.80|
---------------------------------------------
          ClippingCutting| 81.34| 91.29| 90.05|
---------------------------------------------
    GallbladderDissection| 84.46| 90.81| 93.86|
---------------------------------------------
     GallbladderPackaging| 78.49| 86.03| 92.80|
---------------------------------------------
      CleaningCoagulation| 68.83| 83.23| 85.33|
---------------------------------------------
    GallbladderRetraction| 80.55| 89.80| 91.76|
---------------------------------------------
================================================
Mean jaccard: 79.26 +-  6.40
Mean accuracy: 90.19 +-  7.11
Mean precision: 90.55 +-  5.16
Mean recall: 88.86 +-  7.02
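For reference, the frame-wise per-phase metrics printed above can be approximated with a short sketch. This is an assumption-laden simplification: the linked MATLAB script's exact handling of undefined frames and its order of averaging (per video, then across videos) may differ, and the `phase_metrics` helper name is mine.

```python
import numpy as np

def phase_metrics(gt, pred, num_phases=7):
    """Frame-wise per-phase Jaccard, precision, recall (in %) plus overall
    accuracy for one video. gt/pred: integer phase label per frame.
    Sketch only; the official evaluation script may handle edge cases
    (e.g., phases absent from a video) differently."""
    gt, pred = np.asarray(gt), np.asarray(pred)
    jacc, prec, rec = [], [], []
    for p in range(num_phases):
        tp = np.sum((pred == p) & (gt == p))
        fp = np.sum((pred == p) & (gt != p))
        fn = np.sum((pred != p) & (gt == p))
        union = tp + fp + fn
        jacc.append(100 * tp / union if union else np.nan)
        prec.append(100 * tp / (tp + fp) if tp + fp else np.nan)
        rec.append(100 * tp / (tp + fn) if tp + fn else np.nan)
    acc = 100 * np.mean(gt == pred)
    return jacc, prec, rec, acc
```

The paper-style summary numbers would then come from averaging these per-video values across the test videos.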

However, I want to note two things that do not seem sound to me: (1) The evaluation results reported in the paper are computed on all 40 test videos, even though eight of these test videos are used in the training scripts as validation data, e.g., for selecting the best model. It would be more reasonable to report the performance on the 32 unseen videos, as was done in the TeCNO paper. (2) Relaxed evaluation metrics (with relaxed boundaries) are reported in the paper, but this is not stated anywhere in the paper.

When I take the phase predictions and calculate the evaluation metrics on the 32 test videos without relaxed boundaries, I obtain:

xjgaocs commented 1 year ago

(1) We trained our model using the first 40 videos only. The 40+40 split is consistent with most previous works. We did not use the validation results from the released TeCNO code; we just kept the relevant code when training TeCNO. (2) The relaxed evaluation metrics come from the official challenges, because people might have different opinions on the phase boundaries. Furthermore, the transition frames between different phases are hard to label due to trivial operations, so the results excluding the transition frames might be more meaningful. In any case, all settings are the same for the compared methods.
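The relaxed-boundary idea discussed above can be sketched as follows. This is a hypothetical simplification, not the official challenge implementation (which applies per-phase rules): within a window around each ground-truth phase transition, a prediction matching either the outgoing or the incoming phase is treated as correct before scoring. The function name and the window size are assumptions.

```python
import numpy as np

def relax_boundaries(gt, pred, window=10):
    """Sketch of relaxed-boundary evaluation: near each ground-truth phase
    transition, predictions matching the adjacent phase on either side are
    rewritten to the ground-truth label, so ordinary frame-wise metrics
    computed afterwards ignore small boundary disagreements."""
    gt = np.asarray(gt)
    pred = np.asarray(pred).copy()
    # indices where the ground-truth phase changes
    changes = np.where(np.diff(gt) != 0)[0] + 1
    for c in changes:
        lo, hi = max(0, c - window), min(len(gt), c + window)
        before, after = gt[c - 1], gt[c]
        for i in range(lo, hi):
            if pred[i] in (before, after):
                pred[i] = gt[i]
    return pred
```

With this relaxation, a prediction that switches phases a few frames late (within the window) is scored the same as a perfectly aligned one, which is why relaxed and strict metrics can differ noticeably.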