I noticed that in the Skin Lesion Segmentation task, your code uses the 'jaccard_similarity_score' function to compute the 'JS' score. scikit-learn deprecated this function in version 0.21 in favor of 'jaccard_score' and removed it in version 0.23, because "the current Jaccard implementation is ridiculous for binary and multiclass problems, returning accuracy" (https://github.com/scikit-learn/scikit-learn/pull/13151). Your reported JS score may therefore be inaccurate: for binary masks, the old function returns pixel accuracy rather than the Jaccard index. In my verification, the actual score is about 0.76.
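To illustrate the gap between the two metrics, here is a minimal sketch in plain Python. The helper names `pixel_accuracy` and `jaccard_index` and the toy 4x4 masks are my own assumptions for illustration and are not taken from your code. As far as I understand, `sklearn.metrics.jaccard_score` applied to the flattened binary masks should match `jaccard_index` here, while the old 'jaccard_similarity_score' would match `pixel_accuracy`.

```python
def pixel_accuracy(y_true, y_pred):
    """Fraction of pixels whose labels agree (what the old function returned for binary masks)."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def jaccard_index(y_true, y_pred):
    """Intersection over union of the positive (lesion) class."""
    intersection = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    union = sum(1 for t, p in zip(y_true, y_pred) if t == 1 or p == 1)
    # Assumption: treat two empty masks as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical flattened 4x4 masks: 1 = lesion, 0 = background.
y_true = [0, 0, 0, 0,
          0, 1, 1, 0,
          0, 1, 1, 0,
          0, 0, 0, 0]
y_pred = [0, 0, 0, 0,
          0, 0, 0, 0,
          0, 1, 1, 0,
          0, 1, 1, 0]

print("pixel accuracy:", pixel_accuracy(y_true, y_pred))
print("Jaccard index: ", jaccard_index(y_true, y_pred))
```

In this toy case the prediction overlaps only half of the lesion, yet accuracy stays high because most pixels are background that both masks agree on. The same effect would inflate a JS score computed with the old function on lesion images with large background regions.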