jamesdolezal / slideflow

Deep learning library for digital pathology, with both Tensorflow and PyTorch support.
https://slideflow.dev
GNU General Public License v3.0

AUC =0.5 #293

Closed ltzg closed 1 year ago

ltzg commented 1 year ago

Dear Author,

I used TCGA colorectal cancer WSI files to build a model that predicts 3-year recurrence (0 or 1) in colorectal cancer patients after surgery. I delineated ROIs on the WSIs and homogenized the patches. I tried different CNN architectures (DenseNet, ResNet), but the AUC was always 0.5. It seems that all patches are classified as either 0 or 1. Do you have any suggestions for improving this? Here is my code.

import slideflow as sf
from slideflow.model import build_feature_extractor

# Create the project and extract tiles.
P = sf.create_project(
    root='.',
    annotations="./RFS1.csv",
    slides='./slides1/',
)
P.extract_tiles(tile_px=256, tile_um=256)

# The project must be reloaded after every exit.
P = sf.load_project('.')
dataset = P.dataset(tile_px=256, tile_um=256)

# dataset.labels() returns the labels and the unique label values.
labels, _ = dataset.labels('category')
train_dts, val_dts = dataset.split(
    model_type='categorical',
    labels=labels,
    val_strategy='k-fold',
    val_k_fold=2,
    k_fold_iter=1
)
hp = sf.ModelParams(
    tile_px=256,
    tile_um=256,
    model='densenet',
    batch_size=32,
    epochs=[2]
)

# Train with 2-fold cross-validation.
P.train(
    'category',
    params=hp,
    val_k_fold=2
)

jamesdolezal commented 1 year ago

Hi Itzg,

There are several things that can cause a model to fail to converge. This includes technical problems (e.g. issues with data or labels), suboptimal parameters (e.g. tile size, model architecture, hyperparameters), insufficient data (too few slides), or biological issues (e.g., the outcome can't be accurately predicted because there isn't sufficient true biological signal).

When troubleshooting models, the first place I like to start is by inspecting the dataset. Review the TFRecord tile extraction report and make sure that the data looks reasonable (tiles taken from inside areas of tumor). Then, use Dataset.summary() to quickly inspect the dataset to make sure that the TFRecords are valid and you have enough data for each outcome (0 and 1).

Can you paste the results of the following code, which will help me better understand the structure of your data?

import slideflow as sf

P = sf.load_project('.')
dataset = P.dataset(tile_px=256, tile_um=256)
dataset.summary()
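
Alongside the summary, it can help to count how many patients fall into each outcome directly from the annotations file. This is a minimal sketch using only the standard library, assuming the columns are named `patient` and `category` as in the code above; the helper name `count_outcomes` and the inline sample rows are made up for illustration:

```python
import csv
import io
from collections import Counter

def count_outcomes(csv_file, outcome="category", patient="patient"):
    """Count unique patients per outcome label in an annotations CSV."""
    seen = set()
    counts = Counter()
    for row in csv.DictReader(csv_file):
        key = row[patient]
        if key in seen:
            continue  # count each patient once, even with several slides
        seen.add(key)
        counts[row[outcome]] += 1
    return counts

# Illustrative rows only; in practice pass open("./RFS1.csv", newline="").
sample = io.StringIO(
    "patient,category,slide\n"
    "P1,0,S1\n"
    "P2,1,S2\n"
    "P3,0,S3\n"
    "P3,0,S4\n"
)
counts = count_outcomes(sample)
print(dict(counts))
```

A heavily skewed count, or a category with only a handful of patients, would point toward the "insufficient data" explanation rather than a technical problem.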
ltzg commented 1 year ago
Thank you very much. Here is the output of dataset.summary():

Overview:
╒═════════════════════╤═════════════════╕
│ Configuration file: │ ./datasets.json │
│ Tile size (px):     │ 256             │
│ Tile size (um):     │ 256             │
│ Slides:             │ 38              │
│ Patients:           │ 38              │
│ Slides with ROIs:   │ 38              │
│ Patients with ROIs: │ 38              │
╘═════════════════════╧═════════════════╛

Filters:
╒═══════════════╤════╕
│ Filters:      │ {} │
├───────────────┼────┤
│ Filter Blank: │ [] │
├───────────────┼────┤
│ Min Tiles:    │ 0  │
╘═══════════════╧════╛

Sources:

MyProject
╒═══════════╤════════════════╕
│ slides    │ ./slides2/     │
│ roi       │ ./slides2/rois │
│ tiles     │ ./tiles        │
│ tfrecords │ ./tfrecords    │
│ label     │ 256px_256um    │
╘═══════════╧════════════════╛

Number of tiles in TFRecords: 35921
Annotation columns:
Index(['patient', 'dataset', 'category', 'slide', 'AWG_MLH1_silencing',
       'AWG_cancer_type_Oct62011', 'CDE_ID_3226963', 'CIMP',
       'MSI_updated_Oct62011', 'X_INTEGRATION',
       ...
       'X_PATIENT.y', 'OS', 'OS.time', 'DSS', 'DSS.time', 'DFI', 'DFI.time',
       'PFI', 'PFI.time', 'slide.1'],
      dtype='object', length=137)

Note that in the original task, category 1 had 38 patients and category 0 had 98, and the result was also AUC=0.5. Considering that the ROIs might be the problem, I annotated ROIs over a small area. The following are the results of one training and validation session:

[09:39:45 AM] INFO     Training model category-HP0 (k-fold #1)...
INFO     Hyperparameters: {
    "augment": "xyrj",
    "batch_size": 32,
    "drop_images": false,
    "dropout": 0,
    "early_stop": false,
    "early_stop_method": "loss",
    "early_stop_patience": 0,
    "epochs": [
        2
    ],
    "hidden_layer_width": 500,
    "hidden_layers": 0,
    "include_top": true,
    "l1": 0.0,
    "l1_dense": 0.0,
    "l2": 0.0,
    "l2_dense": 0.0,
    "learning_rate": 0.0001,
    "learning_rate_decay": 0,
    "learning_rate_decay_steps": 100000,
    "loss": "CrossEntropy",
    "manual_early_stop_batch": null,
    "manual_early_stop_epoch": null,
    "model": "xception",
    "normalizer": null,
    "normalizer_source": null,
    "optimizer": "Adam",
    "pooling": "max",
    "tile_px": 256,
    "tile_um": 256,
    "toplayer_epochs": 0,
    "trainable_layers": 0,
    "training_balance": "category",
    "uq": false,
    "validation_balance": "none"
}
INFO     Val settings: {
    "strategy": "k-fold",
    "k_fold": 2,
    "k": null,
    "k_fold_header": null,
    "fraction": null,
    "source": null,
    "annotations": null,
    "filters": null,
    "dataset": null
}
INFO     No compatible train/val split found.
INFO     Logging new split at ./splits.json
INFO     Category        0       1
INFO     K-fold-0        10      10
INFO     K-fold-1        9       9
INFO     Using 18 training TFRecords, 20 validation
INFO     Steps per epoch = 432
[09:39:48 AM] INFO     Using pretraining: imagenet

ModelWrapper           Parameters  Buffers  Output shape        Datatype
---                    ---         ---      ---                 ---
model.conv1            864         -        [32, 32, 127, 127]  float32
model.bn1              64          65       [32, 32, 127, 127]  float32
model.conv2            18432       -        [32, 64, 125, 125]  float32
model.bn2              128         129      [32, 64, 125, 125]  float32
model.block1.rep       26816       514      [32, 128, 63, 63]   float32
model.block1.skip      8192        -        [32, 128, 63, 63]   float32
model.block1.skipbn    256         257      [32, 128, 63, 63]   float32
model.block2.rep       102784      1026     [32, 256, 32, 32]   float32
model.block2.skip      32768       -        [32, 256, 32, 32]   float32
model.block2.skipbn    512         513      [32, 256, 32, 32]   float32
model.block3.rep       728120      2914     [32, 728, 16, 16]   float32
model.block3.skip      186368      -        [32, 728, 16, 16]   float32
model.block3.skipbn    1456        1457     [32, 728, 16, 16]   float32
model.block4.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block5.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block6.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block7.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block8.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block9.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block10.rep      1613976     4371     [32, 728, 16, 16]   float32
model.block11.rep      1613976     4371     [32, 728, 16, 16]   float32
model.block12.rep      1292064     3506     [32, 1024, 8, 8]    float32
model.block12.skip     745472      -        [32, 1024, 8, 8]    float32
model.block12.skipbn   2048        2049     [32, 1024, 8, 8]    float32
model.conv3.conv1      9216        -        [32, 1024, 8, 8]    float32
model.conv3.pointwise  1572864     -        [32, 1536, 8, 8]    float32
model.bn3              3072        3073     [32, 1536, 8, 8]    float32
model.conv4.conv1      13824       -        [32, 1536, 8, 8]    float32
model.conv4.pointwise  3145728     -        [32, 2048, 8, 8]    float32
model.bn4              4096        4097     [32, 2048, 8, 8]    float32
model.last_linear      -           -        [32, 2048]          float32
fc0                    4098        -        [32, 2]             float32
---                    ---         ---      ---                 ---
Total                  20811050    54568    -                   -

[09:39:55 AM] INFO     Epoch 1/2
train loss: 0.4491 acc: 0.7879 ━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:44 312 img/s
[09:40:39 AM] INFO     train Epoch 1 | loss: 0.4491 acc: 0.7879
INFO     Epoch 2/2
train loss: 0.2796 acc: 0.8937 ━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:42 325 img/s
[09:41:22 AM] INFO     train Epoch 2 | loss: 0.2796 acc: 0.8937
INFO     Model saved to ./models/00000-category-HP0-kfold1/category-HP0-kfold1_epoch2.zip
Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:17 1253 img/s
[09:41:41 AM] INFO     Validation metrics for outcome category:
INFO     tile-level AUC (cat # 0): 0.500 AP: 0.640 (opt. threshold: 1.675)
INFO     tile-level AUC (cat # 1): 0.500 AP: 0.360 (opt. threshold: 1.325)
INFO     Category 0 acc: 100.0% (14142/14142)
INFO     Category 1 acc: 0.0% (0/7947)
INFO     Validation metrics for outcome category:
[09:41:42 AM] INFO     slide-level AUC (cat # 0): 0.500 AP: 0.500 (opt. threshold: 1.675)
INFO     slide-level AUC (cat # 1): 0.500 AP: 0.500 (opt. threshold: 1.325)
INFO     Category 0 acc: 100.0% (10/10)
INFO     Category 1 acc: 0.0% (0/10)
INFO     Validation metrics for outcome category:
INFO     patient-level AUC (cat # 0): 0.500 AP: 0.500 (opt. threshold: 1.675)
INFO     patient-level AUC (cat # 1): 0.500 AP: 0.500 (opt. threshold: 1.325)
INFO     Category 0 acc: 100.0% (10/10)
INFO     Category 1 acc: 0.0% (0/10)
INFO     val Epoch 2 | loss: 0.6558 acc: 0.6402

[09:41:43 AM] INFO     Training model category-HP0 (k-fold #2)...
INFO     Hyperparameters: {
    "augment": "xyrj",
    "batch_size": 32,
    "drop_images": false,
    "dropout": 0,
    "early_stop": false,
    "early_stop_method": "loss",
    "early_stop_patience": 0,
    "epochs": [
        2
    ],
    "hidden_layer_width": 500,
    "hidden_layers": 0,
    "include_top": true,
    "l1": 0.0,
    "l1_dense": 0.0,
    "l2": 0.0,
    "l2_dense": 0.0,
    "learning_rate": 0.0001,
    "learning_rate_decay": 0,
    "learning_rate_decay_steps": 100000,
    "loss": "CrossEntropy",
    "manual_early_stop_batch": null,
    "manual_early_stop_epoch": null,
    "model": "xception",
    "normalizer": null,
    "normalizer_source": null,
    "optimizer": "Adam",
    "pooling": "max",
    "tile_px": 256,
    "tile_um": 256,
    "toplayer_epochs": 0,
    "trainable_layers": 0,
    "training_balance": "category",
    "uq": false,
    "validation_balance": "none"
}
INFO     Val settings: {
    "strategy": "k-fold",
    "k_fold": 2,
    "k": null,
    "k_fold_header": null,
    "fraction": null,
    "source": null,
    "annotations": null,
    "filters": null,
    "dataset": null
}
INFO     Using k-fold validation split detected at ./splits.json (ID: 0)
INFO     Using 20 training TFRecords, 18 validation
INFO     Steps per epoch = 690
INFO     Using pretraining: imagenet

ModelWrapper           Parameters  Buffers  Output shape        Datatype
---                    ---         ---      ---                 ---
model.conv1            864         -        [32, 32, 127, 127]  float32
model.bn1              64          65       [32, 32, 127, 127]  float32
model.conv2            18432       -        [32, 64, 125, 125]  float32
model.bn2              128         129      [32, 64, 125, 125]  float32
model.block1.rep       26816       514      [32, 128, 63, 63]   float32
model.block1.skip      8192        -        [32, 128, 63, 63]   float32
model.block1.skipbn    256         257      [32, 128, 63, 63]   float32
model.block2.rep       102784      1026     [32, 256, 32, 32]   float32
model.block2.skip      32768       -        [32, 256, 32, 32]   float32
model.block2.skipbn    512         513      [32, 256, 32, 32]   float32
model.block3.rep       728120      2914     [32, 728, 16, 16]   float32
model.block3.skip      186368      -        [32, 728, 16, 16]   float32
model.block3.skipbn    1456        1457     [32, 728, 16, 16]   float32
model.block4.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block5.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block6.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block7.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block8.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block9.rep       1613976     4371     [32, 728, 16, 16]   float32
model.block10.rep      1613976     4371     [32, 728, 16, 16]   float32
model.block11.rep      1613976     4371     [32, 728, 16, 16]   float32
model.block12.rep      1292064     3506     [32, 1024, 8, 8]    float32
model.block12.skip     745472      -        [32, 1024, 8, 8]    float32
model.block12.skipbn   2048        2049     [32, 1024, 8, 8]    float32
model.conv3.conv1      9216        -        [32, 1024, 8, 8]    float32
model.conv3.pointwise  1572864     -        [32, 1536, 8, 8]    float32
model.bn3              3072        3073     [32, 1536, 8, 8]    float32
model.conv4.conv1      13824       -        [32, 1536, 8, 8]    float32
model.conv4.pointwise  3145728     -        [32, 2048, 8, 8]    float32
model.bn4              4096        4097     [32, 2048, 8, 8]    float32
model.last_linear      -           -        [32, 2048]          float32
fc0                    4098        -        [32, 2]             float32
---                    ---         ---      ---                 ---
Total                  20811050    54568    -                   -

[09:41:48 AM] INFO     Epoch 1/2
train loss: 0.4671 acc: 0.7768 ━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:01:09 317 img/s
[09:42:57 AM] INFO     train Epoch 1 | loss: 0.4671 acc: 0.7768
INFO     Epoch 2/2
train loss: 0.3189 acc: 0.8693 ━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:01:08 321 img/s
[09:44:06 AM] INFO     train Epoch 2 | loss: 0.3189 acc: 0.8693
INFO     Model saved to ./models/00001-category-HP0-kfold2/category-HP0-kfold2_epoch2.zip
Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:10 1271 img/s
[09:44:18 AM] INFO     Validation metrics for outcome category:
[09:44:19 AM] INFO     tile-level AUC (cat # 0): 0.500 AP: 0.394 (opt. threshold: 1.475)
INFO     tile-level AUC (cat # 1): 0.500 AP: 0.606 (opt. threshold: 1.525)
INFO     Category 0 acc: 0.0% (0/5455)
INFO     Category 1 acc: 100.0% (8377/8377)
INFO     Validation metrics for outcome category:
INFO     slide-level AUC (cat # 0): 0.500 AP: 0.500 (opt. threshold: 1.475)
INFO     slide-level AUC (cat # 1): 0.500 AP: 0.500 (opt. threshold: 1.525)
INFO     Category 0 acc: 0.0% (0/9)
INFO     Category 1 acc: 100.0% (9/9)
INFO     Validation metrics for outcome category:
[09:44:20 AM] INFO     patient-level AUC (cat # 0): 0.500 AP: 0.500 (opt. threshold: 1.475)
INFO     patient-level AUC (cat # 1): 0.500 AP: 0.500 (opt. threshold: 1.525)
INFO     Category 0 acc: 0.0% (0/9)
INFO     Category 1 acc: 100.0% (9/9)
INFO     val Epoch 2 | loss: 0.6838 acc: 0.6056
INFO     Training results saved: ./results_log.csv
INFO     Training complete; validation accuracies:
INFO     category-HP0-kfold1 training metrics:
INFO     loss: 0.27962709085463927
INFO     accuracy: 0.893663227558136
INFO     category-HP0-kfold1 validation metrics:
INFO     loss: 0.6558491226190072
INFO     accuracy: 0.6402281678663588
INFO     category-HP0-kfold2 training metrics:
INFO     loss: 0.3189038523891266
INFO     accuracy: 0.8692935109138489
INFO     category-HP0-kfold2 validation metrics:
INFO     loss: 0.6838010012448489
INFO     accuracy: 0.6056246385193753

{'category-HP0-kfold1': {'epochs': defaultdict(<class 'dict'>, {
    'epoch1': {'train_metrics': {'loss': 0.44913872703909874, 'accuracy': 0.7879050970077515}},
    'epoch2': {'train_metrics': {'loss': 0.27962709085463927, 'accuracy': 0.893663227558136},
               'val_metrics': {'loss': 0.6558491226190072, 'accuracy': 0.6402281678663588},
               'tile_auc': {'category': [0.5, 0.5]},
               'slide_auc': {'category': [0.5, 0.5]},
               'patient_auc': {'category': [0.5, 0.5]},
               'tile_ap': {'category': [0.6402281678663588, 0.35977183213364117]},
               'slide_ap': {'category': [0.5, 0.5]},
               'patient_ap': {'category': [0.5, 0.5]}}})},
 'category-HP0-kfold2': {'epochs': defaultdict(<class 'dict'>, {
    'epoch1': {'train_metrics': {'loss': 0.4671213014834169, 'accuracy': 0.7768115997314453}},
    'epoch2': {'train_metrics': {'loss': 0.3189038523891266, 'accuracy': 0.8692935109138489},
               'val_metrics': {'loss': 0.6838010012448489, 'accuracy': 0.6056246385193753},
               'tile_auc': {'category': [0.5, 0.5]},
               'slide_auc': {'category': [0.5, 0.5]},
               'patient_auc': {'category': [0.5, 0.5]},
               'tile_ap': {'category': [0.39437536148062463, 0.6056246385193753]},
               'slide_ap': {'category': [0.5, 0.5]},
               'patient_ap': {'category': [0.5, 0.5]}}})}}
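
The validation metrics in this log are what one would expect when a model assigns every tile to the same class: with no variation in the predicted scores, positives and negatives cannot be ranked against each other, so the AUC is exactly 0.5 and the accuracy equals the share of the predicted class. A small sketch of that arithmetic in plain Python follows; `rank_auc` is an illustrative helper written for this example (not a Slideflow function), the five labels are toy values, and the tile counts are taken from the k-fold #1 validation log:

```python
def rank_auc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# A model that outputs the same score for every tile cannot rank anything.
labels = [0, 0, 0, 1, 1]
constant_scores = [0.9] * len(labels)
auc_constant = rank_auc(labels, constant_scores)

# Tile counts from the k-fold #1 log: every tile was predicted as category 0.
n_cat0, n_cat1 = 14142, 7947
acc_all_zero = n_cat0 / (n_cat0 + n_cat1)

print(auc_constant)            # 0.5
print(round(acc_all_zero, 4))  # 0.6402, the logged validation accuracy
```

So an AUC of exactly 0.500 at tile, slide, and patient level points to collapsed predictions on the validation set rather than a model that is merely weak.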


jamesdolezal commented 1 year ago

Thanks for providing this.

It looks like you're using the PyTorch backend. I would recommend giving the Tensorflow backend a try, as we've recently identified an issue where training models with PyTorch can result in poor performance when a small dataset is combined with a small batch size (see: 662350a). As you're using a small dataset (around 10 slides per outcome) and a small batch size (batch_size=32), you might be affected. We've written a patch that should be released by Friday.

In the meantime, try running the models using Tensorflow. You can do this by first installing Tensorflow and its dependencies:

pip install slideflow[tf]

Then, in the terminal where you're running your code, set the environment variable SF_BACKEND=tensorflow:

export SF_BACKEND=tensorflow

Finally, you can add the following lines to the top of your script to confirm that the model is in fact being trained with the Tensorflow backend:

import slideflow as sf
sf.about()
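
The same variable can also be set from inside Python, as long as this happens before Slideflow is first imported. This is a sketch under the assumption that Slideflow reads SF_BACKEND when the package is imported; the try/except is only there so the snippet still runs where Slideflow is not installed:

```python
import os

# Must run before the first `import slideflow` in the process
# (assumption: the backend is chosen when the package is imported).
os.environ["SF_BACKEND"] = "tensorflow"

try:
    import slideflow as sf
    sf.about()  # the printed panel should list the active backend
except ImportError:
    print("Slideflow is not installed in this environment.")

backend = os.environ["SF_BACKEND"]
print(backend)
```

Setting the variable in the shell remains the simpler option when the script is always launched from the same terminal.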

Let me know if that works, otherwise we can troubleshoot further.

ltzg commented 1 year ago
I created a new conda environment for Tensorflow, and Tensorflow works:

>>> sf.about()
Slideflow
Version: 2.0.3-post1
Backend: tensorflow
Slide Backend: cucim
https://slideflow.dev

I got a new error when training with "resnet50":

ValueError: When setting `include_top=True` and loading `imagenet` weights, `input_shape` should be (224, 224, 3).  Received: input_shape=(256, 256, 3)

Since my patches are 256x256, do I need to re-extract the patches? This error did not occur with the torch backend.


jamesdolezal commented 1 year ago

You need to set the hyperparameter include_top=False if the tile size (in your case, 256) does not match the imagenet pretrained size (224).

hp = sf.ModelParams(..., include_top=False)
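
To avoid running into this error again when switching tile sizes, the flag can be derived from the tile size. The helper below is hypothetical (`model_kwargs` is not part of Slideflow) and assumes, as stated above, that 224 px is the ImageNet pretrained size:

```python
IMAGENET_PX = 224  # input size expected by the ImageNet-pretrained top layer

def model_kwargs(tile_px, tile_um, model, **extra):
    """Build keyword arguments for sf.ModelParams, dropping the pretrained
    top layer whenever the tile size differs from the ImageNet size."""
    kwargs = dict(tile_px=tile_px, tile_um=tile_um, model=model, **extra)
    kwargs["include_top"] = (tile_px == IMAGENET_PX)
    return kwargs

kwargs = model_kwargs(256, 256, "resnet50", batch_size=32)
print(kwargs["include_top"])  # False, because 256 != 224

# With Slideflow installed, the dictionary can be unpacked directly:
# hp = sf.ModelParams(**kwargs)
```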
ltzg commented 1 year ago
I got an AUC=0.62 with these hp:

hp = sf.ModelParams(
    tile_px=256,
    tile_um=256,
    model='resnet50',
    batch_size=64,
    epochs=[30],
    include_top=False
)

How do I use multi_gpu with TensorFlow? I cannot find "multi_gpu" in tf.

[07:39:28 AM] INFO     Beginning training
Epoch 1/30
Epoch 1: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 107s 235ms/step - loss: 0.7167 - accuracy: 0.8530
Epoch 2/30
Epoch 2: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 59s 172ms/step - loss: 0.2586 - accuracy: 0.9291
Epoch 3/30
Epoch 3: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 173ms/step - loss: 0.1620 - accuracy: 0.9518
Epoch 4/30
Epoch 4: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 173ms/step - loss: 0.1184 - accuracy: 0.9611
Epoch 5/30
Epoch 5: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0965 - accuracy: 0.9683
Epoch 6/30
Epoch 6: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 173ms/step - loss: 0.0801 - accuracy: 0.9738
Epoch 7/30
Epoch 7: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 173ms/step - loss: 0.0619 - accuracy: 0.9789
Epoch 8/30
Epoch 8: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 59s 172ms/step - loss: 0.0653 - accuracy: 0.9786
Epoch 9/30
Epoch 9: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0629 - accuracy: 0.9823
Epoch 10/30
Epoch 10: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0501 - accuracy: 0.9818
Epoch 11/30
Epoch 11: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 61s 175ms/step - loss: 0.0551 - accuracy: 0.9824
Epoch 12/30
Epoch 12: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 175ms/step - loss: 0.0475 - accuracy: 0.9834
Epoch 13/30
Epoch 13: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 61s 176ms/step - loss: 0.0366 - accuracy: 0.9872
Epoch 14/30
Epoch 14: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0461 - accuracy: 0.9846
Epoch 15/30
Epoch 15: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0500 - accuracy: 0.9833
Epoch 16/30
Epoch 16: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 173ms/step - loss: 0.0343 - accuracy: 0.9872
Epoch 17/30
Epoch 17: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0358 - accuracy: 0.9883
Epoch 18/30
Epoch 18: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0312 - accuracy: 0.9892
Epoch 19/30
Epoch 19: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0335 - accuracy: 0.9881
Epoch 20/30
Epoch 20: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0411 - accuracy: 0.9856
Epoch 21/30
Epoch 21: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 61s 176ms/step - loss: 0.0267 - accuracy: 0.9907
Epoch 22/30
Epoch 22: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 175ms/step - loss: 0.0280 - accuracy: 0.9907
Epoch 23/30
Epoch 23: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0256 - accuracy: 0.9912
Epoch 24/30
Epoch 24: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0384 - accuracy: 0.9875
Epoch 25/30
Epoch 25: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0291 - accuracy: 0.9896
Epoch 26/30
Epoch 26: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 175ms/step - loss: 0.0204 - accuracy: 0.9930
Epoch 27/30
Epoch 27: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 61s 176ms/step - loss: 0.0218 - accuracy: 0.9922
Epoch 28/30
Epoch 28: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 175ms/step - loss: 0.0254 - accuracy: 0.9914
Epoch 29/30
Epoch 29: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 175ms/step - loss: 0.0302 - accuracy: 0.9890
Epoch 30/30
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 53). These functions will not be directly callable after loading.
[08:10:54 AM] INFO     Trained model saved to ./models/00007-category-HP0-kfold2/category-HP0-kfold2_epoch30
⠹ Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 100% 0:00:01 0:00:12 1135 img/s
[08:11:06 AM] INFO     Validation metrics for outcome category:
[08:11:07 AM] INFO     tile-level AUC (cat # 0): 0.461 AP: 0.353 (opt. threshold: 0.006)
              INFO     tile-level AUC (cat # 1): 0.461 AP: 0.630 (opt. threshold: 0.994)
              INFO     Category 0 acc: 73.4% (4006/5455)
              INFO     Category 1 acc: 27.9% (2331/8369)
              INFO     Validation metrics for outcome category:
              INFO     slide-level AUC (cat # 0): 0.395 AP: 0.443 (opt. threshold: 0.373)
[08:11:08 AM] INFO     slide-level AUC (cat # 1): 0.395 AP: 0.610 (opt. threshold: 0.817)
              INFO     Category 0 acc: 77.8% (7/9)
              INFO     Category 1 acc: 33.3% (3/9)
              INFO     Validation metrics for outcome category:
              INFO     patient-level AUC (cat # 0): 0.395 AP: 0.443 (opt. threshold: 0.373)
              INFO     patient-level AUC (cat # 1): 0.395 AP: 0.610 (opt. threshold: 0.817)
              INFO     Category 0 acc: 77.8% (7/9)
              INFO     Category 1 acc: 33.3% (3/9)
              INFO     Validation metrics: {
                           "accuracy": 0.4584056712962963,
                           "loss": 4.948972702026369
                       }
Epoch 30: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 112s 326ms/step - loss: 0.0212 - accuracy: 0.9922
[08:11:10 AM] INFO     Training results saved: ./results_log.csv
              INFO     Training complete; validation accuracies:
              INFO     category-HP0-kfold1 training metrics:
              INFO     loss: 0.01900590769946575
              INFO     accuracy: 0.9949363470077515
              INFO     category-HP0-kfold1 validation metrics:
              INFO     accuracy: 0.45919384057971013
              INFO     loss: 4.310663974112357
              INFO     category-HP0-kfold2 training metrics:
              INFO     loss: 0.02115924283862114
              INFO     accuracy: 0.9922101497650146
              INFO     category-HP0-kfold2 validation metrics:
              INFO     accuracy: 0.4584056712962963
              INFO     loss: 4.948972702026369

{'category-HP0-kfold1': {'epochs': {'epoch30': {
    'train_metrics': {'loss': 0.01900590769946575, 'accuracy': 0.9949363470077515},
    'val_metrics': {'accuracy': 0.45919384057971013, 'loss': 4.310663974112357},
    'tile_auc': {'category': [0.499942710209614, 0.4960754289960915]},
    'slide_auc': {'category': [0.62, 0.62]},
    'patient_auc': {'category': [0.62, 0.62]},
    'tile_ap': {'category': [0.6317464162325399, 0.3965908783629368]},
    'slide_ap': {'category': [0.6318867243867243, 0.7297976971428983]},
    'patient_ap': {'category': [0.6318867243867243, 0.7297976971428983]}}}},
 'category-HP0-kfold2': {'epochs': {'epoch30': {
    'train_metrics': {'loss': 0.02115924283862114, 'accuracy': 0.9922101497650146},
    'val_metrics': {'accuracy': 0.4584056712962963, 'loss': 4.948972702026369},
    'tile_auc': {'category': [0.46063974694266374, 0.4610330626349106]},
    'slide_auc': {'category': [0.3950617283950617, 0.39506172839506176]},
    'patient_auc': {'category': [0.3950617283950617, 0.39506172839506176]},
    'tile_ap': {'category': [0.3526021055659684, 0.6303825560205547]},
    'slide_ap': {'category': [0.4432966724633391, 0.6096288515406163]},
    'patient_ap': {'category': [0.4432966724633391, 0.6096288515406163]}}}}}
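
One way to read the results dictionary printed at the end of the log is to compare training and validation accuracy for each fold. The sketch below is only an illustration: it copies the `train_metrics` and `val_metrics` values from the log above into a trimmed dictionary with the same nesting, and the helper name `accuracy_gaps` is made up for this example, not part of Slideflow.

```python
# Trimmed copy of the results dictionary from the log above
# (only train_metrics and val_metrics are kept; values are unchanged).
results = {
    "category-HP0-kfold1": {"epochs": {"epoch30": {
        "train_metrics": {"loss": 0.01900590769946575, "accuracy": 0.9949363470077515},
        "val_metrics": {"accuracy": 0.45919384057971013, "loss": 4.310663974112357},
    }}},
    "category-HP0-kfold2": {"epochs": {"epoch30": {
        "train_metrics": {"loss": 0.02115924283862114, "accuracy": 0.9922101497650146},
        "val_metrics": {"accuracy": 0.4584056712962963, "loss": 4.948972702026369},
    }}},
}


def accuracy_gaps(results, epoch="epoch30"):
    """Return {fold name: training accuracy minus validation accuracy}.

    Hypothetical helper for illustration; not a Slideflow function.
    """
    gaps = {}
    for fold, fold_results in results.items():
        metrics = fold_results["epochs"][epoch]
        gaps[fold] = metrics["train_metrics"]["accuracy"] - metrics["val_metrics"]["accuracy"]
    return gaps


gaps = accuracy_gaps(results)
for fold, gap in gaps.items():
    print(f"{fold}: train/val accuracy gap = {gap:.3f}")
```

A gap of more than 0.5 in both folds, together with a validation loss far above the training loss, suggests the model is memorizing the training tiles rather than learning features that carry over to held-out slides.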

---- Replied Message ----
From: James
Date: 6/9/2023 02:14
Subject: Re: [jamesdolezal/slideflow] AUC =0.5 (Issue #293)

You need to set the hyperparameter include_top=False if the tile size (in your case, 256) does not match the ImageNet pretrained size (224).

hp = sf.ModelParams(..., include_top=False)


jamesdolezal commented 1 year ago

Glad to see the performance has improved! AUC of 0.62 is probably in line with what you might expect for a discretized survival model with only ~20 slides.
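
To give a rough sense of how noisy a slide-level AUC can be with this few slides, here is a minimal sketch that simulates chance-level AUCs for a validation set of 9 slides per class (the class counts shown in the log above). Everything in it is an assumption for illustration: the scores are random numbers rather than model outputs, and `auc` and `chance_auc_interval` are made-up helper names, not Slideflow functions.

```python
import random


def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def chance_auc_interval(n_pos, n_neg, n_trials=2000, seed=0):
    """Central 95% range of AUC when scores are unrelated to the labels."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_trials):
        pos = [rng.random() for _ in range(n_pos)]
        neg = [rng.random() for _ in range(n_neg)]
        aucs.append(auc(pos, neg))
    aucs.sort()
    low = aucs[int(0.025 * n_trials)]
    high = aucs[int(0.975 * n_trials) - 1]
    return low, high


low, high = chance_auc_interval(n_pos=9, n_neg=9)
print(f"95% of chance-level AUCs fall between {low:.2f} and {high:.2f}")
```

With 9 slides per class, the standard deviation of a chance-level AUC is about 0.14 (from the Mann-Whitney variance formula), so per-fold AUCs can swing widely from one split to the next. More slides, or more folds, narrow that range.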

You can enable distributed training across multiple GPUs with the argument multi_gpu=True, e.g.:

P.train(..., multi_gpu=True)

As the original issue has been addressed, I'm going to close this issue. If you encounter any further problems, please don't hesitate to open another issue.