Closed ltzg closed 1 year ago
Hi Itzg,
There are several things that can cause a model to fail to converge. This includes technical problems (e.g. issues with data or labels), suboptimal parameters (e.g. tile size, model architecture, hyperparameters), insufficient data (too few slides), or biological issues (e.g., the outcome can't be accurately predicted because there isn't sufficient true biological signal).
When troubleshooting models, the first place I like to start is by inspecting the dataset. Review the TFRecord tile extraction report and make sure that the data looks reasonable (tiles taken from inside areas of tumor). Then, use Dataset.summary() to quickly inspect the dataset to make sure that the TFRecords are valid and you have enough data for each outcome (0 and 1).
Can you paste the results of the following code, which will help me better understand the structure of your data?
import slideflow as sf
P = sf.load_project('.')
dataset = P.dataset(tile_px=256, tile_um=256)
dataset.summary()
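As a complementary check, the number of slides per outcome can be counted straight from the annotations file. The following is a minimal sketch using only the standard library; the inline CSV and the helper name `count_outcomes` are hypothetical, and the column names `patient`, `slide`, and `category` are assumed to match the project's annotations.

```python
import csv
import io
from collections import Counter


def count_outcomes(csv_text, label_column="category"):
    """Count how many annotation rows (slides) fall into each outcome."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


# Hypothetical annotations; replace with the contents of your own CSV file.
example_csv = """patient,slide,category
P1,S1,0
P2,S2,1
P3,S3,0
P4,S4,1
P5,S5,0
"""

counts = count_outcomes(example_csv)
for outcome, n in sorted(counts.items()):
    print(f"outcome {outcome}: {n} slides")
```

A strongly skewed count, or an outcome with only a handful of slides, is worth resolving before tuning the model itself.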
Thank you very much. This is the output of dataset.summary():

Overview:
╒═════════════════════╤═════════════════╕
│ Configuration file: │ ./datasets.json │
│ Tile size (px):     │ 256             │
│ Tile size (um):     │ 256             │
│ Slides:             │ 38              │
│ Patients:           │ 38              │
│ Slides with ROIs:   │ 38              │
│ Patients with ROIs: │ 38              │
╘═════════════════════╧═════════════════╛
Filters:
╒═══════════════╤════╕
│ Filters:      │ {} │
├───────────────┼────┤
│ Filter Blank: │ [] │
├───────────────┼────┤
│ Min Tiles:    │ 0  │
╘═══════════════╧════╛
Sources:
MyProject
╒═══════════╤════════════════╕
│ slides    │ ./slides2/     │
│ roi       │ ./slides2/rois │
│ tiles     │ ./tiles        │
│ tfrecords │ ./tfrecords    │
│ label     │ 256px_256um    │
╘═══════════╧════════════════╛
Number of tiles in TFRecords: 35921
Annotation columns:
Index(['patient', 'dataset', 'category', 'slide', 'AWG_MLH1_silencing', 'AWG_cancer_type_Oct62011', 'CDE_ID_3226963', 'CIMP', 'MSI_updated_Oct62011', 'X_INTEGRATION', ... 'X_PATIENT.y', 'OS', 'OS.time', 'DSS', 'DSS.time', 'DFI', 'DFI.time', 'PFI', 'PFI.time', 'slide.1'], dtype='object', length=137)

Note that the original task had 38 patients in category 1 and 98 patients in category 0, and the result was also AUC=0.5. Suspecting that the missing ROIs might be the cause, I annotated ROIs on a small scale. The following are the results of one training and validation session:

[09:39:45 AM] INFO Training model category-HP0 (k-fold #1)...
INFO Hyperparameters: { "augment": "xyrj", "batch_size": 32, "drop_images": false, "dropout": 0, "early_stop": false, "early_stop_method": "loss", "early_stop_patience": 0, "epochs": [ 2 ], "hidden_layer_width": 500, "hidden_layers": 0, "include_top": true, "l1": 0.0, "l1_dense": 0.0, "l2": 0.0, "l2_dense": 0.0, "learning_rate": 0.0001, "learning_rate_decay": 0, "learning_rate_decay_steps": 100000, "loss": "CrossEntropy", "manual_early_stop_batch": null, "manual_early_stop_epoch": null, "model": "xception", "normalizer": null, "normalizer_source": null, "optimizer": "Adam", "pooling": "max", "tile_px": 256, "tile_um": 256, "toplayer_epochs": 0, "trainable_layers": 0, "training_balance": "category", "uq": false, "validation_balance": "none" } INFO Val settings: { "strategy": "k-fold", "k_fold": 2, "k": null, "k_fold_header": null, "fraction": null, "source": null, "annotations": null, "filters": null, "dataset": null } INFO No compatible train/val split found. INFO Logging new split at ./splits.json INFO Category 0 1 INFO K-fold-0 10 10 INFO K-fold-1 9 9 INFO Using 18 training TFRecords, 20 validation INFO Steps per epoch = 432 [09:39:48 AM] INFO Using pretraining: imagenet ModelWrapper Parameters Buffers Output shape Datatype--- --- --- --- --- model.conv1 864 - [32, 32, 127, 127] float32 model.bn1 64 65 [32, 32, 127, 127] float32 model.conv2 18432 - [32, 64, 125, 125] float32 model.bn2 128 129 [32, 64, 125, 125] float32 model.block1.rep 26816 514 [32, 128, 63, 63] float32 model.block1.skip 8192 - [32, 128, 63, 63] float32 model.block1.skipbn 256 257 [32, 128, 63, 63] float32 model.block2.rep 102784 1026 [32, 256, 32, 32] float32 model.block2.skip 32768 - [32, 256, 32, 32] float32 model.block2.skipbn 512 513 [32, 256, 32, 32] float32 model.block3.rep 728120 2914 [32, 728, 16, 16] float32 model.block3.skip 186368 - [32, 728, 16, 16] float32 model.block3.skipbn 1456 1457 [32, 728, 16, 16] float32 model.block4.rep 1613976 4371 [32, 728, 16, 16] float32 
model.block5.rep 1613976 4371 [32, 728, 16, 16] float32 model.block6.rep 1613976 4371 [32, 728, 16, 16] float32 model.block7.rep 1613976 4371 [32, 728, 16, 16] float32 model.block8.rep 1613976 4371 [32, 728, 16, 16] float32 model.block9.rep 1613976 4371 [32, 728, 16, 16] float32 model.block10.rep 1613976 4371 [32, 728, 16, 16] float32 model.block11.rep 1613976 4371 [32, 728, 16, 16] float32 model.block12.rep 1292064 3506 [32, 1024, 8, 8] float32 model.block12.skip 745472 - [32, 1024, 8, 8] float32 model.block12.skipbn 2048 2049 [32, 1024, 8, 8] float32 model.conv3.conv1 9216 - [32, 1024, 8, 8] float32 model.conv3.pointwise 1572864 - [32, 1536, 8, 8] float32 model.bn3 3072 3073 [32, 1536, 8, 8] float32 model.conv4.conv1 13824 - [32, 1536, 8, 8] float32 model.conv4.pointwise 3145728 - [32, 2048, 8, 8] float32 model.bn4 4096 4097 [32, 2048, 8, 8] float32 model.last_linear - - [32, 2048] float32 fc0 4098 - [32, 2] float32 --- --- --- --- --- Total 20811050 54568 - - [09:39:55 AM] INFO Epoch 1/2 train loss: 0.4491 acc: 0.7879 ━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:44 312 img/s[09:40:39 AM] INFO train Epoch 1 | loss: 0.4491 acc: 0.7879 INFO Epoch 2/2 train loss: 0.2796 acc: 0.8937 ━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:42 325 img/s[09:41:22 AM] INFO train Epoch 2 | loss: 0.2796 acc: 0.8937 INFO Model saved to ./models/00000-category-HP0-kfold1/category-HP0-kfold1_ep och2.zip Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:17 1253 img/s[09:41:41 AM] INFO Validation metrics for outcome category: INFO tile-level AUC (cat # 0): 0.500 AP: 0.640 (opt. threshold: 1.675) INFO tile-level AUC (cat # 1): 0.500 AP: 0.360 (opt. threshold: 1.325) INFO Category 0 acc: 100.0% (14142/14142) INFO Category 1 acc: 0.0% (0/7947) INFO Validation metrics for outcome category: [09:41:42 AM] INFO slide-level AUC (cat # 0): 0.500 AP: 0.500 (opt. threshold: 1.675) INFO slide-level AUC (cat # 1): 0.500 AP: 0.500 (opt. 
threshold: 1.325) INFO Category 0 acc: 100.0% (10/10) INFO Category 1 acc: 0.0% (0/10) INFO Validation metrics for outcome category: INFO patient-level AUC (cat # 0): 0.500 AP: 0.500 (opt. threshold: 1.675) INFO patient-level AUC (cat # 1): 0.500 AP: 0.500 (opt. threshold: 1.325) INFO Category 0 acc: 100.0% (10/10) INFO Category 1 acc: 0.0% (0/10) INFO val Epoch 2 | loss: 0.6558 acc: 0.6402 [09:41:43 AM] INFO Training model category-HP0 (k-fold #2)... INFO Hyperparameters: { "augment": "xyrj", "batch_size": 32, "drop_images": false, "dropout": 0, "early_stop": false, "early_stop_method": "loss", "early_stop_patience": 0, "epochs": [ 2 ], "hidden_layer_width": 500, "hidden_layers": 0, "include_top": true, "l1": 0.0, "l1_dense": 0.0, "l2": 0.0, "l2_dense": 0.0, "learning_rate": 0.0001, "learning_rate_decay": 0, "learning_rate_decay_steps": 100000, "loss": "CrossEntropy", "manual_early_stop_batch": null, "manual_early_stop_epoch": null, "model": "xception", "normalizer": null, "normalizer_source": null, "optimizer": "Adam", "pooling": "max", "tile_px": 256, "tile_um": 256, "toplayer_epochs": 0, "trainable_layers": 0, "training_balance": "category", "uq": false, "validation_balance": "none" } INFO Val settings: { "strategy": "k-fold", "k_fold": 2, "k": null, "k_fold_header": null, "fraction": null, "source": null, "annotations": null, "filters": null, "dataset": null } INFO Using k-fold validation split detected at ./splits.json (ID: 0) INFO Using 20 training TFRecords, 18 validation INFO Steps per epoch = 690 INFO Using pretraining: imagenet ModelWrapper Parameters Buffers Output shape Datatype--- --- --- --- --- model.conv1 864 - [32, 32, 127, 127] float32 model.bn1 64 65 [32, 32, 127, 127] float32 model.conv2 18432 - [32, 64, 125, 125] float32 model.bn2 128 129 [32, 64, 125, 125] float32 model.block1.rep 26816 514 [32, 128, 63, 63] float32 model.block1.skip 8192 - [32, 128, 63, 63] float32 model.block1.skipbn 256 257 [32, 128, 63, 63] float32 model.block2.rep 102784 
1026 [32, 256, 32, 32] float32 model.block2.skip 32768 - [32, 256, 32, 32] float32 model.block2.skipbn 512 513 [32, 256, 32, 32] float32 model.block3.rep 728120 2914 [32, 728, 16, 16] float32 model.block3.skip 186368 - [32, 728, 16, 16] float32 model.block3.skipbn 1456 1457 [32, 728, 16, 16] float32 model.block4.rep 1613976 4371 [32, 728, 16, 16] float32 model.block5.rep 1613976 4371 [32, 728, 16, 16] float32 model.block6.rep 1613976 4371 [32, 728, 16, 16] float32 model.block7.rep 1613976 4371 [32, 728, 16, 16] float32 model.block8.rep 1613976 4371 [32, 728, 16, 16] float32 model.block9.rep 1613976 4371 [32, 728, 16, 16] float32 model.block10.rep 1613976 4371 [32, 728, 16, 16] float32 model.block11.rep 1613976 4371 [32, 728, 16, 16] float32 model.block12.rep 1292064 3506 [32, 1024, 8, 8] float32 model.block12.skip 745472 - [32, 1024, 8, 8] float32 model.block12.skipbn 2048 2049 [32, 1024, 8, 8] float32 model.conv3.conv1 9216 - [32, 1024, 8, 8] float32 model.conv3.pointwise 1572864 - [32, 1536, 8, 8] float32 model.bn3 3072 3073 [32, 1536, 8, 8] float32 model.conv4.conv1 13824 - [32, 1536, 8, 8] float32 model.conv4.pointwise 3145728 - [32, 2048, 8, 8] float32 model.bn4 4096 4097 [32, 2048, 8, 8] float32 model.last_linear - - [32, 2048] float32 fc0 4098 - [32, 2] float32 --- --- --- --- --- Total 20811050 54568 - - [09:41:48 AM] INFO Epoch 1/2 train loss: 0.4671 acc: 0.7768 ━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:01:09 317 img/s[09:42:57 AM] INFO train Epoch 1 | loss: 0.4671 acc: 0.7768 INFO Epoch 2/2 train loss: 0.3189 acc: 0.8693 ━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:01:08 321 img/s[09:44:06 AM] INFO train Epoch 2 | loss: 0.3189 acc: 0.8693 INFO Model saved to ./models/00001-category-HP0-kfold2/category-HP0-kfold2_ep och2.zip Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:10 1271 img/s[09:44:18 AM] INFO Validation metrics for outcome category: [09:44:19 AM] INFO tile-level AUC (cat # 0): 0.500 AP: 0.394 (opt. 
threshold: 1.475) INFO tile-level AUC (cat # 1): 0.500 AP: 0.606 (opt. threshold: 1.525) INFO Category 0 acc: 0.0% (0/5455) INFO Category 1 acc: 100.0% (8377/8377) INFO Validation metrics for outcome category: INFO slide-level AUC (cat # 0): 0.500 AP: 0.500 (opt. threshold: 1.475) INFO slide-level AUC (cat # 1): 0.500 AP: 0.500 (opt. threshold: 1.525) INFO Category 0 acc: 0.0% (0/9) INFO Category 1 acc: 100.0% (9/9) INFO Validation metrics for outcome category: [09:44:20 AM] INFO patient-level AUC (cat # 0): 0.500 AP: 0.500 (opt. threshold: 1.475) INFO patient-level AUC (cat # 1): 0.500 AP: 0.500 (opt. threshold: 1.525) INFO Category 0 acc: 0.0% (0/9) INFO Category 1 acc: 100.0% (9/9) INFO val Epoch 2 | loss: 0.6838 acc: 0.6056 INFO Training results saved: ./results_log.csv INFO Training complete; validation accuracies: INFO category-HP0-kfold1 training metrics: INFO loss: 0.27962709085463927 INFO accuracy: 0.893663227558136 INFO category-HP0-kfold1 validation metrics: INFO loss: 0.6558491226190072 INFO accuracy: 0.6402281678663588 INFO category-HP0-kfold2 training metrics: INFO loss: 0.3189038523891266 INFO accuracy: 0.8692935109138489 INFO category-HP0-kfold2 validation metrics: INFO loss: 0.6838010012448489 INFO accuracy: 0.6056246385193753 {'category-HP0-kfold1': {'epochs': defaultdict(<class 'dict'>, {'epoch1': {'train_metrics': {'loss': 0.44913872703909874, 'accuracy': 0.7879050970077515}}, 'epoch2': {'train_metrics': {'loss': 0.27962709085463927, 'accuracy': 0.893663227558136}, 'val_metrics': {'loss': 0.6558491226190072, 'accuracy': 0.6402281678663588}, 'tile_auc': {'category': [0.5, 0.5]}, 'slide_auc': {'category': [0.5, 0.5]}, 'patient_auc': {'category': [0.5, 0.5]}, 'tile_ap': {'category': [0.6402281678663588, 0.35977183213364117]}, 'slide_ap': {'category': [0.5, 0.5]}, 'patient_ap': {'category': [0.5, 0.5]}}})}, 'category-HP0-kfold2': {'epochs': defaultdict(<class 'dict'>, {'epoch1': {'train_metrics': {'loss': 0.4671213014834169, 'accuracy': 
0.7768115997314453}}, 'epoch2': {'train_metrics': {'loss': 0.3189038523891266, 'accuracy': 0.8692935109138489}, 'val_metrics': {'loss': 0.6838010012448489, 'accuracy': 0.6056246385193753}, 'tile_auc': {'category': [0.5, 0.5]}, 'slide_auc': {'category': [0.5, 0.5]}, 'patient_auc': {'category': [0.5, 0.5]}, 'tile_ap': {'category': [0.39437536148062463, 0.6056246385193753]}, 'slide_ap': {'category': [0.5, 0.5]}, 'patient_ap': {'category': [0.5, 0.5]}}})}}
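The per-category accuracies in this log (100% for one class, 0% for the other) indicate that every tile received the same prediction, and that alone is enough to force an AUC of 0.5. A minimal sketch using the rank-based (Mann-Whitney) definition of AUC, with made-up labels and scores, illustrates this:

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 0, 1, 1]

# Every sample gets the same score: all pairs are ties.
constant_auc = auc(labels, [0.9] * 5)

# Scores that separate the two classes perfectly.
perfect_auc = auc(labels, [0.1, 0.2, 0.3, 0.8, 0.9])

print(constant_auc, perfect_auc)  # 0.5 1.0
```

So an AUC of exactly 0.500 at tile, slide, and patient level points to a collapsed model rather than to a model that is merely weak.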
Thanks for providing this.
It looks like you're using the PyTorch backend. I would recommend giving the Tensorflow backend a try, as we've recently identified an issue where training models with PyTorch can result in poor performance with the combination of small dataset sizes and small batch sizes (see: 662350a). As you're using a small dataset (around 10 slides per outcome) and a small batch size (batch_size=32), you might be affected. We've written a patch that should be released by Friday.
In the meantime, try running the models using Tensorflow. You can do this by first installing Tensorflow and its dependencies:
pip install slideflow[tf]
Then, in the terminal you're running your code in, set the environment variable SF_BACKEND=tensorflow:
export SF_BACKEND=tensorflow
Finally, you can add the following lines to the top of your script to confirm that the model is in fact being trained with the Tensorflow backend:
import slideflow as sf
sf.about()
Let me know if that works, otherwise we can troubleshoot further.
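If exporting the variable in the shell is inconvenient (for example, in a notebook), it can also be set from Python before slideflow is imported. This is a sketch under two assumptions: that SF_BACKEND is read when slideflow is imported, and that `tensorflow` and `torch` are the accepted values. The helper `select_backend` is hypothetical and not part of slideflow.

```python
import os

# Assumed set of accepted values for SF_BACKEND.
VALID_BACKENDS = ("tensorflow", "torch")


def select_backend(name):
    """Set SF_BACKEND for the current process and return the value set."""
    if name not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    os.environ["SF_BACKEND"] = name
    return os.environ["SF_BACKEND"]


backend = select_backend("tensorflow")
print(f"SF_BACKEND={backend}")

# Import slideflow only after this point, then call sf.about() to confirm:
# import slideflow as sf
# sf.about()
```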
I created a new conda env for Tensorflow, and the backend is detected correctly:
>>> sf.about()
Slideflow
Version: 2.0.3-post1
Backend: tensorflow
Slide Backend: cucim
https://slideflow.dev
I got a new error when training with "resnet50":
ValueError: When setting `include_top=True` and loading `imagenet` weights, `input_shape` should be (224, 224, 3). Received: input_shape=(256, 256, 3)
Since my tiles are 256x256, do I need to re-extract them? This error did not occur with the PyTorch backend.
You need to set the hyperparameter include_top=False if the tile size (in your case, 256) does not match the imagenet pretrained size (224).
hp = sf.ModelParams(..., include_top=False)
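The rule can be written down as a one-line check. The helper below is hypothetical (it is not a slideflow function) and takes the pretrained input size as a parameter, defaulting to the 224 px named in the error message; other architectures may expect a different size.

```python
IMAGENET_INPUT_PX = 224  # input size named in the ValueError for resnet50


def include_top_for(tile_px, pretrained_px=IMAGENET_INPUT_PX):
    """Return the include_top value compatible with a given tile size.

    The pretrained top layers can only be kept when the tile size equals
    the size the network was pretrained on.
    """
    return tile_px == pretrained_px


for px in (224, 256, 299):
    print(px, include_top_for(px))
```

The returned value could then be passed on as `sf.ModelParams(..., include_top=include_top_for(256))`.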
I got an AUC=0.62 with these hyperparameters:
hp = sf.ModelParams(
    tile_px=256,
    tile_um=256,
    model='resnet50',
    batch_size=64,
    epochs=[30],
    include_top=False
)
How do I train on multiple GPUs with the Tensorflow backend? I cannot find "multi_gpu" for it.
[07:39:28 AM] INFO Beginning training
Epoch 1/30
345/345 [==============================] - ETA: 0s - loss: 0.7167 - accuracy: 0.8530
Epoch 1: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 107s 235ms/step - loss: 0.7167 - accuracy: 0.8530
Epoch 2/30
345/345 [==============================] - ETA: 0s - loss: 0.2586 - accuracy: 0.9291
Epoch 2: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 59s 172ms/step - loss: 0.2586 - accuracy: 0.9291
Epoch 3/30
345/345 [==============================] - ETA: 0s - loss: 0.1620 - accuracy: 0.9518
Epoch 3: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 173ms/step - loss: 0.1620 - accuracy: 0.9518
Epoch 4/30
345/345 [==============================] - ETA: 0s - loss: 0.1184 - accuracy: 0.9611
Epoch 4: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 173ms/step - loss: 0.1184 - accuracy: 0.9611
Epoch 5/30
345/345 [==============================] - ETA: 0s - loss: 0.0965 - accuracy: 0.9683
Epoch 5: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 174ms/step - loss: 0.0965 - accuracy: 0.9683
Epoch 6/30
345/345 [==============================] - ETA: 0s - loss: 0.0801 - accuracy: 0.9738
Epoch 6: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 173ms/step - loss: 0.0801 - accuracy: 0.9738
Epoch 7/30
345/345 [==============================] - ETA: 0s - loss: 0.0619 - accuracy: 0.9789
Epoch 7: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt
345/345 [==============================] - 60s 173ms/step - loss:
0.0619 - accuracy: 0.9789Epoch 8/30345/345 [==============================] - ETA: 0s - loss: 0.0653 - accuracy: 0.9786Epoch 8: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 59s 172ms/step - loss: 0.0653 - accuracy: 0.9786Epoch 9/30345/345 [==============================] - ETA: 0s - loss: 0.0629 - accuracy: 0.9823Epoch 9: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0629 - accuracy: 0.9823Epoch 10/30345/345 [==============================] - ETA: 0s - loss: 0.0501 - accuracy: 0.9818Epoch 10: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0501 - accuracy: 0.9818Epoch 11/30345/345 [==============================] - ETA: 0s - loss: 0.0551 - accuracy: 0.9824Epoch 11: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 61s 175ms/step - loss: 0.0551 - accuracy: 0.9824Epoch 12/30345/345 [==============================] - ETA: 0s - loss: 0.0475 - accuracy: 0.9834Epoch 12: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 175ms/step - loss: 0.0475 - accuracy: 0.9834Epoch 13/30345/345 [==============================] - ETA: 0s - loss: 0.0366 - accuracy: 0.9872Epoch 13: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 61s 176ms/step - loss: 0.0366 - accuracy: 0.9872Epoch 14/30345/345 [==============================] - ETA: 0s - loss: 0.0461 - accuracy: 0.9846Epoch 14: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0461 - accuracy: 0.9846Epoch 15/30345/345 [==============================] - ETA: 0s - loss: 0.0500 - accuracy: 0.9833Epoch 15: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 
[==============================] - 60s 174ms/step - loss: 0.0500 - accuracy: 0.9833Epoch 16/30345/345 [==============================] - ETA: 0s - loss: 0.0343 - accuracy: 0.9872Epoch 16: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 173ms/step - loss: 0.0343 - accuracy: 0.9872Epoch 17/30345/345 [==============================] - ETA: 0s - loss: 0.0358 - accuracy: 0.9883Epoch 17: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0358 - accuracy: 0.9883Epoch 18/30345/345 [==============================] - ETA: 0s - loss: 0.0312 - accuracy: 0.9892Epoch 18: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0312 - accuracy: 0.9892Epoch 19/30345/345 [==============================] - ETA: 0s - loss: 0.0335 - accuracy: 0.9881Epoch 19: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0335 - accuracy: 0.9881Epoch 20/30345/345 [==============================] - ETA: 0s - loss: 0.0411 - accuracy: 0.9856Epoch 20: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0411 - accuracy: 0.9856Epoch 21/30345/345 [==============================] - ETA: 0s - loss: 0.0267 - accuracy: 0.9907Epoch 21: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 61s 176ms/step - loss: 0.0267 - accuracy: 0.9907Epoch 22/30345/345 [==============================] - ETA: 0s - loss: 0.0280 - accuracy: 0.9907Epoch 22: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 175ms/step - loss: 0.0280 - accuracy: 0.9907Epoch 23/30345/345 [==============================] - ETA: 0s - loss: 0.0256 - accuracy: 0.9912Epoch 23: saving model to 
./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0256 - accuracy: 0.9912Epoch 24/30345/345 [==============================] - ETA: 0s - loss: 0.0384 - accuracy: 0.9875Epoch 24: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0384 - accuracy: 0.9875Epoch 25/30345/345 [==============================] - ETA: 0s - loss: 0.0291 - accuracy: 0.9896Epoch 25: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 174ms/step - loss: 0.0291 - accuracy: 0.9896Epoch 26/30345/345 [==============================] - ETA: 0s - loss: 0.0204 - accuracy: 0.9930Epoch 26: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 175ms/step - loss: 0.0204 - accuracy: 0.9930Epoch 27/30345/345 [==============================] - ETA: 0s - loss: 0.0218 - accuracy: 0.9922Epoch 27: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 61s 176ms/step - loss: 0.0218 - accuracy: 0.9922Epoch 28/30345/345 [==============================] - ETA: 0s - loss: 0.0254 - accuracy: 0.9914Epoch 28: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 175ms/step - loss: 0.0254 - accuracy: 0.9914Epoch 29/30345/345 [==============================] - ETA: 0s - loss: 0.0302 - accuracy: 0.9890Epoch 29: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 60s 175ms/step - loss: 0.0302 - accuracy: 0.9890Epoch 30/30WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 53). 
These functions will not be directly callable after loading.[08:10:54 AM] INFO Trained model saved to ./models/00007-category-HP0-kfold2/category-HP0-kfold2_ep och30 ⠹ Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 100% 0:00:01 0:00:12 1135 img/s[08:11:06 AM] INFO Validation metrics for outcome category: [08:11:07 AM] INFO tile-level AUC (cat # 0): 0.461 AP: 0.353 (opt. threshold: 0.006) INFO tile-level AUC (cat # 1): 0.461 AP: 0.630 (opt. threshold: 0.994) INFO Category 0 acc: 73.4% (4006/5455) INFO Category 1 acc: 27.9% (2331/8369) INFO Validation metrics for outcome category: INFO slide-level AUC (cat # 0): 0.395 AP: 0.443 (opt. threshold: 0.373) [08:11:08 AM] INFO slide-level AUC (cat # 1): 0.395 AP: 0.610 (opt. threshold: 0.817) INFO Category 0 acc: 77.8% (7/9) INFO Category 1 acc: 33.3% (3/9) INFO Validation metrics for outcome category: INFO patient-level AUC (cat # 0): 0.395 AP: 0.443 (opt. threshold: 0.373) INFO patient-level AUC (cat # 1): 0.395 AP: 0.610 (opt. threshold: 0.817) INFO Category 0 acc: 77.8% (7/9) INFO Category 1 acc: 33.3% (3/9) INFO Validation metrics: { "accuracy": 0.4584056712962963, "loss": 4.948972702026369 } Epoch 30: saving model to ./models/00007-category-HP0-kfold2/cp.ckpt345/345 [==============================] - 112s 326ms/step - loss: 0.0212 - accuracy: 0.9922[08:11:10 AM] INFO Training results saved: ./results_log.csv INFO Training complete; validation accuracies: INFO category-HP0-kfold1 training metrics: INFO loss: 0.01900590769946575 INFO accuracy: 0.9949363470077515 INFO category-HP0-kfold1 validation metrics: INFO accuracy: 0.45919384057971013 INFO loss: 4.310663974112357 INFO category-HP0-kfold2 training metrics: INFO loss: 0.02115924283862114 INFO accuracy: 0.9922101497650146 INFO category-HP0-kfold2 validation metrics: INFO accuracy: 0.4584056712962963 INFO loss: 4.948972702026369 {'category-HP0-kfold1': {'epochs': {'epoch30': {'train_metrics': {'loss': 0.01900590769946575, 'accuracy': 0.9949363470077515}, 
'val_metrics': {'accuracy': 0.45919384057971013, 'loss': 4.310663974112357}, 'tile_auc': {'category': [0.499942710209614, 0.4960754289960915]}, 'slide_auc': {'category': [0.62, 0.62]}, 'patient_auc': {'category': [0.62, 0.62]}, 'tile_ap': {'category': [0.6317464162325399, 0.3965908783629368]}, 'slide_ap': {'category': [0.6318867243867243, 0.7297976971428983]}, 'patient_ap': {'category': [0.6318867243867243, 0.7297976971428983]}}}}, 'category-HP0-kfold2': {'epochs': {'epoch30': {'train_metrics': {'loss': 0.02115924283862114, 'accuracy': 0.9922101497650146}, 'val_metrics': {'accuracy': 0.4584056712962963, 'loss': 4.948972702026369}, 'tile_auc': {'category': [0.46063974694266374, 0.4610330626349106]}, 'slide_auc': {'category': [0.3950617283950617, 0.39506172839506176]}, 'patient_auc': {'category': [0.3950617283950617, 0.39506172839506176]}, 'tile_ap': {'category': [0.3526021055659684, 0.6303825560205547]}, 'slide_ap': {'category': [0.4432966724633391, 0.6096288515406163]}, 'patient_ap': {'category': [0.4432966724633391, 0.6096288515406163]}}}}}
Glad to see the performance has improved! AUC of 0.62 is probably in line with what you might expect for a discretized survival model with only ~20 slides.
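To put a number on how little ~20 validation slides can pin down, the standard error of an AUC can be approximated with the Hanley-McNeil formula. This is a sketch only: the formula is a textbook approximation brought in here for illustration, and the 10 vs 10 slide counts are assumed from the first fold of the earlier split log.

```python
import math


def auc_standard_error(auc, n_pos, n_neg):
    """Hanley-McNeil (1982) approximation of the standard error of an AUC."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Slide-level AUC of 0.62, assuming 10 positive and 10 negative slides.
se = auc_standard_error(0.62, n_pos=10, n_neg=10)
low, high = 0.62 - 1.96 * se, 0.62 + 1.96 * se
print(f"SE = {se:.3f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```

With so few slides the approximate interval comfortably includes 0.5, so differences between folds of this size should be read with caution.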
You can enable distributed training across multiple GPUs with the argument multi_gpu=True. E.g.:
P.train(..., multi_gpu=True)
As the original issue has been addressed, I'm going to close this issue. If you encounter any further problems, please don't hesitate to open another issue.
Dear Author,
I used the TCGA colorectal cancer WSI files to build a 3-year recurrence prediction model (0 and 1) for colorectal cancer patients after surgery. I delineated the ROIs on the WSIs and homogenized the patches. I tried building the model with different CNN architectures (DenseNet, ResNet), but the AUC was always 0.5. It seems that all the patches are classified as either 0 or 1. Do you have any suggestions for improving this? Here is my code.
#######################
import slideflow as sf
P = sf.create_project(
root='.',
annotations="./RFS1.csv",
slides='./slides1/',
)
P.extract_tiles(tile_px=256, tile_um=256)
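P = sf.load_project('.')  # Reload the project file after every exit
from slideflow.model import build_feature_extractor
P = sf.load_project('.')  # Reload the project file after every exit
dataset = P.dataset(tile_px=256, tile_um=256)
# dataset.labels() returns both the label mapping and the unique labels
labels, unique_labels = dataset.labels('category')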
train_dts, val_dts = dataset.split(
model_type='categorical',
labels=labels,
val_strategy='k-fold',
val_k_fold=2,
k_fold_iter=1
)
hp = sf.ModelParams(
tile_px=256,
tile_um=256,
model='densenet',
)
# Train with 2-fold cross-validation
P.train(
'category',
params=hp,
val_k_fold=2
)
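For readers unfamiliar with category-balanced k-fold splitting, the sketch below shows the idea in plain Python. It is not slideflow's actual splitting algorithm; the helper `stratified_folds` and the slide names are hypothetical, and the 19 slides per category are chosen to mirror the 10/10 and 9/9 counts reported in the split log of this thread.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign slides to k folds, dealing each category out round-robin."""
    by_category = defaultdict(list)
    for slide, category in sorted(labels.items()):
        by_category[category].append(slide)
    folds = [[] for _ in range(k)]
    for category in sorted(by_category):
        for i, slide in enumerate(by_category[category]):
            folds[i % k].append(slide)
    return folds


# Hypothetical labels: 19 slides in each of the two categories.
labels = {f"slide{i:02d}": i % 2 for i in range(38)}
folds = stratified_folds(labels, k=2)
for i, fold in enumerate(folds):
    per_cat = [sum(1 for s in fold if labels[s] == c) for c in (0, 1)]
    print(f"fold {i}: {per_cat}")
```

Each fold serves once as the validation set while the other is used for training, which is why the two runs in the logs use 18 and 20 training TFRecords respectively.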