lassoan / SlicerMONAIAuto3DSeg

Extension for 3D Slicer for running MONAI Auto3DSeg models
MIT License

Runtime error trying to run TS1 model #56

Open fedorov opened 7 months ago

fedorov commented 7 months ago

I tried to run the TS1 model using the extension in Slicer 5.7.0-2024-04-24 r32828 / 4a60ea6 on a CT series from the NLST collection and ran into the error below. Is this expected? What are the RAM requirements for this model?

Downloading model: 306.4MB / 309.0MB (99.2%)
Download finished. Extracting to /home/exouser/.MONAIAuto3DSeg/models/whole-body-v1.0.0...
Cleaning up temporary model download folder...
Processing started
Writing input file to /tmp/Slicer-exouser/__SlicerTemp__2024-04-26_21+59+07.664/input-volume0.nrrd
Creating segmentations with MONAIAuto3DSeg AI...
Auto3DSeg command: ['/home/exouser/Desktop/Slicer-5.7.0-2024-04-24-linux-amd64/bin/../bin/PythonSlicer', '/home/exouser/Desktop/Slicer-5.7.0-2024-04-24-linux-amd64/slicer.org/Extensions-32828/MONAIAuto3DSeg/lib/Slicer-5.7/qt-scripted-modules/Scripts/auto3dseg_segresnet_inference.py', '--model-file', '/home/exouser/.MONAIAuto3DSeg/models/whole-body-v1.0.0/model.pt', '--image-file', '/tmp/Slicer-exouser/__SlicerTemp__2024-04-26_21+59+07.664/input-volume0.nrrd', '--result-file', '/tmp/Slicer-exouser/__SlicerTemp__2024-04-26_21+59+07.664/output-segmentation.nrrd']
`apex.normalization.InstanceNorm3dNVFuser` is not installed properly, use nn.InstanceNorm3d instead.
Model epoch 492 metric 0.7895059585571289
Using crop_foreground
Using resample with  resample_resolution [1.5, 1.5, 1.5]
Running Inference ...

  0%|          | 0/12 [00:00<?, ?it/s]Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:84.)
Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)

  0%|          | 0/12 [00:00<?, ?it/s]
2024-04-26 21:59:14,440 - INFO - CUDA out of memory. Tried to allocate 5.02 GiB. GPU
2024-04-26 21:59:14,440 - WARNING - GPU stitching failed, buffer 1 dim -1, image dim torch.Size([1, 1, 254, 243, 208]).

0it [00:00, ?it/s]
0it [00:00, ?it/s]
2024-04-26 21:59:15,049 - INFO - CUDA out of memory. Tried to allocate 2.83 GiB. GPU
2024-04-26 21:59:15,049 - WARNING - GPU buffered stitching failed, attempting on CPU, image dim torch.Size([1, 1, 254, 243, 208]).

Processing failed with return code 1
Cleaning up temporary folder.
Processing failed after 11.65 seconds.

Processing finished.
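The 5.02 GiB in the first failed allocation is roughly the size of the dense, full-resolution output buffer used for stitching. This is a back-of-the-envelope sketch, not the extension's code. It assumes the buffer is float32 with one channel per label, and that the whole-body v1 model has 105 output channels (104 TotalSegmentator v1 structures plus background). The channel count is an assumption here, not something stated in the log. The helper name `buffer_gib` is hypothetical.

```python
from math import prod

def buffer_gib(spatial_shape, channels, bytes_per_voxel=4):
    """Size in GiB of a dense (channels, *spatial_shape) buffer."""
    # float32 uses 4 bytes per element; one element per voxel per channel
    return prod(spatial_shape) * channels * bytes_per_voxel / 1024**3

# Image size after crop_foreground and 1.5 mm resampling, taken from the log
shape = (254, 243, 208)
# ASSUMPTION: 104 TotalSegmentator v1 structures + background = 105 channels
print(f"{buffer_gib(shape, 105):.2f} GiB")
```

Under these assumptions the result comes out to about 5.02 GiB. On the 8 GiB GPU reported later in the thread, that buffer, the model weights, and the sliding-window activations together do not fit. When stitching falls back to the CPU, a buffer of the same order has to fit in system RAM.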

You can access the same CT series I used by installing the SlicerIDCBrowser extension and entering 1.2.840.113654.2.55.252662084823127974216855931259749568803 in the SeriesInstanceUID field of the downloader section of the UI.

diazandr3s commented 7 months ago

Hi @fedorov,

Thanks for your feedback.

I've downloaded the sample and performed inference. Here is the result: https://github.com/lassoan/SlicerMONAIAuto3DSeg/assets/11991079/30065149-5775-4b6f-9b20-26ef9c7c5156

I see the volume you're referring to is 512x512x156 with a spacing of 0.7x0.7x2 mm. After resampling to 1.5 mm, it becomes 254x243x208.
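The slice-axis count follows from the usual resampling arithmetic: new size = size × spacing / new spacing, so 156 × 2 / 1.5 = 208. The same formula with a nominal 0.7 mm in-plane spacing gives about 239. That is smaller than the reported 254 and 243, and crop_foreground can only shrink the extent. So the quoted 0.7 mm is presumably a rounded value, and only the slice axis can be checked exactly. The following is a minimal sketch. The helper name is hypothetical.

```python
def resampled_size(size, spacing_mm, new_spacing_mm):
    """Voxel count along one axis after resampling, preserving physical extent."""
    # extent in mm = size * spacing; divide by the new spacing to get the new count
    return round(size * spacing_mm / new_spacing_mm)

print(resampled_size(156, 2.0, 1.5))  # slice axis → 208
```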

It took around 25 seconds and 17 GB of GPU memory to run. What are your machine's specs? Can you please try running on the CPU to check whether it works on your end?

BTW, this is the very first version of the whole-body CT segmentation model on TSV1. I'm working on a second version, which is potentially more accurate :)

Thanks!

fedorov commented 7 months ago

Here are the specs:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID A100X-8C                  On  | 00000000:04:00.0 Off |                    0 |
| N/A   N/A    P0              N/A /  N/A |      1MiB /  8192MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Some other models I tried on the same volume worked without errors. Below is the result of "Whole body segmentation - TS1 quick".

[Screenshot: 2024-04-28_22-36-19]

I tried on the CPU, and it failed as well:

Processing started
Writing input file to /tmp/Slicer-exouser/__SlicerTemp__2024-04-29_02+33+26.814/input-volume0.nrrd
Creating segmentations with MONAIAuto3DSeg AI...
Auto3DSeg command: ['/home/exouser/Desktop/Slicer-5.7.0-2024-04-24-linux-amd64/bin/../bin/PythonSlicer', '/home/exouser/Desktop/Slicer-5.7.0-2024-04-24-linux-amd64/slicer.org/Extensions-32828/MONAIAuto3DSeg/lib/Slicer-5.7/qt-scripted-modules/Scripts/auto3dseg_segresnet_inference.py', '--model-file', '/home/exouser/.MONAIAuto3DSeg/models/whole-body-v1.0.0/model.pt', '--image-file', '/tmp/Slicer-exouser/__SlicerTemp__2024-04-29_02+33+26.814/input-volume0.nrrd', '--result-file', '/tmp/Slicer-exouser/__SlicerTemp__2024-04-29_02+33+26.814/output-segmentation.nrrd']
Additional environment variables: {'CUDA_VISIBLE_DEVICES': '-1'}
`apex.normalization.InstanceNorm3dNVFuser` is not installed properly, use nn.InstanceNorm3d instead.
User provided device_type of 'cuda', but CUDA is not available. Disabling
Model epoch 492 metric 0.7895059585571289
Using crop_foreground
Using resample with  resample_resolution [1.5, 1.5, 1.5]
Running Inference ...

  0%|          | 0/12 [00:00<?, ?it/s]
Processing failed with return code 1
Cleaning up temporary folder.
Processing failed after 42.12 seconds.

Processing finished.
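The Slicer log above shows only "return code 1", without the underlying exception. One way to see the full error output is to re-run the logged command by hand with the same CUDA_VISIBLE_DEVICES=-1 setting the extension reports. The following is a sketch under that assumption. The helper names are hypothetical, and the argument layout mirrors the logged Auto3DSeg command.

```python
import os
import subprocess

def cpu_only_env(base_env=None):
    """Copy of the environment with all CUDA devices hidden from PyTorch."""
    env = dict(os.environ if base_env is None else base_env)
    # Same setting the extension reports in its "Additional environment variables" line
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env

def build_command(python_exe, script, model_file, image_file, result_file):
    """Argument list in the same form as the logged Auto3DSeg command."""
    return [python_exe, script,
            "--model-file", model_file,
            "--image-file", image_file,
            "--result-file", result_file]

def run_on_cpu(cmd):
    """Run the inference script on CPU and return (return code, stderr)."""
    proc = subprocess.run(cmd, env=cpu_only_env(), capture_output=True, text=True)
    # On POSIX a negative code -N means the child was killed by signal N
    # (e.g. -9 for SIGKILL, which is what the Linux OOM killer sends)
    return proc.returncode, proc.stderr
```

The temporary input file is deleted after a failed run ("Cleaning up temporary folder"). For this reason, the volume would first have to be saved to a persistent location and that path passed as the image file.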

It is not in my critical path; I am just reporting in case this helps with your development. No urgency!

diazandr3s commented 7 months ago

Thanks again for the feedback, @fedorov. I'm glad to see the quick version worked on your end. Strangely, the high-resolution model didn't work on the CPU. How much RAM does this instance have?

fedorov commented 7 months ago

The instance has 15 GB of RAM.

lassoan commented 7 months ago

For a full-resolution model, 15 GB of total RAM can be really tight. What operating system are you using? How much virtual memory is available?

fedorov commented 7 months ago

If you need anything beyond the output below, let me know what I should run!

exouser@morally-feasible-reindeer-gpu:~$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 10356748 132452 2903196    0    0    42    15   54   99  1  0 99  0  0

exouser@morally-feasible-reindeer-gpu:~$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
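vmstat's swpd column shows only how much swap is in use (0 here), not how much is configured. It therefore does not fully answer the virtual memory question. On Linux, `free -h` or `swapon --show` report configured swap directly. The sketch below reads the same information from /proc/meminfo, whose MemTotal, MemAvailable, SwapTotal and SwapFree fields are given in kB. The helper names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}, kB where a unit is given."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

def memory_summary_gib(info):
    """Convert the fields relevant to this issue from kB to GiB."""
    kib_per_gib = 1024 ** 2
    return {k: info.get(k, 0) / kib_per_gib
            for k in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")}

def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live values on a Linux machine."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On the instance itself, `memory_summary_gib(read_meminfo())` would show whether any swap is configured alongside the 15 GB of RAM.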