TotalSegmentator Running Slow on New M3 Max MacBook Pro

QianMuXiao commented 8 months ago

I'm running TotalSegmentator on my M3 Max (16-core CPU, 40-core GPU) MacBook Pro, and it takes over 600 seconds to fully segment a 3D CT image of size 512x512x54, whereas it takes about 60 seconds on my desktop with a 3070 GPU. Is it possible that the latest version of TotalSegmentator uses only the MacBook's CPU instead of PyTorch's "MPS" backend?

francescopisu commented 8 months ago

Are you using TotalSegmentator for 3D segmentation? PyTorch's MPS backend doesn't support 3D ops yet, so even if TotalSegmentator is doing everything right, your computation would still be slow on a Mac.

QianMuXiao commented 8 months ago

@francescopisu Yes, I used CT image data with dimensions of about (512, 512, 54). I hadn't realized that PyTorch's MPS device doesn't support 3D operations. I'll re-check the docs on that, thanks a lot!

QianMuXiao commented 8 months ago

> Are you using TotalSegmentator for 3D segmentation? PyTorch's MPS backend doesn't support 3D ops yet, so even if TotalSegmentator is doing everything right, your computation would still be slow on a Mac.

I checked the PyTorch documentation in detail, and after some testing I found that the current nightly version of PyTorch supports MPS-accelerated Conv3d but not ConvTranspose3d.

wasserth commented 8 months ago

It seems that PyTorch quite recently added Conv3d support to the nightly version. I pushed a commit to master to allow "mps" as the device argument. I have not had a chance to test it, since I am not running the newest macOS, which is required for this to work.

wasserth commented 8 months ago

It seems that nnU-Net also uses ConvTranspose3d, which is not yet supported. So MPS is not working for now.

QianMuXiao commented 8 months ago

> It seems that nnU-Net also uses ConvTranspose3d, which is not yet supported. So MPS is not working for now.

Yes. I passed the mps device by modifying the source code of the TotalSegmentator package, and with the latest PyTorch nightly build it reports that ConvTranspose3d is not supported on MPS.
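For anyone who wants to check their own PyTorch build, here is a minimal sketch of a probe for the two layers discussed above. It assumes only the public `torch.backends.mps.is_available()` check and the standard `torch.nn.Conv3d` / `torch.nn.ConvTranspose3d` layers. The function name `probe_mps_conv3d_support` and the tiny 8x8x8 test tensor are illustrative choices, not anything from TotalSegmentator.

```python
def probe_mps_conv3d_support():
    """Report whether Conv3d and ConvTranspose3d run on the MPS device."""
    result = {"torch": False, "mps": False, "Conv3d": False, "ConvTranspose3d": False}
    try:
        import torch
    except ImportError:
        return result  # PyTorch is not installed in this environment
    result["torch"] = True

    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is None or not mps_backend.is_available():
        return result  # no usable MPS device (non-Mac, old macOS, or old PyTorch)
    result["mps"] = True

    # Tiny 5D input: (batch, channels, depth, height, width)
    x = torch.randn(1, 1, 8, 8, 8, device="mps")
    for name in ("Conv3d", "ConvTranspose3d"):
        layer = getattr(torch.nn, name)(1, 1, kernel_size=3).to("mps")
        try:
            with torch.no_grad():
                layer(x)
            result[name] = True
        except (NotImplementedError, RuntimeError):
            # Unsupported ops surface as one of these exception types
            result[name] = False
    return result


if __name__ == "__main__":
    print(probe_mps_conv3d_support())
```

If `ConvTranspose3d` comes back `False` while `Conv3d` is `True`, the build is in the state described in this comment.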

wxc-2020 commented 5 months ago

I ran TotalSegmentator on an M3 Max MBP to segment a single CT image. Regardless of whether I use --fast or --rb, I get the following error: Background workers died. Look for the error message further up! If there is none then your RAM was full and the worker was killed by the OS. Use fewer workers or get more RAM in that case!

francescopisu commented 5 months ago

@wxc-2020 How many workers are you using for preprocessing (nr_thr_resamp and nr_thr_saving)? My run crashed very recently because I spawned too many workers (31, to be precise) and used up all 64 GB of RAM.
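As a rough illustration of keeping the worker counts low, the sketch below only builds a command line; it does not run anything. The flag spellings `--nr_thr_resamp`, `--nr_thr_saving` and `--device` are assumptions derived from the parameter names mentioned in this thread, so check `TotalSegmentator --help` for your installed version. The helper name and the file names `ct.nii.gz` and `segmentations` are hypothetical.

```python
import shlex


def build_totalseg_command(input_path, output_dir,
                           nr_thr_resamp=1, nr_thr_saving=1,
                           fast=False, device=None):
    """Build (but do not run) a TotalSegmentator command with few workers.

    Flag names are assumed from the parameter names in this thread.
    """
    cmd = [
        "TotalSegmentator",
        "-i", str(input_path),
        "-o", str(output_dir),
        "--nr_thr_resamp", str(nr_thr_resamp),   # resampling workers
        "--nr_thr_saving", str(nr_thr_saving),   # saving workers
    ]
    if fast:
        cmd.append("--fast")
    if device is not None:
        cmd += ["--device", device]
    return cmd


# Hypothetical input and output names, for illustration only
cmd = build_totalseg_command("ct.nii.gz", "segmentations", fast=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed against the installed version.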

QianMuXiao commented 5 months ago

Mine still runs fine. My Mac has only 48 GB of RAM, and when I run full-resolution 3D segmentation on the CPU it is slow, but it doesn't crash and doesn't need to use swap.

francescopisu commented 5 months ago

@QianMuXiao It also depends on the specifics of your tomographic data

QianMuXiao commented 5 months ago

@francescopisu The MSD datasets I've been using lately seem to run segmentation fine

wxc-2020 commented 5 months ago

@francescopisu I didn't set these two parameters explicitly, so they should be at their defaults. In addition, my CT images are 256x256x300. I still can't find the specific cause of the error.

francescopisu commented 5 months ago

@wxc-2020 Can you show the entire stack trace you get with the error?

wxc-2020 commented 5 months ago

@francescopisu Thanks for your help. I have now uninstalled and reinstalled conda and re-run TotalSegmentator. It no longer reports the error and runs smoothly on the CPU (an MBP with 128 GB of memory takes 3 to 4 minutes to segment a CT image in --fast mode). I think the error may have been caused by conflicts in my old conda environment.

francescopisu commented 4 months ago

> It seems that nnU-Net also uses ConvTranspose3d, which is not yet supported. So MPS is not working for now.

I made a quick guide for building PyTorch from source, using the state of the repository in the not-yet-merged PR by mattiaspaul that implements ConvTranspose3d for MPS. I got 3D high-res cardiac chamber segmentations for a 512x512x224 coronary CTA scan in under a minute.

QianMuXiao commented 4 months ago

@francescopisu Your guide works perfectly on my MBP, thanks a lot. The total task takes about 86 s (512x512x54 CT scan from the MSD spleen dataset) on my M3 Max with 48 GB of RAM.

w1ebr commented 4 months ago

@francescopisu I tried using your guide on my M3 MBP and when trying "install -r requirements.txt --no-cache-dir" I get an error message saying that there is no "-r" option for install (which install = /usr/bin/install). Is there another "install" program that should be called? Thanks! Gene

QianMuXiao commented 4 months ago

@w1ebr add pip before install

francescopisu commented 4 months ago

@w1ebr My bad, I forgot the "pip" for "pip install". I updated the blog post as well. Thanks

QianMuXiao commented 4 months ago

@francescopisu Is it possible to make my 3D Slicer use this version of PyTorch?

francescopisu commented 4 months ago

@QianMuXiao I'm afraid that's not a trivial thing to do. I may need some support from @lassoan.

w1ebr commented 4 months ago

Thank you!

lassoan commented 4 months ago

> Is it possible to make my 3D Slicer use this version of PyTorch?

A new official PyTorch version that works out of the box may be only a matter of weeks away, so it is probably not worth spending a lot of time on this. But if that seems like a long time, you can run pip_install('https://example.com/path/to/custompytorch.tar.gz') in Slicer's Python console to install a custom PyTorch build from a URL. If the TotalSegmentator extension finds that PyTorch is installed, it will use it.

w1ebr commented 4 months ago

Should OpenMP also be installed? My build log says it wasn't found

QianMuXiao commented 4 months ago

@francescopisu Maybe the command in your quick guide, 'conda create --prefix==./venv python=3.10', should be changed to 'conda create --prefix=./venv python=3.10'?

wxc-2020 commented 4 months ago

@francescopisu Perfect! Thanks again! It gives a real speed increase!

francescopisu commented 4 months ago

> @francescopisu Maybe the command in your quick guide, 'conda create --prefix==./venv python=3.10', should be changed to 'conda create --prefix=./venv python=3.10'?

Yes, updated.

cutright commented 4 months ago

Just noticed there was a discussion about this here, I also got Mac GPU working a few weeks ago: https://github.com/wasserth/TotalSegmentator/issues/39#issuecomment-2007890904

Working great for me! Thanks @wasserth et al

FWIW, I installed the commit from the PR mentioned in the prior comment with: pip install git+https://github.com/pytorch/pytorch.git@3c61c525694eca0f895bb01fc67c16793226051a

Then set the device to 'mps' in my totalsegmentator call and it seems to work.

It took about 8 min to get 69 ROIs on a CT sim for a prostate case, using an Apple M2 Pro on Sonoma 14.4 and Python 3.10.9. This included DICOM output. I was able to import the DICOM output into my viewer and it looks pretty good to me.
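A minimal sketch of what "set the device to 'mps' in my totalsegmentator call" could look like from Python, with a fallback to the CPU when MPS is not usable. The import path `totalsegmentator.python_api` and the `device` keyword are assumptions based on the project's documented Python API and should be checked against the installed version. The helper names `resolve_device` and `segment` are hypothetical.

```python
def resolve_device(requested="mps"):
    """Return `requested`, falling back to "cpu" if MPS is not usable."""
    if requested != "mps":
        return requested
    try:
        import torch
    except ImportError:
        return "cpu"
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"


def segment(input_path, output_path, requested_device="mps"):
    """Run TotalSegmentator on the resolved device (assumed API, see above)."""
    # Imported lazily so this file loads even without TotalSegmentator installed
    from totalsegmentator.python_api import totalsegmentator

    device = resolve_device(requested_device)
    return totalsegmentator(input_path, output_path, device=device)
```

Note that `resolve_device` only checks that an MPS device exists. It does not verify that ConvTranspose3d is implemented in the installed PyTorch build, which is the requirement discussed throughout this thread.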

wasserth / TotalSegmentator

TotalSegmentator Running Slow on New M3 Max MacBook Pro #250