Patch size - Githubissues

mcbrs1a commented 5 years ago

🐛 Bug

When using a patch size -ps 50 50 50 with -dm 3 and a 3D image file (.nii), I receive the error: ValueError: too many values to unpack (expected 2)

The error occurs on both a CPU and a GPU, so it is not related to the GPU configuration.

To Reproduce

srun /home/c.mcbrs1/.conda/envs/synthtorch/bin/python nn-train \
    -s /nfshome/store01/users/c.mcbrs1/synthtorch/source/ \
    -t /nfshome/store01/users/c.mcbrs1/synthtorch/target/ \
    -o TVunetXVolF.pth \
    --nn-arch unet \
    --n-layers 3 \
    --n-epochs 2000 \
    --batch-size 1 \
    --ext nii \
    -dm 3 \
    -id 3 \
    -ps 50 50 50 \
    --plot-loss Vloss_testCarCPUXVolF.png \
    -vv \
    --out-config-file VconfigCarCPUXVolF.json

Steps to reproduce the behavior:

single 3D nifti file in source and target directory
run the above command

Expected behavior

For training to commence for a 3D image using a patch size that is also 3D

Environment

synthtorch version: 0.3.8 (commit hash f7409f7), installed via pip
niftidataset version: 0.1.5
PyTorch version: 1.1.0
matplotlib version: 3.1.1
numpy version: 1.17.0
Python version: 3.7.3
OS: Linux (Red Hat)
Installation: pip (inside conda environment)
CUDA version: 10.1
GPU models and configuration: Tesla P100-PCIE x2

Additional context

error log attached

2019-08-19 11:10:55,466 - synthtorch.learn.learner - WARNING - CUDA does not appear to be available on your system.
2019-08-19 11:10:55,558 - synthtorch.learn.learner - DEBUG - Unet(
  (criterion): MSELoss()
  (start): ModuleList(
    (0): Sequential(
      (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
      (1): Conv3d(1, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
      (2): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (3): ReLU(inplace)
    )
    (1): Sequential(
      (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
      (1): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
      (2): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (3): ReLU(inplace)
    )
  )
  (down_layers): ModuleList(
    (0): ModuleList(
      (0): Sequential(
        (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
        (1): Conv3d(32, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
        (2): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
        (3): ReLU(inplace)
      )
      (1): Sequential(
        (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
        (1): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
        (2): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
        (3): ReLU(inplace)
      )
    )
    (1): ModuleList(
      (0): Sequential(
        (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
        (1): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
        (2): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
        (3): ReLU(inplace)
      )
      (1): Sequential(
        (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
        (1): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
        (2): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
        (3): ReLU(inplace)
      )
    )
  )
  (bridge): ModuleList(
    (0): Sequential(
      (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
      (1): Conv3d(128, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
      (2): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (3): ReLU(inplace)
    )
    (1): Sequential(
      (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
      (1): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
      (2): InstanceNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (3): ReLU(inplace)
    )
  )
  (up_layers): ModuleList(
    (0): ModuleList(
      (0): Sequential(
        (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
        (1): Conv3d(256, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
        (2): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
        (3): ReLU(inplace)
      )
      (1): Sequential(
        (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
        (1): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
        (2): InstanceNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
        (3): ReLU(inplace)
      )
    )
    (1): ModuleList(
      (0): Sequential(
        (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
        (1): Conv3d(128, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
        (2): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
        (3): ReLU(inplace)
      )
      (1): Sequential(
        (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
        (1): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
        (2): InstanceNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
        (3): ReLU(inplace)
      )
    )
  )
  (end): ModuleList(
    (0): Sequential(
      (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
      (1): Conv3d(64, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
      (2): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (3): ReLU(inplace)
    )
    (1): Sequential(
      (0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
      (1): Conv3d(32, 32, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
      (2): InstanceNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      (3): ReLU(inplace)
    )
  )
  (finish): Conv3d(32, 1, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
  (upsampconvs): ModuleList(
    (0): Sequential(
      (0): ReplicationPad3d((2, 2, 2, 2, 2, 2))
      (1): Conv3d(256, 128, kernel_size=(5, 5, 5), stride=(1, 1, 1), bias=False)
    )
    (1): Sequential(
      (0): ReplicationPad3d((2, 2, 2, 2, 2, 2))
      (1): Conv3d(128, 64, kernel_size=(5, 5, 5), stride=(1, 1, 1), bias=False)
    )
    (2): Sequential(
      (0): ReplicationPad3d((2, 2, 2, 2, 2, 2))
      (1): Conv3d(64, 32, kernel_size=(5, 5, 5), stride=(1, 1, 1), bias=False)
    )
  )
)
2019-08-19 11:10:55,559 - synthtorch.learn.learner - INFO - Number of trainable parameters in model: 10632832
2019-08-19 11:10:55,559 - synthtorch.learn.learner - INFO - Initializing weights with kaiming
2019-08-19 11:10:55,641 - synthtorch.learn.learner - INFO - No data augmentation will be used
2019-08-19 11:10:55,641 - synthtorch.learn.learner - DEBUG - Training transforms: [ToTensor, RandomCrop(output_size=[50, 50, 50], threshold=None)]
2019-08-19 11:10:55,642 - synthtorch.learn.learner - INFO - Number of training images: 1
2019-08-19 11:10:55,643 - synthtorch.learn.learner - INFO - LR: 1.00e-03
2019-08-19 11:10:55,823 - synthtorch.exec.nn_train - ERROR - too many values to unpack (expected 2)
Traceback (most recent call last):
  File "/home/c.mcbrs1/.conda/envs/synthtorch/lib/python3.7/site-packages/synthtorch/exec/nn_train.py", line 249, in main
    learner.fit(args.n_epochs, args.clip, args.checkpoint, args.trained_model)
  File "/home/c.mcbrs1/.conda/envs/synthtorch/lib/python3.7/site-packages/synthtorch/learn/learner.py", line 126, in fit
    for i, (src, tgt) in enumerate(self.train_loader):
  File "/home/c.mcbrs1/.conda/envs/synthtorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 560, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/c.mcbrs1/.conda/envs/synthtorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 560, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/c.mcbrs1/.conda/envs/synthtorch/lib/python3.7/site-packages/niftidataset/dataset.py", line 84, in __getitem__
    sample = self.transform(sample)
  File "/home/c.mcbrs1/.conda/envs/synthtorch/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
    img = t(img)
  File "/home/c.mcbrs1/.conda/envs/synthtorch/lib/python3.7/site-packages/niftidataset/transforms.py", line 184, in __call__
    hh, ww = self.output_size
ValueError: too many values to unpack (expected 2)
srun: error: ccs3019: task 0: Exited with exit code 1

error_output_synthtorch.txt
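
The traceback ends at the line `hh, ww = self.output_size` in niftidataset/transforms.py, which unpacks exactly two values, so a three-element patch size such as [50, 50, 50] raises the ValueError. Below is a minimal sketch of that failure and of a crop that works for any number of dimensions. It is not the niftidataset code and not the fix that was later pushed; the names `unpack_2d`, `random_crop_slices`, `image_shape`, and `patch_size` are hypothetical and chosen only for illustration.

```python
import random


def unpack_2d(output_size):
    # Mirrors the failing line from the traceback: exactly two values expected.
    hh, ww = output_size
    return hh, ww


def random_crop_slices(image_shape, patch_size, rng=random):
    """Return one slice per axis selecting a random patch (hypothetical helper)."""
    if len(image_shape) != len(patch_size):
        raise ValueError("patch_size must have one entry per image axis")
    slices = []
    for dim, size in zip(image_shape, patch_size):
        if size > dim:
            raise ValueError(f"patch size {size} exceeds image extent {dim}")
        # randint is inclusive on both ends, so the patch always fits.
        start = rng.randint(0, dim - size)
        slices.append(slice(start, start + size))
    return tuple(slices)


try:
    unpack_2d([50, 50, 50])
except ValueError as err:
    print("2D-only unpacking failed:", err)

# The same helper handles 2D and 3D patch sizes.
crop = random_crop_slices((64, 64, 64), (50, 50, 50))
print([s.stop - s.start for s in crop])
```

The returned tuple of slices can index an array directly (for example `volume[crop]` with a NumPy array), which avoids hard-coding the number of axes.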

jcreinhold commented 5 years ago

This problem is due to you having only one image and the option --valid-split (-vs) defaulting to 0.2. This option splits the training dataset into a default of 80% training data and 20% validation data. Synthtorch cannot do that in this case because you only have one image.

If you are just testing to make sure that you can get the package to work, then also set the validation source and target directories to the same values as your training directories. You can try setting -vs to 0, but I believe it will error.

If you have some real use case for only training on one dataset with no validation data, then please open a feature request but please explain the use case further.
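
To illustrate why a fractional split is a problem with a single image, here is a minimal sketch of the arithmetic. How synthtorch computes the split internally is not shown in this thread; the function `split_counts` is hypothetical and assumes the validation count is simply rounded down.

```python
def split_counts(n_images, valid_split=0.2):
    """Return (n_train, n_valid) for a fractional validation split.

    Hypothetical helper: assumes the validation count is rounded down.
    """
    if not 0.0 <= valid_split < 1.0:
        raise ValueError("valid_split must be in [0, 1)")
    n_valid = int(n_images * valid_split)
    n_train = n_images - n_valid
    return n_train, n_valid


print(split_counts(1))   # (1, 0): no image is left for validation
print(split_counts(10))  # (8, 2)
```

Under this assumption, one image leaves the validation set empty, which is why supplying separate validation directories is the suggested workaround for a quick test.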

jcreinhold commented 5 years ago

I closed this prematurely because I misread the error (although what I stated about using one image is still true). Sorry about that.

I pushed a fix. If you reinstall synthtorch, it should work now. Let me know if it doesn't.

By the way, you only need to specify the image dimension (-id) when using a VAE. In your current setup, i.e., a U-Net, you don't need to specify this.

mcbrs1a commented 5 years ago

Thanks Jacob. I've opted not to use the patch size and to shrink the images instead, but I have reinstalled with your new patch.

One quick question. You mention:

An important note is that, since the synthesis method used is a supervised method. So it is required that all of the subject scans be co-registered (e.g., FLAIR and T2 are registered to T1). Additionally, all of the images must be of the same size (e.g., if the T1-w images are of dimension h x w x d, then the T2-w and FLAIR are also of dimensions h x w x d). https://github.com/jcreinhold/synthtorch/blob/master/tutorials/5min_tutorial.md#example-testing-directory-setup

I assume this means each subject is registered, correct? Do different subjects need to be registered to a common space also?


jcreinhold commented 5 years ago

Subjects don't need to be registered to a common space; source and target images just need to be co-registered, as stated in the quote.
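
Co-registration itself cannot be verified from array shapes, but the same-size requirement in the quoted tutorial text can be checked before training. The following is a minimal sketch under that assumption; `find_shape_mismatches` and the subject IDs are hypothetical, and the shapes are assumed to have been read beforehand (for example from the `shape` attribute of an image loaded with nibabel).

```python
def find_shape_mismatches(pairs):
    """Return IDs of subjects whose source and target shapes differ.

    pairs: iterable of (subject_id, source_shape, target_shape).
    Hypothetical helper; shapes are plain tuples such as (h, w, d).
    """
    return [sid for sid, src, tgt in pairs if tuple(src) != tuple(tgt)]


pairs = [
    ("subject01", (181, 217, 181), (181, 217, 181)),
    ("subject02", (256, 256, 160), (256, 256, 150)),  # depth differs
    ("subject03", (192, 192, 120), (192, 192, 120)),
]
print(find_shape_mismatches(pairs))  # ['subject02']
```

The check compares each subject's source and target only, so subjects with different sizes from one another do not get flagged here.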

jcreinhold / synthtorch

Patch size #18
