lacan opened this issue 2 years ago

When trying to train a cellpose model, I am met with an error right after computing the flows. Any idea?
Trying without --omni goes farther, but not much. Training with a plain cellpose installation works, though... Any thoughts?
@lacan Sorry for that, there was a bug in my Cellpose fork that only comes up with rescaling on. I believe I just fixed it, and will push the new version later today.
Thanks for the response! I wanted to try the new version but am still facing issues.
I tried recreating a Python environment with your update, but am now met with:
INFO: Cellpose2D-train: Traceback (most recent call last):
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose\lib\runpy.py", line 187, in _run_module_as_main
INFO: Cellpose2D-train: mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose\lib\runpy.py", line 146, in _get_module_details
INFO: Cellpose2D-train: return _get_module_details(pkg_main_name, error)
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose\lib\runpy.py", line 110, in _get_module_details
INFO: Cellpose2D-train: __import__(pkg_name)
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose\lib\site-packages\cellpose\__init__.py", line 1, in <module>
INFO: Cellpose2D-train: from . import core
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose\lib\site-packages\cellpose\core.py", line 12, in <module>
INFO: Cellpose2D-train: from focal_loss.focal_loss import FocalLoss
INFO: Cellpose2D-train: ModuleNotFoundError: No module named 'focal_loss'
Thing is, I can't even find where this FocalLoss dependency is coming from in cellpose... What version is it trying to download?
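(A quick diagnostic sketch, not part of the original thread, to see which cellpose fork and version are actually installed; it deliberately avoids importing cellpose, since importing it is exactly what fails here:)

import importlib.metadata as md
import importlib.util

# Hypothetical check: report installed versions and the install location of
# the cellpose package without importing it.
for pkg in ("cellpose", "omnipose"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
spec = importlib.util.find_spec("cellpose")
print("cellpose located at:", spec.origin if spec else "not found")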
@lacan you may have been dealing with the wrong version of cellpose. Omnipose requires my fork of cellpose, though that should install automatically. Try creating a new environment and installing the latest PyPI version via pip install omnipose (or use --force in an existing environment).
@kevinjohncutler , thanks for the info. What I had was your version of omnipose, but it seems the cellpose dependency was wrong and that it was corrected.
In any case, now that I made a new environment with pip install omnipose (version 0.3.4), I am able to get farther in the training.
However, with the version of cellpose you are using, I cannot train using my images. Computing the flows fails when using --omni:
cellpose --train --dir "cellpose-training\train" --test_dir "cellpose-training\test" --pretrained_model cyto2_omni --chan 0 --chan2 0 --diameter 30.0 --n_epochs 900 --learning_rate 0.2 --batch_size 8 --omni --use_gpu
Trace:
INFO: Cellpose2D-train: !NEW LOGGING SETUP! To see cellpose progress, set --verbose
INFO: Cellpose2D-train: No --verbose => no progress or info printed
INFO: Cellpose2D-train: 2022-10-25 16:49:15,214 [INFO] ** TORCH GPU version installed and working. **
INFO: Cellpose2D-train: 2022-10-25 16:49:15,214 [INFO] >>>> using GPU
INFO: Cellpose2D-train: Omnipose enabled. See Omnipose repo for licencing details.
INFO: Cellpose2D-train: 2022-10-25 16:49:15,215 [INFO] Training omni model. Setting nclasses=4, RAdam=True
INFO: Cellpose2D-train: yoyoyoyoyggggggggg 4
INFO: Cellpose2D-train: 2022-10-25 16:49:15,310 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-10-25 16:49:15,372 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-10-25 16:49:15,407 [INFO] pretrained model C:\Users\oburri\.cellpose\models\cyto2_omnitorch_0 is being used
INFO: Cellpose2D-train: 2022-10-25 16:49:15,407 [INFO] during training rescaling images to fixed diameter of 30.0 pixels
INFO: Cellpose2D-train: 2022-10-25 16:49:16,753 [INFO] Training with rescale = 1.00
INFO: Cellpose2D-train: reshape_and_normalize_data_2 2 [0, 0] (6, 1, 222) (6, 1, 221)
INFO: Cellpose2D-train: 2022-10-25 16:49:16,771 [INFO] No precomuting flows with Omnipose. Computed during training.
INFO: Cellpose2D-train: 2022-10-25 16:49:16,775 [INFO] NOTE: computing flows for labels (could be done before to save time)
INFO: Cellpose2D-train:
INFO: Cellpose2D-train: 0%| | 0/2 [00:00<?, ?it/s]
INFO: Cellpose2D-train: 100%|##########| 2/2 [00:00<00:00, 12.12it/s]
INFO: Cellpose2D-train: 100%|##########| 2/2 [00:00<00:00, 12.05it/s]
INFO: Cellpose2D-train: 2022-10-25 16:49:16,975 [INFO] >>> Using RAdam optimizer
INFO: Cellpose2D-train: 2022-10-25 16:49:17,168 [INFO] >>>> median diameter set to = 30
INFO: Cellpose2D-train: 2022-10-25 16:49:17,168 [INFO] >>>> training network with 6 channel input <<<<
INFO: Cellpose2D-train: 2022-10-25 16:49:17,168 [INFO] >>>> LR: 0.20000, batch_size: 8, weight_decay: 0.00001
INFO: Cellpose2D-train: 2022-10-25 16:49:17,168 [INFO] >>>> ntrain = 8, ntest = 2
INFO: Cellpose2D-train: 2022-10-25 16:49:17,170 [INFO] >>>> nimg_per_epoch = 8
INFO: Cellpose2D-train: Traceback (most recent call last):
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\runpy.py", line 196, in _run_module_as_main
INFO: Cellpose2D-train: return _run_code(code, main_globals, None,
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\runpy.py", line 86, in _run_code
INFO: Cellpose2D-train: exec(code, run_globals)
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\__main__.py", line 507, in <module>
INFO: Cellpose2D-train: main()
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\__main__.py", line 477, in main
INFO: Cellpose2D-train: cpmodel_path = model.train(images, labels, train_files=image_names,
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\models.py", line 1045, in train
INFO: Cellpose2D-train: model_path = self._train_net(train_data, train_labels,
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\core.py", line 1014, in _train_net
INFO: Cellpose2D-train: imgi, lbl, scale = transforms.random_rotate_and_resize(
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\transforms.py", line 839, in random_rotate_and_resize
INFO: Cellpose2D-train: return omnipose.core.random_rotate_and_resize(X, Y=Y, scale_range=scale_range, gamma_range=gamma_range,
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\omnipose\core.py", line 1387, in random_rotate_and_resize
INFO: Cellpose2D-train: imgi[n], lbl[n], scale[n] = random_crop_warp(img, y, nt, tyx, nchan, scale[n],
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\omnipose\core.py", line 1499, in random_crop_warp
INFO: Cellpose2D-train: lbl[k] = do_warp(l, M, tyx, offset=offset, order=0, mode=mode) # order 0 is 'nearest neighbor'
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\omnipose\core.py", line 1606, in do_warp
INFO: Cellpose2D-train: return scipy.ndimage.affine_transform(A, np.linalg.inv(M), offset=offset,
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\scipy\ndimage\_interpolation.py", line 591, in affine_transform
INFO: Cellpose2D-train: raise RuntimeError('affine matrix has wrong number of rows')
INFO: Cellpose2D-train: RuntimeError: affine matrix has wrong number of rows
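(For context: scipy raises this error when the affine matrix's row count does not match the input array's number of dimensions, so one plausible trigger is a label array carrying an unexpected extra axis while the warp matrix is built for 2D. A minimal sketch of that assumption, not a confirmed diagnosis:)

import numpy as np
import scipy.ndimage as ndi

# Assumed failure mode: a 3D label array passed to a 2D warp.
labels = np.zeros((1, 64, 64))   # extra leading axis where a 2D label image was expected
M = np.eye(2)                    # 2x2 matrix built for a 2D rotation/scale
try:
    ndi.affine_transform(labels, np.linalg.inv(M), order=0)
except RuntimeError as e:
    print(e)                     # affine matrix has wrong number of rows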
And without --omni I get a different error:
cellpose --train --dir "cellpose-training\train" --test_dir "cellpose-training\test" --pretrained_model cyto2 --chan 0 --chan2 0 --diameter 30.0 --n_epochs 900 --learning_rate 0.2 --batch_size 8 --use_gpu
INFO: Cellpose2D-train: !NEW LOGGING SETUP! To see cellpose progress, set --verbose
INFO: Cellpose2D-train: No --verbose => no progress or info printed
INFO: Cellpose2D-train: 2022-10-25 16:51:25,492 [INFO] ** TORCH GPU version installed and working. **
INFO: Cellpose2D-train: 2022-10-25 16:51:25,492 [INFO] >>>> using GPU
INFO: Cellpose2D-train: yoyoyoyoyggggggggg None
INFO: Cellpose2D-train: 2022-10-25 16:51:25,589 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-10-25 16:51:25,651 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-10-25 16:51:25,690 [INFO] pretrained model C:\Users\oburri\.cellpose\models\cyto2torch_0 is being used
INFO: Cellpose2D-train: 2022-10-25 16:51:25,690 [INFO] during training rescaling images to fixed diameter of 30.0 pixels
INFO: Cellpose2D-train: Traceback (most recent call last):
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\runpy.py", line 196, in _run_module_as_main
INFO: Cellpose2D-train: return _run_code(code, main_globals, None,
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\runpy.py", line 86, in _run_code
INFO: Cellpose2D-train: exec(code, run_globals)
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\__main__.py", line 507, in <module>
INFO: Cellpose2D-train: main()
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\__main__.py", line 456, in main
INFO: Cellpose2D-train: model = models.CellposeModel(device=device,
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\models.py", line 445, in __init__
INFO: Cellpose2D-train: super().__init__(gpu=gpu, pretrained_model=False,
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\core.py", line 165, in __init__
INFO: Cellpose2D-train: self.net = CPnet(self.nbase,
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\resnet_torch.py", line 241, in __init__
INFO: Cellpose2D-train: self.output = batchconv(nbaseup[0], nout, 1, self.dim)
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\resnet_torch.py", line 35, in batchconv
INFO: Cellpose2D-train: nn.Conv2d(in_channels, out_channels, sz, padding=sz//2),
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\torch\nn\modules\conv.py", line 444, in __init__
INFO: Cellpose2D-train: super(Conv2d, self).__init__(
INFO: Cellpose2D-train: File "F:\conda-envs\omnipose034\lib\site-packages\torch\nn\modules\conv.py", line 85, in __init__
INFO: Cellpose2D-train: if out_channels % groups != 0:
INFO: Cellpose2D-train: TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'
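(Reading the traceback, this TypeError is what torch raises when Conv2d receives out_channels=None; together with the "yoyoyoyoyggggggggg None" line above, it suggests nclasses never gets set on this code path, so the output layer is built with nout=None. A minimal sketch of the symptom only, as an assumption about the cause:)

import torch.nn as nn

# torch's modulo check on out_channels fails when it is None.
try:
    nn.Conv2d(in_channels=32, out_channels=None, kernel_size=1)
except TypeError as e:
    print(e)   # unsupported operand type(s) for %: 'NoneType' and 'int'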
I can share the training data with you privately in case you would like to see why this dataset would fail in omnipose but not in cellpose 2.0
@lacan sorry for the delay, that would be great if you could share your training data so I can debug. You can reach me at kcutler@uw.edu. One thing that could be relevant is your torch/cuda versions.
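(For reference, a minimal way to report those versions:)

import torch

# Quick environment report, useful when sharing issues like this one.
print("torch:", torch.__version__)
print("cuda build:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())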
Hello, so I waited a bit too long on this and have now retested things with version 0.4.4.
I tried training this data (cellpose-training.zip) with the command below; the resulting error follows as well... Testing with other models gives different errors, but I would say one thing at a time :)
INFO: Executing command:
cmd.exe /C F:\conda-envs\cellpose-omnipose-biop-gpu\python.exe -W ignore -m omnipose --train --dir "C:\Users\oburri\Desktop\QuPath LuCa Demo Project\cellpose-training\train" --test_dir "C:\Users\oburri\Desktop\QuPath LuCa Demo Project\cellpose-training\test" --pretrained_model None --n_epochs 50 --learning_rate 0.2 --omni --cluster --batch_size 8 --use_gpu --verbose
INFO: This command should run directly if copy-pasted into your shell
INFO: Cellpose2D: 2023-05-16 16:08:53,927 [INFO] WRITING LOG OUTPUT TO C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D: log file C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D: 2023-05-16 16:08:54,090 [INFO] ** TORCH GPU version installed and working. **
INFO: Cellpose2D: 2023-05-16 16:08:54,090 [INFO] >>>> using GPU
INFO: Cellpose2D: Omnipose enabled. See Omnipose repo for licencing details.
INFO: Cellpose2D: 2023-05-16 16:08:54,090 [INFO] Training omni model. Setting nclasses=4, RAdam=True
INFO: Cellpose2D: 2023-05-16 16:08:54,115 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D: 2023-05-16 16:08:54,124 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D: 2023-05-16 16:08:54,128 [INFO] training from scratch
INFO: Cellpose2D: 2023-05-16 16:08:54,128 [INFO] during training rescaling images to fixed diameter of 30.0 pixels
INFO: Cellpose2D: 2023-05-16 16:08:54,196 [INFO] Training with rescale = 1.00
INFO: Cellpose2D: reshape_train_test (187, 187) [0, 0] True True
INFO: Cellpose2D: reshape_and_normalize_data_2 2 [0, 0] (2, 187, 187) (2, 187, 187)
INFO: Cellpose2D: reshape_train_test_2 (2, 187, 187)
INFO: Cellpose2D: 2023-05-16 16:08:54,212 [INFO] No precomuting flows with Omnipose. Computed during training.
INFO: Cellpose2D: 2023-05-16 16:08:54,214 [INFO] NOTE: computing flows for labels (could be done before to save time)
INFO: Cellpose2D:
INFO: Cellpose2D: 0%| | 0/2 [00:00<?, ?it/s]
INFO: Cellpose2D: 50%|##### | 1/2 [00:01<00:01, 1.37s/it]
INFO: Cellpose2D: 100%|##########| 2/2 [00:01<00:00, 1.59it/s]
INFO: Cellpose2D: 100%|##########| 2/2 [00:01<00:00, 1.35it/s]
INFO: Cellpose2D: Traceback (most recent call last):
INFO: Cellpose2D: File "<frozen runpy>", line 198, in _run_module_as_main
INFO: Cellpose2D: File "<frozen runpy>", line 88, in _run_code
INFO: Cellpose2D: File "F:\conda-envs\cellpose-omnipose-biop-gpu\Lib\site-packages\omnipose\__main__.py", line 3, in <module>
INFO: Cellpose2D: main(omni_CLI=True)
INFO: Cellpose2D: File "F:\conda-envs\cellpose-omnipose-biop-gpu\Lib\site-packages\cellpose_omni\__main__.py", line 494, in main
INFO: Cellpose2D: cpmodel_path = model.train(images, labels, links, train_files=image_names,
INFO: Cellpose2D: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: Cellpose2D: File "F:\conda-envs\cellpose-omnipose-biop-gpu\Lib\site-packages\cellpose_omni\models.py", line 1138, in train
INFO: Cellpose2D: test_labels = labels_to_flows(test_labels, test_links, files=test_files, use_gpu=self.gpu, device=self.device, dim=self.dim)
INFO: Cellpose2D: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: Cellpose2D: File "F:\conda-envs\cellpose-omnipose-biop-gpu\Lib\site-packages\omnipose\core.py", line 258, in labels_to_flows
INFO: Cellpose2D: labels, dist, heat, veci = map(list,zip(*[masks_to_flows(labels[n], links=links[n], use_gpu=use_gpu,
INFO: Cellpose2D: ^^^^^^^^^^^^^^^^^^^^^^^^
INFO: Cellpose2D: ValueError: too many values to unpack (expected 4)
I received exactly the same error as @lacan. Tested with a pip installation of omnipose 0.4.4, on Python 3.8.12 with PyTorch 1.11.0 and on Python 3.10.8 with PyTorch 1.13. Dataset: 3 images from the official Omnipose dataset "bact_phase", downloaded from https://osf.io/xmury/
Training command:
python -m omnipose --train --use_gpu --check_mkl --verbose \
--dir ./data/cellpose_format/omnipose/train/bact_phase --test_dir ./data/cellpose_format/omnipose/test/bact_phase \
--img_filter _img --mask_filter _masks \
--chan 0 --chan2 0 \
--n_epochs 4000 --pretrained_model None \
--save_every 200 --save_each \
--learning_rate 0.1 --diameter 0 --batch_size 16 --RAdam
Error message for Python 3.8.12:
resnet_torch.py backend check. module 'torch.backends' has no attribute 'mps'
2023-07-24 21:33:15,422 [INFO] WRITING LOG OUTPUT TO /root/.cellpose/run.log
log file /root/.cellpose/run.log
2023-07-24 21:33:29,461 [INFO] ** TORCH GPU version installed and working. **
2023-07-24 21:33:29,462 [INFO] >>>> using GPU
Omnipose enabled. See Omnipose repo for licencing details.
2023-07-24 21:33:29,462 [INFO] Training omni model. Setting nclasses=4, RAdam=True
2023-07-24 21:33:29,500 [INFO] not all flows are present, will run flow generation for all images
2023-07-24 21:33:29,505 [INFO] not all flows are present, will run flow generation for all images
2023-07-24 21:33:29,508 [INFO] training from scratch
2023-07-24 21:33:29,508 [INFO] median diameter set to 0 => no rescaling during training
reshape_train_test (600, 600) [0, 0] True True
reshape_and_normalize_data_2 3 [0, 0] (2, 600, 600) (2, 600, 600)
reshape_train_test_2 (2, 600, 600)
2023-07-24 21:33:29,747 [INFO] No precomuting flows with Omnipose. Computed during training.
2023-07-24 21:33:29,756 [INFO] NOTE: computing flows for labels (could be done before to save time)
100%|█████████████████████████████████████████████| 3/3 [00:03<00:00, 1.01s/it]
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.8/site-packages/omnipose/__main__.py", line 3, in <module>
main(omni_CLI=True)
File "/opt/conda/lib/python3.8/site-packages/cellpose_omni/__main__.py", line 494, in main
cpmodel_path = model.train(images, labels, links, train_files=image_names,
File "/opt/conda/lib/python3.8/site-packages/cellpose_omni/models.py", line 1138, in train
test_labels = labels_to_flows(test_labels, test_links, files=test_files, use_gpu=self.gpu, device=self.device, dim=self.dim)
File "/opt/conda/lib/python3.8/site-packages/omnipose/core.py", line 258, in labels_to_flows
labels, dist, heat, veci = map(list,zip(*[masks_to_flows(labels[n], links=links[n], use_gpu=use_gpu,
ValueError: too many values to unpack (expected 4)
Any hints would be appreciated!
Hi @kevinjohncutler. Any news on this? Should I try the bleeding-edge GitHub repo version?
@lacan @m274d very sorry I didn't see any updates on this thread. I'm unable to reproduce these errors, but please do try the latest github version in a new environment. These errors have to do with installation (possibly due to having cellpose in the same env).
Hi @kevinjohncutler,
Thanks for the suggestion. I did it again and unfortunately had no success:
INFO: Executing command:
cmd.exe /C D:\conda\conda-envs\omnipose-github\python.exe -W ignore -m omnipose --train --dir "N:\public\alejandro.alonsocalleja_EDBB\MK_Segmentation_20230531\QuPath Cellpose Training Project - Omnipose test\cellpose-training\train" --test_dir "N:\public\alejandro.alonsocalleja_EDBB\MK_Segmentation_20230531\QuPath Cellpose Training Project - Omnipose test\cellpose-training\test" --pretrained_model None --min_train_masks 0 --omni --nchan 3 --use_gpu --verbose
INFO: This command should run directly if copy-pasted into your shell
INFO: Cellpose2D: 2023-09-14 14:51:50,950 [INFO] WRITING LOG OUTPUT TO C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D: log file C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D: 2023-09-14 14:51:51,155 [INFO] ** TORCH GPU version installed and working. **
INFO: Cellpose2D: 2023-09-14 14:51:51,155 [INFO] >>>> using GPU
INFO: Cellpose2D: 2023-09-14 14:51:51,155 [INFO] Training omni model. Setting nclasses=2, RAdam=False
INFO: Cellpose2D: 2023-09-14 14:51:53,849 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D: 2023-09-14 14:51:54,542 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D: 2023-09-14 14:51:55,363 [INFO] channel axis detected at position 0, manually specify if incorrect
INFO: Cellpose2D: 2023-09-14 14:51:55,363 [INFO] training from scratch
INFO: Cellpose2D: 2023-09-14 14:51:55,363 [INFO] median diameter set to 0 => no rescaling during training
INFO: Cellpose2D: 2023-09-14 14:52:02,274 [INFO] No precomuting flows with Omnipose. Computed during training.
INFO: Cellpose2D: 2023-09-14 14:52:02,409 [INFO] NOTE: computing flows for labels (could be done before to save time)
INFO: Cellpose2D:
INFO: Cellpose2D: 0%| | 0/13 [00:00<?, ?it/s]
INFO: Cellpose2D: 8%|7 | 1/13 [00:00<00:03, 3.23it/s]
INFO: Cellpose2D: 15%|#5 | 2/13 [00:01<00:07, 1.42it/s]
INFO: Cellpose2D: 31%|### | 4/13 [00:01<00:02, 3.22it/s]
INFO: Cellpose2D: 46%|####6 | 6/13 [00:01<00:01, 4.86it/s]
INFO: Cellpose2D: 62%|######1 | 8/13 [00:01<00:00, 6.08it/s]
INFO: Cellpose2D: 77%|#######6 | 10/13 [00:01<00:00, 8.13it/s]
INFO: Cellpose2D: 92%|#########2| 12/13 [00:02<00:00, 9.66it/s]
INFO: Cellpose2D: 100%|##########| 13/13 [00:02<00:00, 6.12it/s]
INFO: Cellpose2D: Traceback (most recent call last):
INFO: Cellpose2D: File "D:\conda\conda-envs\omnipose-github\lib\runpy.py", line 196, in _run_module_as_main
INFO: Cellpose2D: return _run_code(code, main_globals, None,
INFO: Cellpose2D: File "D:\conda\conda-envs\omnipose-github\lib\runpy.py", line 86, in _run_code
INFO: Cellpose2D: exec(code, run_globals)
INFO: Cellpose2D: File "D:\conda\conda-envs\omnipose-github\lib\site-packages\omnipose\__main__.py", line 12, in <module>
INFO: Cellpose2D: main()
INFO: Cellpose2D: File "D:\conda\conda-envs\omnipose-github\lib\site-packages\omnipose\__main__.py", line 9, in main
INFO: Cellpose2D: cellpose_omni_main(args)
INFO: Cellpose2D: File "D:\conda\conda-envs\omnipose-github\lib\site-packages\cellpose_omni\__main__.py", line 422, in main
INFO: Cellpose2D: cpmodel_path = model.train(images, labels, links, train_files=image_names,
INFO: Cellpose2D: File "D:\conda\conda-envs\omnipose-github\lib\site-packages\cellpose_omni\models.py", line 1421, in train
INFO: Cellpose2D: test_labels = labels_to_flows(test_labels, test_links, files=test_files,
INFO: Cellpose2D: File "D:\conda\conda-envs\omnipose-github\lib\site-packages\omnipose\core.py", line 258, in labels_to_flows
INFO: Cellpose2D: labels, dist, heat, veci = map(list,zip(*[masks_to_flows(labels[n], links=links[n], use_gpu=use_gpu,
INFO: Cellpose2D: ValueError: too many values to unpack (expected 4)
Hi @kevinjohncutler,
I still cannot use the latest omnipose for training; I get the same error. Here is the data I am trying to use: cellpose-training.zip
Is there anything else we can try? I am happy to find some time to debug this over Zoom if that would help.
Best, Oli
So a little bit of further debugging...
It turns out training works with this command
omnipose --train --dir "N:\temp-Oli\QuPath PCB\QuPath Cellpose training project\cellpose-training\train" --pretrained_model None --n_epochs 10 --omni --use_gpu --verbose
But it causes the error shown above if we give it a test folder:
omnipose --train --dir "N:\temp-Oli\QuPath PCB\QuPath Cellpose training project\cellpose-training\train" --test_dir "N:\temp-Oli\QuPath PCB\QuPath Cellpose training project\cellpose-training\test" --pretrained_model None --n_epochs 10 --omni --use_gpu --verbose
So something seems to go wrong when processing the images in the test folder.
Is this something only I am experiencing, or is no one providing a test folder when training a model? Which is kind of scary, actually...
@kevinjohncutler Any info on this? We are happily training data now, but cannot get any validation loss. Have you been able to reproduce the error with the data I provided?
Still no news on this... So I noticed the following by sifting through the code myself. This line here https://github.com/kevinjohncutler/omnipose/blob/a585929215d4908a6279eb0503a0c46d02f4e2e9/omnipose/core.py#L263 expects 4 outputs, but the method that it calls actually returns 5: https://github.com/kevinjohncutler/omnipose/blob/a585929215d4908a6279eb0503a0c46d02f4e2e9/omnipose/core.py#L401
I am guessing that boundaries is not necessary for this part of the code, so I simply modified line 263 to read
labels, dist, bounds, heat, veci = map(list,zip(*[masks_to_flows(labels[n], links=links[n], use_gpu=use_gpu, ...
and bounds just does not get used.
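(To illustrate the mismatch with a stripped-down stub, not the real masks_to_flows, just the same unpacking pattern:)

# Hypothetical stub returning five values, like the linked masks_to_flows.
def masks_to_flows_stub(m):
    return m, "dist", "bound", "heat", "vec"

try:
    # The original line unpacks the zipped results into only four names.
    labels, dist, heat, veci = map(list, zip(*[masks_to_flows_stub(x) for x in (1, 2)]))
except ValueError as e:
    print(e)   # too many values to unpack (expected 4)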
However, this leads to a new error:
INFO: Cellpose2D: File "D:\conda\conda-envs\omnipose-github\lib\site-packages\omnipose\core.py", line 272, in <listcomp>
INFO: Cellpose2D: flows = [np.concatenate((labels[n][np.newaxis,:,:],
INFO: Cellpose2D: File "<__array_function__ internals>", line 200, in concatenate
INFO: Cellpose2D: File "D:\conda\conda-envs\omnipose-github\lib\site-packages\torch\_tensor.py", line 970, in __array__
INFO: Cellpose2D: return self.numpy()
INFO: Cellpose2D: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Am I right that this part never really worked before? I.e., no validation data was ever used, so this error was never caught? Or is there something else to do? You often mention that flows are computed on the fly for Omnipose; should that not also be the case for the validation data?
Some more info... I managed to get that last error to disappear, and training to work, by replacing the lines here https://github.com/kevinjohncutler/omnipose/blob/a585929215d4908a6279eb0503a0c46d02f4e2e9/omnipose/core.py#L269 with
veci[n].cpu(),
heat[n].cpu()[np.newaxis,:,:]), axis=0).astype(np.float32)
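(For reference, this second error is the generic CUDA-tensor-to-NumPy conversion failure, which the .cpu() calls above avoid; a small sketch, assuming a CUDA device is present:)

import numpy as np
import torch

# np.concatenate triggers Tensor.__array__, which cannot convert a GPU tensor;
# moving it to the CPU first (as in the edit above) works.
device = "cuda" if torch.cuda.is_available() else "cpu"
heat = torch.zeros(4, 4, device=device)
labels = np.zeros((1, 4, 4), dtype=np.float32)
try:
    np.concatenate((labels, np.asarray(heat)[np.newaxis]), axis=0)
except TypeError as e:
    print(e)   # can't convert cuda:0 device type tensor to numpy ...
print(np.concatenate((labels, heat.cpu().numpy()[np.newaxis]), axis=0).shape)   # (2, 4, 4)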
But there is a big issue in my eyes... How do I access the training and validation losses? In the output of your training there is only:
INFO: Cellpose2D: 2024-04-08 17:30:18,398 [INFO] Train epoch: 32 | Time: 0.46min | last epoch: 0.74s | <sec/epoch>: 0.81s | <sec/batch>: 0.33s | <Batch Loss>: 3.527480 | <Epoch Loss>: 4.167647
In the original cellpose there is also a "Test loss" that we use to see how training went. I opened a separate issue to have this logged somewhere.
@lacan So sorry I didn't circle back to this, please feel free to email me moving forward with any urgent/outstanding issues.
For the issue of validation data during training: I honestly never worked with that code from cellpose, as validation loss has never been useful to me. I have a --save_each parameter instead to save intermediate models, which allows testing after training to investigate how the model performs over the training epochs, and a validation curve can be made from that. When quantitative metrics are needed, I just care about the final model over the entire test dataset, and computing the loss on that at each epoch can really slow down training. In my experience, if the training and validation sets are big enough and representative of each other, there should be no difference in convergence either.
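(For anyone wanting a validation curve that way, a rough sketch of the post-hoc approach, written against the upstream cellpose-style Python API for illustration; the omnipose fork's module is cellpose_omni, and the exact names, file patterns, and snapshot paths below are assumptions:)

import glob
from cellpose import io, metrics, models

# Assumed layout: *_img.tif / *_masks.tif pairs in the test folder, and the
# intermediate models written by --save_each under train/models/.
test_imgs  = [io.imread(f) for f in sorted(glob.glob("cellpose-training/test/*_img.tif"))]
test_masks = [io.imread(f) for f in sorted(glob.glob("cellpose-training/test/*_masks.tif"))]

for snapshot in sorted(glob.glob("cellpose-training/train/models/*epoch*")):
    model = models.CellposeModel(gpu=True, pretrained_model=snapshot)
    preds, _, _ = model.eval(test_imgs, channels=[0, 0], diameter=30.0)
    ap, _, _, _ = metrics.average_precision(test_masks, preds, threshold=[0.5])
    print(snapshot, float(ap.mean()))   # one point on the validation curve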
I have been meaning to build out a better way to log train and test loss, and when I get around to that, I will keep the validation loss in mind.
I should mention, the logs including "Cellpose2D" don't look familiar, so also be sure you are on the latest version and have explicitly uninstalled cellpose_omni.
the logs including "Cellpose2D" don't look familiar
Yes, that is because the output comes from the cellpose QuPath extension: the name of the process is prepended to each line, which is why it looks odd.
Thanks for the information. Do you want me to add a PR with what I did? In its current state, omnipose will always crash if a user defines --test_dir. Or do you want to remove that option altogether? In that case, I would suggest a section in the documentation on how to validate an omnipose model.
@lacan Yes, please do submit a PR if you have made the edits required to get validation loss working. Thanks!
Hello, I got exactly the same errors when defining a --test_dir to monitor some validation loss. I would be interested in a permanent fix too! Thanks