kevinjohncutler / omnipose

Omnipose: a high-precision solution for morphology-independent cell segmentation
https://omnipose.readthedocs.io

Training fails in Omnipose but works in cellpose #16

Open lacan opened 2 years ago

lacan commented 2 years ago

When trying to train a cellpose model, I am met with the following error right after computing flows.

Any idea?

INFO: Executing command: [cmd.exe /C F:\conda-envs\omnipose-v201\python.exe -W ignore -m cellpose --train --dir N:\public\christine.gopfert_PTH\Muscle_Segmentation_BF_Cellpose_20220913\QuPath Cellpose Training - Oli\cellpose-training\train --test_dir N:\public\christine.gopfert_PTH\Muscle_Segmentation_BF_Cellpose_20220913\QuPath Cellpose Training - Oli\cellpose-training\test --pretrained_model cyto2_omni --chan 0 --chan2 0 --diameter 90.0 --n_epochs 900 --learning_rate 0.2 --batch_size 8 --omni --cluster --use_gpu --verbose]
INFO: Cellpose2D-train: 2022-09-29 17:15:31,058 [INFO] WRITING LOG OUTPUT TO C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D-train: log file C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D-train: 2022-09-29 17:15:31,278 [INFO] ** TORCH CUDA version installed and working. **
INFO: Cellpose2D-train: 2022-09-29 17:15:31,278 [INFO] >>>> using GPU
INFO: Cellpose2D-train: Omnipose enabled. See Omnipose repo for licencing details.
INFO: Cellpose2D-train: 2022-09-29 17:15:31,278 [INFO] Training omni model. Setting nclasses=4, RAdam=True
INFO: Cellpose2D-train: 2022-09-29 17:15:31,596 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-09-29 17:15:31,688 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-09-29 17:15:31,737 [INFO] pretrained model C:\Users\oburri\.cellpose\models\cyto2_omnitorch_0 is being used
INFO: Cellpose2D-train: 2022-09-29 17:15:31,737 [INFO] during training rescaling images to fixed diameter of 30.0 pixels
INFO: Cellpose2D-train: 2022-09-29 17:15:32,955 [INFO] Training with rescale = 1.00
INFO: Cellpose2D-train: 2022-09-29 17:15:33,009 [INFO] No precomuting flows with Omnipose. Computed during training.
INFO: Cellpose2D-train: 2022-09-29 17:15:33,054 [INFO] NOTE: computing flows for labels (could be done before to save time)
INFO: Cellpose2D-train: 
INFO: Cellpose2D-train:   0%|          | 0/1 [00:00<?, ?it/s]
INFO: Cellpose2D-train: 100%|##########| 1/1 [00:01<00:00,  1.45s/it]
INFO: Cellpose2D-train: 100%|##########| 1/1 [00:01<00:00,  1.45s/it]
INFO: Cellpose2D-train: 2022-09-29 17:15:34,571 [INFO] >>> Using RAdam optimizer
INFO: Cellpose2D-train: Traceback (most recent call last):
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\runpy.py", line 196, in _run_module_as_main
INFO: Cellpose2D-train:     return _run_code(code, main_globals, None,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\runpy.py", line 86, in _run_code
INFO: Cellpose2D-train:     exec(code, run_globals)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\__main__.py", line 504, in <module>
INFO: Cellpose2D-train:     main()
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\__main__.py", line 474, in main
INFO: Cellpose2D-train:     cpmodel_path = model.train(images, labels, train_files=image_names,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\models.py", line 1037, in train
INFO: Cellpose2D-train:     model_path = self._train_net(train_data, train_labels, 
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\core.py", line 918, in _train_net
INFO: Cellpose2D-train:     diam_train = np.array([utils.diameters(train_labels[k][0],omni=self.omni)[0] 
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\core.py", line 918, in <listcomp>
INFO: Cellpose2D-train:     diam_train = np.array([utils.diameters(train_labels[k][0],omni=self.omni)[0] 
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\utils.py", line 390, in diameters
INFO: Cellpose2D-train:     return omnipose.core.diameters(masks), None
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\omnipose\core.py", line 163, in diameters
INFO: Cellpose2D-train:     dt_pos = np.abs(dt[dt>dist_threshold])
INFO: Cellpose2D-train: TypeError: '>' not supported between instances of 'NoneType' and 'int'
INFO: Cellpose2D-train: ggg (993,) [0 1 2 3 4 5 6 7 8]
INFO: Cellpose2D-train: ggg (993,) [0 1 2 3 4 5 6 7 8]
INFO: Cellpose2D-train: ggg (1000,) [0 1 2 3 4 5 6 7 8]
INFO: Cellpose2D-train: ggg (1000,) [0 1 2 3 4 5]
INFO: Cellpose2D-train: ggg (993,) [0]
INFO: Cellpose2D-train: ggg (993,) [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
INFO: Cellpose2D-train: ggg (993,) [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]

Trying without --omni goes farther, but not much:

INFO: Executing command: [cmd.exe /C F:\conda-envs\omnipose-v201\python.exe -W ignore -m cellpose --train --dir N:\public\christine.gopfert_PTH\Muscle_Segmentation_BF_Cellpose_20220913\QuPath Cellpose Training - Oli\cellpose-training\train --test_dir N:\public\christine.gopfert_PTH\Muscle_Segmentation_BF_Cellpose_20220913\QuPath Cellpose Training - Oli\cellpose-training\test --pretrained_model cyto2 --chan 0 --chan2 0 --diameter 90.0 --n_epochs 900 --learning_rate 0.2 --batch_size 8 --use_gpu --verbose]
INFO: Cellpose2D-train: 2022-09-29 17:18:45,710 [INFO] WRITING LOG OUTPUT TO C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D-train: log file C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D-train: 2022-09-29 17:18:45,943 [INFO] ** TORCH CUDA version installed and working. **
INFO: Cellpose2D-train: 2022-09-29 17:18:45,943 [INFO] >>>> using GPU
INFO: Cellpose2D-train: 2022-09-29 17:18:46,238 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-09-29 17:18:46,313 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-09-29 17:18:46,408 [INFO] pretrained model C:\Users\oburri\.cellpose\models\cyto2torch_0 is being used
INFO: Cellpose2D-train: 2022-09-29 17:18:46,408 [INFO] during training rescaling images to fixed diameter of 30.0 pixels
INFO: Cellpose2D-train: 2022-09-29 17:18:47,671 [INFO] Training with rescale = 1.00
INFO: Cellpose2D-train: 2022-09-29 17:18:47,729 [INFO] computing flows for labels
INFO: Cellpose2D-train: 
INFO: Cellpose2D-train:   0%|          | 0/6 [00:00<?, ?it/s]
INFO: Cellpose2D-train:  17%|#6        | 1/6 [00:01<00:09,  1.90s/it]
INFO: Cellpose2D-train:  33%|###3      | 2/6 [00:03<00:06,  1.67s/it]
INFO: Cellpose2D-train:  50%|#####     | 3/6 [00:04<00:04,  1.47s/it]
INFO: Cellpose2D-train:  67%|######6   | 4/6 [00:05<00:02,  1.38s/it]
INFO: Cellpose2D-train:  83%|########3 | 5/6 [00:07<00:01,  1.48s/it]
INFO: Cellpose2D-train: 100%|##########| 6/6 [00:08<00:00,  1.31s/it]
INFO: Cellpose2D-train: 100%|##########| 6/6 [00:08<00:00,  1.42s/it]
INFO: Cellpose2D-train: 2022-09-29 17:18:56,579 [INFO] computing flows for labels
INFO: Cellpose2D-train: 
INFO: Cellpose2D-train:   0%|          | 0/2 [00:00<?, ?it/s]
INFO: Cellpose2D-train:  50%|#####     | 1/2 [00:01<00:01,  1.28s/it]
INFO: Cellpose2D-train: 100%|##########| 2/2 [00:02<00:00,  1.44s/it]
INFO: Cellpose2D-train: 100%|##########| 2/2 [00:02<00:00,  1.42s/it]
INFO: Cellpose2D-train: ggg (1000, 993) [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
INFO: Cellpose2D-train:  18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
INFO: Cellpose2D-train:  36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53.
INFO: Cellpose2D-train:  54. 55. 56. 57. 58. 59. 60. 61. 62.]
INFO: Cellpose2D-train: ggg (1000, 993) [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
INFO: Cellpose2D-train:  18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
INFO: Cellpose2D-train:  36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53.
INFO: Cellpose2D-train:  54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67.]
INFO: Cellpose2D-train: ggg (1000, 1000) [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
INFO: Cellpose2D-train:  18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
INFO: Cellpose2D-train:  36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53.
INFO: Cellpose2D-train:  54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67.]
INFO: Cellpose2D-train: ggg (1000, 993) [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
INFO: Cellpose2D-train:  18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
INFO: Cellpose2D-train:  36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53.
INFO: Cellpose2D-train:  54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66.]
INFO: Cellpose2D-train: ggg (1000, 993) [  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
INFO: Cellpose2D-train:   14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.
INFO: Cellpose2D-train:   28.  29.  30.  31.  32.  33.  34.  35.  36.  37.  38.  39.  40.  41.
INFO: Cellpose2D-train:   42.  43.  44.  45.  46.  47.  48.  49.  50.  51.  52.  53.  54.  55.
INFO: Cellpose2D-train:   56.  57.  58.  59.  60.  61.  62.  63.  64.  65.  66.  67.  68.  69.
INFO: Cellpose2D-train:   70.  71.  72.  73.  74.  75.  76.  77.  78.  79.  80.  81.  82.  83.
INFO: Cellpose2D-train:   84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.
INFO: Cellpose2D-train:   98.  99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111.
INFO: Cellpose2D-train:  112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125.
INFO: Cellpose2D-train:  126. 127. 128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138. 139.
INFO: Cellpose2D-train:  140. 141. 142. 143.]
INFO: Cellpose2D-train: ggg (1000, 993) [  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
INFO: Cellpose2D-train:   14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.  25.  26.  27.
INFO: Cellpose2D-train:   28.  29.  30.  31.  32.  33.  34.  35.  36.  37.  38.  39.  40.  41.
INFO: Cellpose2D-train:   42.  43.  44.  45.  46.  47.  48.  49.  50.  51.  52.  53.  54.  55.
INFO: Cellpose2D-train:   56.  57.  58.  59.  60.  61.  62.  63.  64.  65.  66.  67.  68.  69.
INFO: Cellpose2D-train:   70.  71.  72.  73.  74.  75.  76.  77.  78.  79.  80.  81.  82.  83.
INFO: Cellpose2D-train:   84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.
INFO: Cellpose2D-train:   98.  99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111.
INFO: Cellpose2D-train:  112. 113. 114. 115. 116. 117. 118. 119. 120. 121. 122. 123. 124. 125.
INFO: Cellpose2D-train:  126. 127. 128. 129.]
INFO: Cellpose2D-train: 2022-09-29 17:18:59,778 [INFO] >>>> median diameter set to = 30
INFO: Cellpose2D-train: 2022-09-29 17:18:59,778 [INFO] >>>> training network with 6 channel input <<<<
INFO: Cellpose2D-train: 2022-09-29 17:18:59,778 [INFO] >>>> LR: 0.20000, batch_size: 8, weight_decay: 0.00001
INFO: Cellpose2D-train: 2022-09-29 17:18:59,778 [INFO] >>>> ntrain = 6, ntest = 2
INFO: Cellpose2D-train: 2022-09-29 17:18:59,780 [INFO] >>>> nimg_per_epoch = 6
INFO: Cellpose2D-train: Traceback (most recent call last):
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\runpy.py", line 196, in _run_module_as_main
INFO: Cellpose2D-train:     return _run_code(code, main_globals, None,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\runpy.py", line 86, in _run_code
INFO: Cellpose2D-train:     exec(code, run_globals)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\__main__.py", line 504, in <module>
INFO: Cellpose2D-train:     main()
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\__main__.py", line 474, in main
INFO: Cellpose2D-train:     cpmodel_path = model.train(images, labels, train_files=image_names,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\models.py", line 1037, in train
INFO: Cellpose2D-train:     model_path = self._train_net(train_data, train_labels, 
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\core.py", line 988, in _train_net
INFO: Cellpose2D-train:     train_loss = self._train_step(imgi, lbl)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\core.py", line 803, in _train_step
INFO: Cellpose2D-train:     y = self.net(X)[0]
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
INFO: Cellpose2D-train:     return forward_call(*input, **kwargs)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\parallel\data_parallel.py", line 166, in forward
INFO: Cellpose2D-train:     return self.module(*inputs[0], **kwargs[0])
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
INFO: Cellpose2D-train:     return forward_call(*input, **kwargs)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\resnet_torch.py", line 238, in forward
INFO: Cellpose2D-train:     T0 = self.downsample(data)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
INFO: Cellpose2D-train:     return forward_call(*input, **kwargs)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\resnet_torch.py", line 102, in forward
INFO: Cellpose2D-train:     xd.append(self.down[n](y))
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
INFO: Cellpose2D-train:     return forward_call(*input, **kwargs)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\cellpose\resnet_torch.py", line 55, in forward
INFO: Cellpose2D-train:     x = self.proj(x) + self.conv[1](self.conv[0](x))
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
INFO: Cellpose2D-train:     return forward_call(*input, **kwargs)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
INFO: Cellpose2D-train:     input = module(input)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
INFO: Cellpose2D-train:     return forward_call(*input, **kwargs)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\modules\batchnorm.py", line 168, in forward
INFO: Cellpose2D-train:     return F.batch_norm(
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose-v201\lib\site-packages\torch\nn\functional.py", line 2438, in batch_norm
INFO: Cellpose2D-train:     return torch.batch_norm(
INFO: Cellpose2D-train: RuntimeError: running_mean should contain 6 elements not 2

Training on a regular cellpose installation works, though...

Any thoughts?

kevinjohncutler commented 2 years ago

@lacan Sorry about that; there was a bug in my Cellpose fork that only comes up with rescaling on. I believe I just fixed it and will push the new version later today.

lacan commented 1 year ago

Thanks for the response! I wanted to try the new version but am still facing issues.

I tried recreating a Python environment with your update, but am now met with:

INFO: Cellpose2D-train: Traceback (most recent call last):
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose\lib\runpy.py", line 187, in _run_module_as_main
INFO: Cellpose2D-train:     mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose\lib\runpy.py", line 146, in _get_module_details
INFO: Cellpose2D-train:     return _get_module_details(pkg_main_name, error)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose\lib\runpy.py", line 110, in _get_module_details
INFO: Cellpose2D-train:     __import__(pkg_name)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose\lib\site-packages\cellpose\__init__.py", line 1, in <module>
INFO: Cellpose2D-train:     from . import core
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose\lib\site-packages\cellpose\core.py", line 12, in <module>
INFO: Cellpose2D-train:     from focal_loss.focal_loss import FocalLoss
INFO: Cellpose2D-train: ModuleNotFoundError: No module named 'focal_loss

The thing is, I can't even find where this FocalLoss dependency is coming from in cellpose... What version is it trying to download?
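
For reference, the import path in that traceback matches the focal_loss_torch package on PyPI; assuming that is the intended dependency (an assumption, not confirmed by the maintainer here), installing it should satisfy the import:

    # Assumption: the focal_loss_torch PyPI package provides this module path.
    # Install with: pip install focal_loss_torch
    from focal_loss.focal_loss import FocalLoss  # the import cellpose/core.py attempts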

kevinjohncutler commented 1 year ago

@lacan You may have been dealing with the wrong version of cellpose. Omnipose requires my fork of cellpose, though that should install automatically. Try creating a new environment and installing the latest PyPI version via pip install omnipose (or use --force in an existing environment).

lacan commented 1 year ago

@kevinjohncutler, thanks for the info. What I had was your version of omnipose, but it seems the cellpose dependency was wrong and has since been corrected.

In any case, now that I made a new environment with pip install omnipose (version 0.3.4), I am able to get farther in the training.

However, with the version of cellpose you are using, I cannot train using my images. Computing the flows fails when using --omni:

cellpose --train --dir "cellpose-training\train" --test_dir "cellpose-training\test" --pretrained_model cyto2_omni --chan 0 --chan2 0 --diameter 30.0 --n_epochs 900 --learning_rate 0.2 --batch_size 8 --omni --use_gpu

Trace:

INFO: Cellpose2D-train: !NEW LOGGING SETUP! To see cellpose progress, set --verbose
INFO: Cellpose2D-train: No --verbose => no progress or info printed
INFO: Cellpose2D-train: 2022-10-25 16:49:15,214 [INFO] ** TORCH GPU version installed and working. **
INFO: Cellpose2D-train: 2022-10-25 16:49:15,214 [INFO] >>>> using GPU
INFO: Cellpose2D-train: Omnipose enabled. See Omnipose repo for licencing details.
INFO: Cellpose2D-train: 2022-10-25 16:49:15,215 [INFO] Training omni model. Setting nclasses=4, RAdam=True
INFO: Cellpose2D-train: yoyoyoyoyggggggggg 4
INFO: Cellpose2D-train: 2022-10-25 16:49:15,310 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-10-25 16:49:15,372 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-10-25 16:49:15,407 [INFO] pretrained model C:\Users\oburri\.cellpose\models\cyto2_omnitorch_0 is being used
INFO: Cellpose2D-train: 2022-10-25 16:49:15,407 [INFO] during training rescaling images to fixed diameter of 30.0 pixels
INFO: Cellpose2D-train: 2022-10-25 16:49:16,753 [INFO] Training with rescale = 1.00
INFO: Cellpose2D-train: reshape_and_normalize_data_2 2 [0, 0] (6, 1, 222) (6, 1, 221)
INFO: Cellpose2D-train: 2022-10-25 16:49:16,771 [INFO] No precomuting flows with Omnipose. Computed during training.
INFO: Cellpose2D-train: 2022-10-25 16:49:16,775 [INFO] NOTE: computing flows for labels (could be done before to save time)
INFO: Cellpose2D-train: 
INFO: Cellpose2D-train:   0%|          | 0/2 [00:00<?, ?it/s]
INFO: Cellpose2D-train: 100%|##########| 2/2 [00:00<00:00, 12.12it/s]
INFO: Cellpose2D-train: 100%|##########| 2/2 [00:00<00:00, 12.05it/s]
INFO: Cellpose2D-train: 2022-10-25 16:49:16,975 [INFO] >>> Using RAdam optimizer
INFO: Cellpose2D-train: 2022-10-25 16:49:17,168 [INFO] >>>> median diameter set to = 30
INFO: Cellpose2D-train: 2022-10-25 16:49:17,168 [INFO] >>>> training network with 6 channel input <<<<
INFO: Cellpose2D-train: 2022-10-25 16:49:17,168 [INFO] >>>> LR: 0.20000, batch_size: 8, weight_decay: 0.00001
INFO: Cellpose2D-train: 2022-10-25 16:49:17,168 [INFO] >>>> ntrain = 8, ntest = 2
INFO: Cellpose2D-train: 2022-10-25 16:49:17,170 [INFO] >>>> nimg_per_epoch = 8
INFO: Cellpose2D-train: Traceback (most recent call last):
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\runpy.py", line 196, in _run_module_as_main
INFO: Cellpose2D-train:     return _run_code(code, main_globals, None,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\runpy.py", line 86, in _run_code
INFO: Cellpose2D-train:     exec(code, run_globals)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\__main__.py", line 507, in <module>
INFO: Cellpose2D-train:     main()
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\__main__.py", line 477, in main
INFO: Cellpose2D-train:     cpmodel_path = model.train(images, labels, train_files=image_names,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\models.py", line 1045, in train
INFO: Cellpose2D-train:     model_path = self._train_net(train_data, train_labels, 
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\core.py", line 1014, in _train_net
INFO: Cellpose2D-train:     imgi, lbl, scale = transforms.random_rotate_and_resize(
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\transforms.py", line 839, in random_rotate_and_resize
INFO: Cellpose2D-train:     return omnipose.core.random_rotate_and_resize(X, Y=Y, scale_range=scale_range, gamma_range=gamma_range,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\omnipose\core.py", line 1387, in random_rotate_and_resize
INFO: Cellpose2D-train:     imgi[n], lbl[n], scale[n] = random_crop_warp(img, y, nt, tyx, nchan, scale[n], 
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\omnipose\core.py", line 1499, in random_crop_warp
INFO: Cellpose2D-train:     lbl[k] = do_warp(l, M, tyx, offset=offset, order=0, mode=mode) # order 0 is 'nearest neighbor'
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\omnipose\core.py", line 1606, in do_warp
INFO: Cellpose2D-train:     return scipy.ndimage.affine_transform(A, np.linalg.inv(M), offset=offset, 
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\scipy\ndimage\_interpolation.py", line 591, in affine_transform
INFO: Cellpose2D-train:     raise RuntimeError('affine matrix has wrong number of rows')
INFO: Cellpose2D-train: RuntimeError: affine matrix has wrong number of rows

And without --omni I get a different error:

cellpose --train --dir "cellpose-training\train" --test_dir "cellpose-training\test" --pretrained_model cyto2 --chan 0 --chan2 0 --diameter 30.0 --n_epochs 900 --learning_rate 0.2 --batch_size 8 --use_gpu

INFO: Cellpose2D-train: !NEW LOGGING SETUP! To see cellpose progress, set --verbose
INFO: Cellpose2D-train: No --verbose => no progress or info printed
INFO: Cellpose2D-train: 2022-10-25 16:51:25,492 [INFO] ** TORCH GPU version installed and working. **
INFO: Cellpose2D-train: 2022-10-25 16:51:25,492 [INFO] >>>> using GPU
INFO: Cellpose2D-train: yoyoyoyoyggggggggg None
INFO: Cellpose2D-train: 2022-10-25 16:51:25,589 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-10-25 16:51:25,651 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D-train: 2022-10-25 16:51:25,690 [INFO] pretrained model C:\Users\oburri\.cellpose\models\cyto2torch_0 is being used
INFO: Cellpose2D-train: 2022-10-25 16:51:25,690 [INFO] during training rescaling images to fixed diameter of 30.0 pixels
INFO: Cellpose2D-train: Traceback (most recent call last):
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\runpy.py", line 196, in _run_module_as_main
INFO: Cellpose2D-train:     return _run_code(code, main_globals, None,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\runpy.py", line 86, in _run_code
INFO: Cellpose2D-train:     exec(code, run_globals)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\__main__.py", line 507, in <module>
INFO: Cellpose2D-train:     main()
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\__main__.py", line 456, in main
INFO: Cellpose2D-train:     model = models.CellposeModel(device=device,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\models.py", line 445, in __init__
INFO: Cellpose2D-train:     super().__init__(gpu=gpu, pretrained_model=False,
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\core.py", line 165, in __init__
INFO: Cellpose2D-train:     self.net = CPnet(self.nbase, 
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\resnet_torch.py", line 241, in __init__
INFO: Cellpose2D-train:     self.output = batchconv(nbaseup[0], nout, 1, self.dim)
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\cellpose\resnet_torch.py", line 35, in batchconv
INFO: Cellpose2D-train:     nn.Conv2d(in_channels, out_channels, sz, padding=sz//2),
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\torch\nn\modules\conv.py", line 444, in __init__
INFO: Cellpose2D-train:     super(Conv2d, self).__init__(
INFO: Cellpose2D-train:   File "F:\conda-envs\omnipose034\lib\site-packages\torch\nn\modules\conv.py", line 85, in __init__
INFO: Cellpose2D-train:     if out_channels % groups != 0:
INFO: Cellpose2D-train: TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'

I can share the training data with you privately in case you would like to see why this dataset would fail in omnipose but not in cellpose 2.0.

kevinjohncutler commented 1 year ago

@lacan sorry for the delay, that would be great if you could share your training data so I can debug. You can reach me at kcutler@uw.edu. One thing that could be relevant is your torch/cuda versions.
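
For a report like that, a quick generic way to capture the relevant versions (plain PyTorch, nothing omnipose-specific):

    import torch
    # PyTorch build, the CUDA version it was compiled against,
    # and whether a GPU is actually visible at runtime.
    print(torch.__version__, torch.version.cuda, torch.cuda.is_available())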

lacan commented 1 year ago

Hello, so I waited a bit too long on this and have now retested things with version 0.4.4.

I tried training this data (cellpose-training.zip) with the command below, and got the error shown as well... Testing with other models gives different errors, but I would say one thing at a time :)

INFO: Executing command:
cmd.exe /C F:\conda-envs\cellpose-omnipose-biop-gpu\python.exe -W ignore -m omnipose --train --dir "C:\Users\oburri\Desktop\QuPath LuCa Demo Project\cellpose-training\train" --test_dir "C:\Users\oburri\Desktop\QuPath LuCa Demo Project\cellpose-training\test" --pretrained_model None --n_epochs 50 --learning_rate 0.2 --omni --cluster --batch_size 8 --use_gpu --verbose
INFO: This command should run directly if copy-pasted into your shell
INFO: Cellpose2D: 2023-05-16 16:08:53,927 [INFO] WRITING LOG OUTPUT TO C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D: log file C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D: 2023-05-16 16:08:54,090 [INFO] ** TORCH GPU version installed and working. **
INFO: Cellpose2D: 2023-05-16 16:08:54,090 [INFO] >>>> using GPU
INFO: Cellpose2D: Omnipose enabled. See Omnipose repo for licencing details.
INFO: Cellpose2D: 2023-05-16 16:08:54,090 [INFO] Training omni model. Setting nclasses=4, RAdam=True
INFO: Cellpose2D: 2023-05-16 16:08:54,115 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D: 2023-05-16 16:08:54,124 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D: 2023-05-16 16:08:54,128 [INFO] training from scratch
INFO: Cellpose2D: 2023-05-16 16:08:54,128 [INFO] during training rescaling images to fixed diameter of 30.0 pixels
INFO: Cellpose2D: 2023-05-16 16:08:54,196 [INFO] Training with rescale = 1.00
INFO: Cellpose2D: reshape_train_test (187, 187) [0, 0] True True
INFO: Cellpose2D: reshape_and_normalize_data_2 2 [0, 0] (2, 187, 187) (2, 187, 187)
INFO: Cellpose2D: reshape_train_test_2 (2, 187, 187)
INFO: Cellpose2D: 2023-05-16 16:08:54,212 [INFO] No precomuting flows with Omnipose. Computed during training.
INFO: Cellpose2D: 2023-05-16 16:08:54,214 [INFO] NOTE: computing flows for labels (could be done before to save time)
INFO: Cellpose2D: 
INFO: Cellpose2D:   0%|          | 0/2 [00:00<?, ?it/s]
INFO: Cellpose2D:  50%|#####     | 1/2 [00:01<00:01,  1.37s/it]
INFO: Cellpose2D: 100%|##########| 2/2 [00:01<00:00,  1.59it/s]
INFO: Cellpose2D: 100%|##########| 2/2 [00:01<00:00,  1.35it/s]
INFO: Cellpose2D: Traceback (most recent call last):
INFO: Cellpose2D:   File "<frozen runpy>", line 198, in _run_module_as_main
INFO: Cellpose2D:   File "<frozen runpy>", line 88, in _run_code
INFO: Cellpose2D:   File "F:\conda-envs\cellpose-omnipose-biop-gpu\Lib\site-packages\omnipose\__main__.py", line 3, in <module>
INFO: Cellpose2D:     main(omni_CLI=True)
INFO: Cellpose2D:   File "F:\conda-envs\cellpose-omnipose-biop-gpu\Lib\site-packages\cellpose_omni\__main__.py", line 494, in main
INFO: Cellpose2D:     cpmodel_path = model.train(images, labels, links, train_files=image_names,
INFO: Cellpose2D:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: Cellpose2D:   File "F:\conda-envs\cellpose-omnipose-biop-gpu\Lib\site-packages\cellpose_omni\models.py", line 1138, in train
INFO: Cellpose2D:     test_labels = labels_to_flows(test_labels, test_links, files=test_files, use_gpu=self.gpu, device=self.device, dim=self.dim)
INFO: Cellpose2D:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INFO: Cellpose2D:   File "F:\conda-envs\cellpose-omnipose-biop-gpu\Lib\site-packages\omnipose\core.py", line 258, in labels_to_flows
INFO: Cellpose2D:     labels, dist, heat, veci = map(list,zip(*[masks_to_flows(labels[n], links=links[n], use_gpu=use_gpu, 
INFO: Cellpose2D:     ^^^^^^^^^^^^^^^^^^^^^^^^
INFO: Cellpose2D: ValueError: too many values to unpack (expected 4)

m274d commented 1 year ago

I received exactly the same error as @lacan. Tested with a pip installation of omnipose 0.4.4 on Python 3.8.12 with PyTorch 1.11.0, and on Python 3.10.8 with PyTorch 1.13. Dataset: 3 images from the official Omnipose dataset "bact_phase", downloaded from https://osf.io/xmury/

Training command:

python -m omnipose --train --use_gpu --check_mkl --verbose \
                     --dir ./data/cellpose_format/omnipose/train/bact_phase --test_dir ./data/cellpose_format/omnipose/test/bact_phase \
                     --img_filter _img --mask_filter _masks \
                     --chan 0 --chan2 0 \
                     --n_epochs 4000 --pretrained_model None \
                     --save_every 200 --save_each \
                     --learning_rate 0.1 --diameter 0 --batch_size 16  --RAdam

Error message for Python 3.8.12:

resnet_torch.py backend check. module 'torch.backends' has no attribute 'mps'
2023-07-24 21:33:15,422 [INFO] WRITING LOG OUTPUT TO /root/.cellpose/run.log
log file /root/.cellpose/run.log
2023-07-24 21:33:29,461 [INFO] ** TORCH GPU version installed and working. **
2023-07-24 21:33:29,462 [INFO] >>>> using GPU
Omnipose enabled. See Omnipose repo for licencing details.
2023-07-24 21:33:29,462 [INFO] Training omni model. Setting nclasses=4, RAdam=True
2023-07-24 21:33:29,500 [INFO] not all flows are present, will run flow generation for all images
2023-07-24 21:33:29,505 [INFO] not all flows are present, will run flow generation for all images
2023-07-24 21:33:29,508 [INFO] training from scratch
2023-07-24 21:33:29,508 [INFO] median diameter set to 0 => no rescaling during training
reshape_train_test (600, 600) [0, 0] True True
reshape_and_normalize_data_2 3 [0, 0] (2, 600, 600) (2, 600, 600)
reshape_train_test_2 (2, 600, 600)
2023-07-24 21:33:29,747 [INFO] No precomuting flows with Omnipose. Computed during training.
2023-07-24 21:33:29,756 [INFO] NOTE: computing flows for labels (could be done before to save time)
100%|█████████████████████████████████████████████| 3/3 [00:03<00:00,  1.01s/it]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/site-packages/omnipose/__main__.py", line 3, in <module>
    main(omni_CLI=True)
  File "/opt/conda/lib/python3.8/site-packages/cellpose_omni/__main__.py", line 494, in main
    cpmodel_path = model.train(images, labels, links, train_files=image_names,
  File "/opt/conda/lib/python3.8/site-packages/cellpose_omni/models.py", line 1138, in train
    test_labels = labels_to_flows(test_labels, test_links, files=test_files, use_gpu=self.gpu, device=self.device, dim=self.dim)
  File "/opt/conda/lib/python3.8/site-packages/omnipose/core.py", line 258, in labels_to_flows
    labels, dist, heat, veci = map(list,zip(*[masks_to_flows(labels[n], links=links[n], use_gpu=use_gpu, 
ValueError: too many values to unpack (expected 4)

Any hints would be appreciated!

lacan commented 1 year ago

Hi @kevinjohncutler. Any news on this? Should I try the bleeding-edge GitHub repo version?

kevinjohncutler commented 1 year ago

@lacan @m274d Very sorry I didn't see any updates on this thread. I'm unable to reproduce these errors, but please do try the latest GitHub version in a new environment. These errors have to do with installation (possibly due to having cellpose in the same env).

lacan commented 1 year ago

Hi @kevinjohncutler,

Thanks for the suggestion. I did it again, and unfortunately no success:

INFO: Executing command:
cmd.exe /C D:\conda\conda-envs\omnipose-github\python.exe -W ignore -m omnipose --train --dir "N:\public\alejandro.alonsocalleja_EDBB\MK_Segmentation_20230531\QuPath Cellpose Training Project - Omnipose test\cellpose-training\train" --test_dir "N:\public\alejandro.alonsocalleja_EDBB\MK_Segmentation_20230531\QuPath Cellpose Training Project - Omnipose test\cellpose-training\test" --pretrained_model None --min_train_masks 0 --omni --nchan 3 --use_gpu --verbose
INFO: This command should run directly if copy-pasted into your shell
INFO: Cellpose2D: 2023-09-14 14:51:50,950 [INFO] WRITING LOG OUTPUT TO C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D: log file C:\Users\oburri\.cellpose\run.log
INFO: Cellpose2D: 2023-09-14 14:51:51,155 [INFO] ** TORCH GPU version installed and working. **
INFO: Cellpose2D: 2023-09-14 14:51:51,155 [INFO] >>>> using GPU
INFO: Cellpose2D: 2023-09-14 14:51:51,155 [INFO] Training omni model. Setting nclasses=2, RAdam=False
INFO: Cellpose2D: 2023-09-14 14:51:53,849 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D: 2023-09-14 14:51:54,542 [INFO] not all flows are present, will run flow generation for all images
INFO: Cellpose2D: 2023-09-14 14:51:55,363 [INFO] channel axis detected at position 0, manually specify if incorrect
INFO: Cellpose2D: 2023-09-14 14:51:55,363 [INFO] training from scratch
INFO: Cellpose2D: 2023-09-14 14:51:55,363 [INFO] median diameter set to 0 => no rescaling during training
INFO: Cellpose2D: 2023-09-14 14:52:02,274 [INFO] No precomuting flows with Omnipose. Computed during training.
INFO: Cellpose2D: 2023-09-14 14:52:02,409 [INFO] NOTE: computing flows for labels (could be done before to save time)
INFO: Cellpose2D: 
INFO: Cellpose2D:   0%|          | 0/13 [00:00<?, ?it/s]
INFO: Cellpose2D:   8%|7         | 1/13 [00:00<00:03,  3.23it/s]
INFO: Cellpose2D:  15%|#5        | 2/13 [00:01<00:07,  1.42it/s]
INFO: Cellpose2D:  31%|###       | 4/13 [00:01<00:02,  3.22it/s]
INFO: Cellpose2D:  46%|####6     | 6/13 [00:01<00:01,  4.86it/s]
INFO: Cellpose2D:  62%|######1   | 8/13 [00:01<00:00,  6.08it/s]
INFO: Cellpose2D:  77%|#######6  | 10/13 [00:01<00:00,  8.13it/s]
INFO: Cellpose2D:  92%|#########2| 12/13 [00:02<00:00,  9.66it/s]
INFO: Cellpose2D: 100%|##########| 13/13 [00:02<00:00,  6.12it/s]
INFO: Cellpose2D: Traceback (most recent call last):
INFO: Cellpose2D:   File "D:\conda\conda-envs\omnipose-github\lib\runpy.py", line 196, in _run_module_as_main
INFO: Cellpose2D:     return _run_code(code, main_globals, None,
INFO: Cellpose2D:   File "D:\conda\conda-envs\omnipose-github\lib\runpy.py", line 86, in _run_code
INFO: Cellpose2D:     exec(code, run_globals)
INFO: Cellpose2D:   File "D:\conda\conda-envs\omnipose-github\lib\site-packages\omnipose\__main__.py", line 12, in <module>
INFO: Cellpose2D:     main()
INFO: Cellpose2D:   File "D:\conda\conda-envs\omnipose-github\lib\site-packages\omnipose\__main__.py", line 9, in main
INFO: Cellpose2D:     cellpose_omni_main(args)
INFO: Cellpose2D:   File "D:\conda\conda-envs\omnipose-github\lib\site-packages\cellpose_omni\__main__.py", line 422, in main
INFO: Cellpose2D:     cpmodel_path = model.train(images, labels, links, train_files=image_names,
INFO: Cellpose2D:   File "D:\conda\conda-envs\omnipose-github\lib\site-packages\cellpose_omni\models.py", line 1421, in train
INFO: Cellpose2D:     test_labels = labels_to_flows(test_labels, test_links, files=test_files, 
INFO: Cellpose2D:   File "D:\conda\conda-envs\omnipose-github\lib\site-packages\omnipose\core.py", line 258, in labels_to_flows
INFO: Cellpose2D:     labels, dist, heat, veci = map(list,zip(*[masks_to_flows(labels[n], links=links[n], use_gpu=use_gpu, 
INFO: Cellpose2D: ValueError: too many values to unpack (expected 4)

lacan commented 8 months ago

Hi @kevinjohncutler ,

I still cannot use the latest omnipose for training; I get the same error. Here is the data I am trying to use: cellpose-training.zip

Is there anything else we can try? Happy to find some time to debug this over Zoom if that would help.

Best, Oli

lacan commented 8 months ago

So a little bit of further debugging...

It turns out training works with this command:

omnipose --train --dir "N:\temp-Oli\QuPath PCB\QuPath Cellpose training project\cellpose-training\train" --pretrained_model None --n_epochs 10 --omni --use_gpu --verbose

But it causes the error shown above if we give it a test folder:

omnipose --train --dir "N:\temp-Oli\QuPath PCB\QuPath Cellpose training project\cellpose-training\train" --test_dir "N:\temp-Oli\QuPath PCB\QuPath Cellpose training project\cellpose-training\test" --pretrained_model None --n_epochs 10 --omni --use_gpu --verbose

So something seems to go wrong when processing the images in the test folder.

Is this something only I am experiencing, or is no one providing a test folder when training a model? That would actually be kind of scary...

lacan commented 6 months ago

@kevinjohncutler Any info on this? We are happily training now, but cannot get any validation loss. Have you been able to reproduce the error with the data I provided?

lacan commented 5 months ago

Still no news on this... So I noticed the following by sifting through the code myself. This line expects 4 outputs:
https://github.com/kevinjohncutler/omnipose/blob/a585929215d4908a6279eb0503a0c46d02f4e2e9/omnipose/core.py#L263
but the method it calls actually returns 5!
https://github.com/kevinjohncutler/omnipose/blob/a585929215d4908a6279eb0503a0c46d02f4e2e9/omnipose/core.py#L401
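
To illustrate the mismatch in isolation, here is a toy repro with a stub standing in for masks_to_flows (the stub name and return values are made up; only the shape of the call matters):

    # Stub standing in for omnipose.core.masks_to_flows: returns 5 values per mask.
    def masks_to_flows_stub(m):
        return m, "dist", "bound", "heat", "vec"

    items = [0, 1]
    # The caller unpacks only 4 names, so this raises:
    # ValueError: too many values to unpack (expected 4)
    labels, dist, heat, veci = map(list, zip(*[masks_to_flows_stub(m) for m in items]))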

I am guessing that boundaries is not necessary for this part of the code, so I simply modified line 263 to read

        labels, dist, bounds, heat, veci = map(list,zip(*[masks_to_flows(labels[n], links=links[n], use_gpu=use_gpu, ...

and bounds just does not get used.

However, this leads to a new error:

INFO: Cellpose2D:   File "D:\conda\conda-envs\omnipose-github\lib\site-packages\omnipose\core.py", line 272, in <listcomp>
INFO: Cellpose2D:     flows = [np.concatenate((labels[n][np.newaxis,:,:], 
INFO: Cellpose2D:   File "<__array_function__ internals>", line 200, in concatenate
INFO: Cellpose2D:   File "D:\conda\conda-envs\omnipose-github\lib\site-packages\torch\_tensor.py", line 970, in __array__
INFO: Cellpose2D:     return self.numpy()
INFO: Cellpose2D: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Am I right that this part never really worked before, i.e. no validation data was ever used, so this error was never caught? Otherwise, is there something else to do? You often mention that flows are computed on the fly for omnipose; should that not be the case for the validation data too?

lacan commented 5 months ago

Some more info... I managed to make that last error disappear and get training to work by replacing the lines here:
https://github.com/kevinjohncutler/omnipose/blob/a585929215d4908a6279eb0503a0c46d02f4e2e9/omnipose/core.py#L269

with

                                     veci[n].cpu(), 
                                     heat[n].cpu()[np.newaxis,:,:]), axis=0).astype(np.float32)
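
This failure mode is generic PyTorch behavior rather than anything omnipose-specific: NumPy cannot view GPU memory, so a CUDA tensor has to be copied to the host before np.concatenate can touch it. A minimal illustration:

    import numpy as np
    import torch

    t = torch.ones(3)
    if torch.cuda.is_available():
        t = t.cuda()  # np.concatenate([t, ...]) would now raise the same
                      # "can't convert cuda:0 device type tensor to numpy" TypeError
    # .cpu() copies the tensor back to host memory, after which NumPy can use it.
    out = np.concatenate([t.cpu().numpy(), np.zeros(3, dtype=np.float32)])
    print(out)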

But there is a big issue in my eyes... How do I access the training and validation losses? In the output of your training there is only:

INFO: Cellpose2D: 2024-04-08 17:30:18,398 [INFO] Train epoch: 32 | Time: 0.46min | last epoch: 0.74s | <sec/epoch>: 0.81s | <sec/batch>: 0.33s | <Batch Loss>: 3.527480 | <Epoch Loss>: 4.167647

In the original cellpose there is also a "Test loss" that we use to see how training went. I opened a separate issue to have this logged somewhere.

kevinjohncutler commented 5 months ago

@lacan So sorry I didn't circle back to this; please feel free to email me moving forward with any urgent/outstanding issues.

For the issue of validation data during training: I honestly never worked with that code from cellpose, as validation loss has never been useful to me. Instead, I have a --save_each parameter to save intermediate models, which allows testing after training to investigate how the model performs over the training epochs; a validation curve can be made from that (see the sketch below). When quantitative metrics are needed, I only care about the final model over the entire test dataset, and computing the loss on it at each epoch can really slow down training. In my experience, if the training and validation sets are big enough and representative of each other, there should be no difference in convergence either.
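
For the record, a sketch of that checkpoint-evaluation workflow (the file patterns, model paths, and the eval/metrics signatures below are assumptions modeled on the cellpose API, not a packaged omnipose utility):

    import glob
    import numpy as np
    # Assumption: the cellpose_omni fork keeps cellpose's io/metrics/models layout.
    from cellpose_omni import io, metrics, models

    # Hypothetical layout: checkpoints written by --save_each, plus a held-out test set.
    model_paths = sorted(glob.glob("models/*_epoch_*"))
    imgs = [io.imread(f) for f in sorted(glob.glob("test/*_img.tif"))]
    true_masks = [io.imread(f) for f in sorted(glob.glob("test/*_masks.tif"))]

    curve = []
    for path in model_paths:
        model = models.CellposeModel(gpu=True, pretrained_model=path)
        pred_masks = model.eval(imgs, channels=[0, 0], omni=True)[0]  # masks only
        ap = metrics.average_precision(true_masks, pred_masks, threshold=[0.5])[0]
        curve.append(np.mean(ap))  # mean AP@0.5 per checkpoint -> validation curve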

I have been meaning to build out a better way to log train and test loss, and when I get around to that, I will keep the validation loss in mind.

I should mention, the logs including "Cellpose2D" don't look familiar, so also be sure you are on the latest version and have explicitly uninstalled cellpose_omni.

lacan commented 5 months ago

the logs including "Cellpose2D" don't look familiar

Yes, that is because the output comes from the cellpose QuPath extension: the name of the process is prepended to each line, which is why it looks odd.

Thanks for the information. Do you want me to open a PR with what I did? In its current state, omnipose will always crash if a user defines --test_dir. Or do you want to remove that option altogether? In that case, I would suggest adding a section on how to validate an omnipose model to the documentation.

kevinjohncutler commented 5 months ago

@lacan Yes, please do submit a PR if you have made the edits required to get validation loss working. Thanks!

marieanselmet commented 4 months ago

Hello, I got exactly the same errors when defining a --test_dir to monitor validation loss. I would be interested in a permanent fix too! Thanks