Merge MPI Dortmund version into this main repository

thorstenwagner commented 2 years ago

Before it gets merged we should test the installation procedure and if its backwards compatible. See issue https://github.com/juglab/cryoCARE_pip/issues/17 .

tibuch commented 2 years ago

Thank you very much for suggesting this! I think it makes a lot of sense. Should the other PRs be merged beforehand?

I will try out the installation. Hopefully everything works :slightly_smiling_face:

thorstenwagner commented 2 years ago

Thank you very much for suggesting this! I think it makes a lot of sense. Should the other PRs be merged beforehand?

We can, but its already fixed in the MPIDO version.

I will try out the installation. Hopefully everything works slightly_smiling_face

Great! Once you checked it and give your OK, I gonna merge everything :-)

thorstenwagner commented 2 years ago

Hmpf. My cuda 11 setup is not working. Well, it works for the prediction. But not for training :-(

tibuch commented 2 years ago

I will try to install it now and see if training works.

tibuch commented 2 years ago

We should update the csbdeep version requirements from "csbdeep>=0.6.0,<0.7.0" to "csbdeep>=0.7.0,<0.8.0". According to the release notes of csbdeep only old Python versions where dropped.

So far I have done the following to create a new env:

conda create -n cryoCARE_11 python=3.8
conda activate cryoCARE_11
conda install cudatoolkit=11.0 cudnn=8.0 -c conda-forge
pip install tensorflow==2.4
pip install csbdeep

Could you update the csbdeep version requirements in your setup.py? Then I can directly install from your main branch and see if everything works.

Thanks!

thorstenwagner commented 2 years ago

done

tibuch commented 2 years ago

Training is running with the install instructions I described above. @thorstenwagner could you try them out as well?

thorstenwagner commented 2 years ago

Thanks a lot for all the great work on cryoCARE! This is awesome!

To me everything looks good. I am currently running all scripts and will ping you once they all finished.

The only things we should update are the README.md and the setup.py to reflect our collaboration. Can we do this directly on this PR or would you prefer to do it once it is merged?

Well, github already mentiones me as collaborator :-) Maybe just need to mention what changed in the new version. We can do it directly in this PR

tibuch commented 2 years ago

While running cryoCARE_predict.py I get the following error:

Traceback (most recent call last):
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 13, in <module>
    from numpy.typing import NDArray
ModuleNotFoundError: No module named 'numpy.typing'

numpy.typing was introduced with numpy v1.20. However, tensorflow 2.4 requires numpy < 1.20. Should we remove the typing or try to find a setup with a newer tensorflow?

tibuch commented 2 years ago

While running cryoCARE_predict.py I get the following error:
Traceback (most recent call last):
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 13, in <module>
    from numpy.typing import NDArray
ModuleNotFoundError: No module named 'numpy.typing'
numpy.typing was introduced with numpy v1.20. However, tensorflow 2.4 requires numpy < 1.20. Should we remove the typing or try to find a setup with a newer tensorflow?

I looked for a tensorflow that supports numpy 1.20. The earliest version is tensorflow 2.7. But it comes with h5py 3.1.0, which is in conflict with csbdeep 0.7.2 which requires h5py<3. Based on that I would vote to remove the typing and add it at a later point. What to do you think?

thorstenwagner commented 2 years ago

While running cryoCARE_predict.py I get the following error:
Traceback (most recent call last):
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 13, in <module>
    from numpy.typing import NDArray
ModuleNotFoundError: No module named 'numpy.typing'
numpy.typing was introduced with numpy v1.20. However, tensorflow 2.4 requires numpy < 1.20. Should we remove the typing or try to find a setup with a newer tensorflow?
I looked for a tensorflow that supports numpy 1.20. The earliest version is tensorflow 2.7. But it comes with h5py 3.1.0, which is in conflict with csbdeep 0.7.2 which requires h5py<3. Based on that I would vote to remove the typing and add it at a later point. What to do you think?

In the latest version I removed NDArray and replaced it with 'np.array' as type hint

thorstenwagner commented 2 years ago

The training works, but I see tons of warnings. Do you see the same?

WARNING:tensorflow:AutoGraph could not transform <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43a9d0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43a9d0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names.
Match 0:
(lambda x: K.mean(x, axis=(- 1)))

Match 1:
(lambda x: x)

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43adc0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43adc0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names.
Match 0:
(lambda x: K.mean(x, axis=(- 1)))

Match 1:
(lambda x: x)

tibuch commented 2 years ago

The training works, but I see tons of warnings. Do you see the same?

WARNING:tensorflow:AutoGraph could not transform <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43a9d0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43a9d0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names.
Match 0:
(lambda x: K.mean(x, axis=(- 1)))

Match 1:
(lambda x: x)

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43adc0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43adc0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names.
Match 0:
(lambda x: K.mean(x, axis=(- 1)))

Match 1:
(lambda x: x)

Yes, I got the same messages. I opted to ignore them for now.

tibuch commented 2 years ago

The predict_config.json example is missing the "output_name" key-word in the REAMDE.md.

And I get the following error for tiled prediction:

  0%|▏                                                                          | 1/544 [00:00<00:00, 131072.00it/s]Traceback (most recent call last):
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 132, in <module>
    main()
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 126, in main
    denoise(config, mean, std, even=config['even'], odd=config['odd'], output_file=join(config['path'], config['output_name']))
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 52, in denoise
    model.predict(even_vol, odd_vol, denoised, axes='ZYXC', normalizer=None, mean=mean, std=std,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 106, in predict
    self._predict_mean_and_scale(self._crop(even), self._crop(odd), self._crop(output), axes, normalizer, resizer=NoResizer(), mean=mean, std=std,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 213, in _predict_mean_and_scale
    pred = predict_tiled(self.keras_model, even, odd, output, [4 * (slice(None),)], 4 * (slice(None),),
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 257, in predict_tiled
    output[:] = pred[:]
ValueError: could not broadcast input array from shape (128,168,104,1) into shape (128,168,152,1)
  0%|▏                                                                              | 1/544 [00:00<01:01,  8.80it/s]

Do you have an idea why this is happening?

thorstenwagner commented 2 years ago

Alright, fine with

The training works, but I see tons of warnings. Do you see the same?

WARNING:tensorflow:AutoGraph could not transform <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43a9d0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43a9d0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names.
Match 0:
(lambda x: K.mean(x, axis=(- 1)))

Match 1:
(lambda x: x)

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43adc0> and will run it as-is.
Cause: could not parse the source code of <function _mean_or_not.<locals>.<lambda> at 0x7efb2c43adc0>: found multiple definitions with identical signatures at the location. This error may be avoided by defining each lambda on a single line and with unique argument names.
Match 0:
(lambda x: K.mean(x, axis=(- 1)))

Match 1:
(lambda x: x)

Yes, I got the same messages. I opted to ignore them for now.

import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)

This would get rid of the warnings. Fine for you?

thorstenwagner commented 2 years ago

The predict_config.json example is missing the "output_name" key-word in the REAMDE.md.

Hm, actually, it should not be needed anymore. At least in my examples I don't use it anymore. Can it be that your config is 'outdated' in the way that you don't specify the tar.gz as 'path'?

thorstenwagner commented 2 years ago

And I get the following error for tiled prediction:

  0%|▏                                                                          | 1/544 [00:00<00:00, 131072.00it/s]Traceback (most recent call last):
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 132, in <module>
    main()
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 126, in main
    denoise(config, mean, std, even=config['even'], odd=config['odd'], output_file=join(config['path'], config['output_name']))
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 52, in denoise
    model.predict(even_vol, odd_vol, denoised, axes='ZYXC', normalizer=None, mean=mean, std=std,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 106, in predict
    self._predict_mean_and_scale(self._crop(even), self._crop(odd), self._crop(output), axes, normalizer, resizer=NoResizer(), mean=mean, std=std,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 213, in _predict_mean_and_scale
    pred = predict_tiled(self.keras_model, even, odd, output, [4 * (slice(None),)], 4 * (slice(None),),
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 257, in predict_tiled
    output[:] = pred[:]
ValueError: could not broadcast input array from shape (128,168,104,1) into shape (128,168,152,1)
  0%|▏                                                                              | 1/544 [00:00<01:01,  8.80it/s]

Do you have an idea why this is happening?

Hm, so I prepared an example with a quite small volume so that it fits on my GPU memory to make sure that it actually uses [1,1,1] for tiling and this never happened.

tibuch commented 2 years ago

The predict_config.json example is missing the "output_name" key-word in the REAMDE.md.

Hm, actually, it should not be needed anymore. At least in my examples I don't use it anymore. Can it be that your config is 'outdated' in the way that you don't specify the tar.gz as 'path'?

Oh! Yes, I mixed old and new config. Now it works :partying_face:

tibuch commented 2 years ago

And I get the following error for tiled prediction:

  0%|▏                                                                          | 1/544 [00:00<00:00, 131072.00it/s]Traceback (most recent call last):
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 132, in <module>
    main()
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 126, in main
    denoise(config, mean, std, even=config['even'], odd=config['odd'], output_file=join(config['path'], config['output_name']))
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 52, in denoise
    model.predict(even_vol, odd_vol, denoised, axes='ZYXC', normalizer=None, mean=mean, std=std,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 106, in predict
    self._predict_mean_and_scale(self._crop(even), self._crop(odd), self._crop(output), axes, normalizer, resizer=NoResizer(), mean=mean, std=std,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 213, in _predict_mean_and_scale
    pred = predict_tiled(self.keras_model, even, odd, output, [4 * (slice(None),)], 4 * (slice(None),),
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 257, in predict_tiled
    output[:] = pred[:]
ValueError: could not broadcast input array from shape (128,168,104,1) into shape (128,168,152,1)
  0%|▏                                                                              | 1/544 [00:00<01:01,  8.80it/s]

Do you have an idea why this is happening?

Hm, so I prepared an example with a quite small volume so that it fits on my GPU memory to make sure that it actually uses [1,1,1] for tiling and this never happened.

This happens for me if I have a volume that requires tiling. I think the tiling recursively goes down until it uses that part of the code. But for tiled prediction it has to crop it before moving it into the output array.

I also saw this line which should replace output with the return value of the prediction in the case where tiling is [1,1,1]. Which should achieve the same as the fix we added here.

thorstenwagner commented 2 years ago

The predict_config.json example is missing the "output_name" key-word in the REAMDE.md.

Hm, actually, it should not be needed anymore. At least in my examples I don't use it anymore. Can it be that your config is 'outdated' in the way that you don't specify the tar.gz as 'path'?

Oh! Yes, I mixed old and new config. Now it works partying_face

Hm, to make sure that this confusion is not happening again I would recommend this:

        # Fall back to original cryoCARE implmentation
        print("Your config is not in the format that cryoCARE >=0.2 requires. Fallback to cryCARE 0.1 format.")
        if 'output_name' not in config or os.path.isfile(config['path']):
            print("Invalid config format.")
            sys.exit(1)

        dm = CryoCARE_DataModule()
        dm.load(config['path'])
        mean, std = dm.train_dataset.mean, dm.train_dataset.std

        denoise(config, mean, std, even=config['even'], odd=config['odd'], output_file=join(config['path'], config['output_name']))

thorstenwagner commented 2 years ago

And I get the following error for tiled prediction:

  0%|▏                                                                          | 1/544 [00:00<00:00, 131072.00it/s]Traceback (most recent call last):
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 132, in <module>
    main()
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 126, in main
    denoise(config, mean, std, even=config['even'], odd=config['odd'], output_file=join(config['path'], config['output_name']))
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/bin/cryoCARE_predict.py", line 52, in denoise
    model.predict(even_vol, odd_vol, denoised, axes='ZYXC', normalizer=None, mean=mean, std=std,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 106, in predict
    self._predict_mean_and_scale(self._crop(even), self._crop(odd), self._crop(output), axes, normalizer, resizer=NoResizer(), mean=mean, std=std,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 213, in _predict_mean_and_scale
    pred = predict_tiled(self.keras_model, even, odd, output, [4 * (slice(None),)], 4 * (slice(None),),
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 309, in predict_tiled
    pred = predict_tiled(keras_model, even_tile, odd_tile, output_tile[s_src_out[-1]], s_src_out + [s_src], s_dst,
  File "/home/tibuch/Apps/miniconda3/envs/cryoCARE_11/lib/python3.8/site-packages/cryocare/internals/CryoCARE.py", line 257, in predict_tiled
    output[:] = pred[:]
ValueError: could not broadcast input array from shape (128,168,104,1) into shape (128,168,152,1)
  0%|▏                                                                              | 1/544 [00:00<01:01,  8.80it/s]

Do you have an idea why this is happening?

Hm, so I prepared an example with a quite small volume so that it fits on my GPU memory to make sure that it actually uses [1,1,1] for tiling and this never happened.

This happens for me if I have a volume that requires tiling. I think the tiling recursively goes down until it uses that part of the code. But for tiled prediction it has to crop it before moving it into the output array.

I also saw this line which should replace output with the return value of the prediction in the case where tiling is [1,1,1]. Which should achieve the same as the fix we added here.

So you suggest to simply remove the following line?

output[:] = pred[:]

tibuch commented 2 years ago

If I remove this tiled prediction works.

I try to find a fix for the case without tiling.

thorstenwagner commented 2 years ago

If I remove this tiled prediction works.

I try to find a fix for the case without tiling.

OK, I did the same. One workaround would be to enforce tiling ^^ But a fix would be clearly better ..

thorstenwagner commented 2 years ago

I also tested the cuda 10 instructions, and they also work with the current setup.py and csbdeep :partying_face:

thorstenwagner commented 2 years ago

If I remove this tiled prediction works. I try to find a fix for the case without tiling.

OK, I did the same. One workaround would be to enforce tiling ^^ But a fix would be clearly better ..

Ah, one easy fix would be to only broadcast them if the shapes are equal...

like

if output.shape == pred.shape:
   output[:] = pred[:]

The assumption would be only fulfilled for no-tiling. What do you think?

Quickly tested it and it works for my examples.

tibuch commented 2 years ago

If I remove this tiled prediction works. I try to find a fix for the case without tiling.

OK, I did the same. One workaround would be to enforce tiling ^^ But a fix would be clearly better ..

Ah, one easy fix would be to only broadcast them if the shapes are equal...

like
if output.shape == pred.shape:
   output[:] = pred[:]
The assumption would be only fulfilled for no-tiling. What do you think?

Quickly tested it and it works for my examples.

Looks reasonable and works for my examples as well :+1:

tibuch commented 2 years ago

Looks like we still have some merge conflicts. Could you rebase and just take all the changes from this version here?

thorstenwagner commented 2 years ago

Looks like we still have some merge conflicts. Could you rebase and just take all the changes from this version here?

Done. Merge? :-)

After merge, would you create a new pypi release?

tibuch commented 2 years ago

Yes! :shipit:

I will create a new pypi release and then we should celebrate with a collaborative tweet :wink:

juglab / cryoCARE_pip

Merge MPI Dortmund version into this main repository #18