Updating pip package and compatibility with ZeroCostDL4Mic/DL4MicEverywhere

esgomezm commented 10 months ago

Hi there!

I'm updating the EmbedSeg notebook in ZeroCostDL4Mic so that we can also use it in DL4MicEverywhere with Docker containers. For the containerisation on Mac, we are having some issues that could be solved with a few changes directly in EmbedSeg. We were wondering whether the following would be easy or doable for you to deploy:

In the setup.py of EmbedSeg, imagecodecs is listed as a dependency. Do you think it would be possible to remove it from the basic installation and leave it as an extra requirement for the environment?
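To illustrate the request, here is a minimal sketch of how an optional dependency is usually declared with setuptools and then imported defensively. The package list, the extra name `codecs`, and the helper names are placeholders chosen for illustration; they are not taken from EmbedSeg's actual setup.py.

```python
# Hypothetical dependency lists: only "imagecodecs" comes from this thread,
# the other names are placeholders.
CORE_REQUIREMENTS = ["numpy", "tifffile"]
OPTIONAL_REQUIREMENTS = {"codecs": ["imagecodecs"]}


def setup_kwargs():
    """Return the keyword arguments that setup.py would pass to setuptools.setup()."""
    return {
        "install_requires": CORE_REQUIREMENTS,
        "extras_require": OPTIONAL_REQUIREMENTS,
    }


def load_imagecodecs():
    """Return the imagecodecs module, or None when the extra is not installed."""
    try:
        import imagecodecs
    except ImportError:
        return None
    return imagecodecs


# In setup.py this would be used as: setup(name=..., **setup_kwargs())
```

With a layout like this, a plain `pip install EmbedSeg` would skip imagecodecs, while `pip install "EmbedSeg[codecs]"` would pull it in (again assuming the extra is called `codecs`).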

Also, would it be possible to update the pip package with the new version of EmbedSeg? When installing it from pip, there are multiple issues with old versions of numpy; however, when installing it directly from your repo, things work nicely.

Thank you! Sincerely,

Esti

jdeschamps commented 10 months ago

We will give it a try. In the meantime, I asked Manan to give us access to the PyPI project (so that we can use the more modern trusted publisher process).

esgomezm commented 10 months ago

> We will give it a try. In the meantime, I asked Manan to give us access to the PyPI project (so that we can use the more modern trusted publisher process).

Thank you! I really, really like this method, and it would be super cool to have it running through the notebooks.

esgomezm commented 7 months ago

Hi there! I'm still running into dependency issues when using the package. With the current version of NumPy (1.26), the library does not work, as there are still some uses of np.float, which was deprecated in NumPy 1.20.0 and has since been removed:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[16], line 109
    105 one_hot = False
    107 print('Analysing your dataset, this will take a few minutes... maybe time for ☕ & 🍪')
--> 109 data_properties_dir = get_data_properties(data_dir, model_name, train_val_name=['train'],
    110                                           test_name=['test'], mode='2d', one_hot=one_hot)
    112 # Here we check the type of data to choose the correct normalisation
    113 # Check this works with multichannel
    115 maxElement = np.amax(x)

File /usr/local/lib/python3.10/dist-packages/EmbedSeg/utils/preprocess_data.py:472, in get_data_properties(data_dir, project_name, train_val_name, test_name, mode, one_hot)
    469 data_properties_dir = {}
    470 data_properties_dir['foreground_weight'] = calculate_foreground_weight(data_dir, project_name, train_val_name, mode,
    471                                                                        one_hot)
--> 472 data_properties_dir['min_object_size'] = calculate_min_object_size(data_dir, project_name, train_val_name, mode,
    473                                                                    one_hot).astype(np.float)
    474 data_properties_dir['n_z'], data_properties_dir['n_y'], data_properties_dir['n_x'] = calculate_max_eval_image_size(
    475     data_dir, project_name, test_name, mode, one_hot)
    476 data_properties_dir['one_hot'] = one_hot

File /usr/local/lib/python3.10/dist-packages/EmbedSeg/utils/preprocess_data.py:352, in calculate_min_object_size(data_dir, project_name, train_val_name, mode, one_hot)
    350             size_list.append(len(z))
    351 print("Minimum object size of the `{}` dataset is equal to {}".format(project_name, np.min(size_list)))
--> 352 return np.min(size_list).astype(np.float)

File /usr/local/lib/python3.10/dist-packages/numpy/__init__.py:324, in __getattr__(attr)
    319     warnings.warn(
    320         f"In the future `np.{attr}` will be defined as the "
    321         "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
    323 if attr in __former_attrs__:
--> 324     raise AttributeError(__former_attrs__[attr])
    326 if attr == 'testing':
    327     import numpy.testing as testing

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
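For reference, the change suggested by the error message can be sketched as follows. The list of sizes is made up, and only the final `astype` call mirrors the line flagged in the traceback.

```python
import numpy as np

# Made-up object sizes standing in for the values collected in size_list.
size_list = [12, 7, 30]

# Old form, fails on recent NumPy: np.min(size_list).astype(np.float)
# The builtin float (or np.float64) is the drop-in replacement.
min_object_size = np.min(size_list).astype(float)

print(type(min_object_size).__name__, min_object_size)
```

Both `float` and `np.float64` produce a 64-bit floating point value here, so the result matches what `np.float` used to give.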

jdeschamps commented 7 months ago

Hi @esgomezm !

Did you install the conda env using the environment.yml file? I see that it pins all the dependencies, so I am not surprised that it would fail. We should remove that...

I tried this one and I could run one of the examples (at least the function you have an issue with, and the training is ongoing):

name: embedseg
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
  - python=3.9
  - pytorch
  - torchvision
  - pytorch-cuda=11.8
  - pip
  - pip:
      - git+https://github.com/juglab/EmbedSeg.git

Note that I have access to an A40-8Q, and a different GPU might require a different pytorch-cuda version (e.g. 11.2). Let me know if this works!

Also, if the notebook you are using is based on a similar example from this repo, let me know and I can test it.

esgomezm commented 7 months ago

Hi @jdeschamps

No, I was installing EmbedSeg directly using pip install. But the issue remains with the setup.py on GitHub, as it doesn't pin any versions. The problem is that EmbedSeg was coded for numpy 1.20, while other dependencies such as astropy have retired their older versions, and the new ones (astropy>5) are not compatible with numpy 1.20. This is why I asked for a very specific, frozen environment in which the code works.

In any case, I found a workaround by installing the dependencies in a particular order, and things seem to work, at least inside Docker containers (https://github.com/HenriquesLab/ZeroCostDL4Mic/blob/master/requirements_files/EmbedSeg_2D_requirements_simple.txt)
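As a rough illustration of the "install in a particular order" idea, the sketch below builds one pip command per requirement and runs them one after another. The helper names and the two-entry list are hypothetical; the list is not the content of the requirements file linked above.

```python
import subprocess
import sys


def pip_install_commands(requirements):
    """Build one `pip install` command per requirement, preserving the given order."""
    return [[sys.executable, "-m", "pip", "install", req] for req in requirements]


def install_in_order(requirements, dry_run=True):
    """Run (or, with dry_run=True, only print) the install commands sequentially."""
    commands = pip_install_commands(requirements)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing install.
            subprocess.run(command, check=True)
    return commands


# Placeholder order for illustration only.
ORDERED_REQUIREMENTS = ["numpy", "git+https://github.com/juglab/EmbedSeg.git"]
```

Calling `install_in_order(ORDERED_REQUIREMENTS)` only prints the commands; passing `dry_run=False` would actually run pip.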

I hope this helps,

Esti

jdeschamps commented 7 months ago

The best option would then be to make EmbedSeg compatible with more recent numpy versions (especially if the error is just np.dtypes, that's easy to fix). I'll look into that.

esgomezm commented 7 months ago

That would be great. There are issues with np.dtypes() and also with np.float(), etc.

jdeschamps commented 7 months ago

Actually, it is probably because your env still uses an old version of EmbedSeg: the numpy issues have been fixed in 0.2.5 (my env this morning was using numpy 1.26).
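One quick way to check which version an environment actually picked up is sketched below. It assumes the distribution is registered under the name `EmbedSeg`, and the version comparison is deliberately crude: it only looks at the leading numeric release segment.

```python
import re
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version_text):
    """Turn the leading 'X.Y.Z' part of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version_text)
    if match is None:
        raise ValueError(f"cannot parse version: {version_text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed, minimum):
    """Compare two version strings on their numeric release segments only."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


found = installed_version("EmbedSeg")  # distribution name is an assumption
if found is None:
    print("EmbedSeg is not installed in this environment")
else:
    print(f"EmbedSeg {found}; includes the 0.2.5 numpy fixes: {is_at_least(found, '0.2.5')}")
```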

Can your CI/pipeline be built by pulling the latest EmbedSeg directly from git? E.g.:

pip install git+https://github.com/juglab/EmbedSeg.git

In the meantime, I am waiting to get full access to the PyPI package, so that the CI can push the latest version automatically...

Updating pip package and compatibility with ZeroCostDL4Mic/DL4MicEverywhere #32