tristandeleu / pytorch-meta

A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch
https://tristandeleu.github.io/pytorch-meta/
MIT License

why can't we use pytorch 1.9.0? #140

Closed brando90 closed 2 years ago

brando90 commented 2 years ago

https://github.com/tristandeleu/pytorch-meta/blob/d487ad0a1268bd6e6a7290b8780c6b62c7bed688/setup.py#L34

brando90 commented 2 years ago

This is puzzling. On my local computer I can install torchmeta alongside PyTorch 1.9.x; the problem only shows up on the cluster (C02YQ0GQLVCJ is my local machine):

(base) brando~ ❯ hostname
C02YQ0GQLVCJ
(base) brando~ ❯ conda list | grep torch
pytorch                   1.9.0           cpu_py39h28f9090_2    conda-forge
pytorch-cpu               1.9.0           cpu_py39he781eb1_2    conda-forge
torch                     1.9.0                    pypi_0    pypi
torchmeta                 1.7.0                    pypi_0    pypi
torchtext                 0.10.0                   pypi_0    pypi
torchvision               0.10.0                   pypi_0    pypi
torchviz                  0.0.2                    pypi_0    pypi
(base) brando~ ❯ pip list | grep torch  
torch                   1.9.0.post2
torchmeta               1.7.0
torchtext               0.10.0
torchvision             0.10.0
torchviz                0.0.2

Any ideas? Also, setup.py suggests this combination should not work on my local computer either, yet it does. @tristandeleu
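One way to see why pip rejects this combination is to test a version string against the range torchmeta 1.7.0 declares (`torch>=1.4.0,<1.9.0`, as quoted in this thread). The sketch below is a simplified, hand-rolled check: the helper names `release_tuple` and `satisfies` are made up for illustration, and suffixes such as `.post2` or `+cu111` are simply ignored. The `packaging` library implements the full PEP 440 rules and would be the better choice in real code.

```python
import operator
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def release_tuple(version):
    """Return the leading numeric release of a version string as a tuple.

    Suffixes such as ".post2" or "+cu111" are dropped, which is a
    simplification compared with full PEP 440 ordering.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated range like '>=1.4.0,<1.9.0'."""
    installed = release_tuple(version)
    for clause in spec.split(","):
        parsed = re.match(r"\s*(>=|<=|==|>|<)\s*(\S+)\s*$", clause)
        if parsed is None:
            raise ValueError(f"unsupported clause {clause!r}")
        bound = release_tuple(parsed.group(2))
        # Pad with zeros so that (1, 9) and (1, 9, 0) compare as equal.
        width = max(len(installed), len(bound))
        lhs = installed + (0,) * (width - len(installed))
        rhs = bound + (0,) * (width - len(bound))
        if not _OPS[parsed.group(1)](lhs, rhs):
            return False
    return True


TORCH_RANGE = ">=1.4.0,<1.9.0"  # range quoted in this thread for torchmeta 1.7.0

print(satisfies("1.8.1", TORCH_RANGE))        # → True
print(satisfies("1.9.0.post2", TORCH_RANGE))  # → False
```

If the local environment has torch 1.9.0 next to torchmeta 1.7.0 anyway, the packages were probably installed in an order, or with flags, that skipped this check.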

brando90 commented 2 years ago

btw, this line makes it hard to debug:

https://github.com/tristandeleu/pytorch-meta/blob/d487ad0a1268bd6e6a7290b8780c6b62c7bed688/setup.py#L15

I can't see the version.


https://github.com/tristandeleu/pytorch-meta/blob/d487ad0a1268bd6e6a7290b8780c6b62c7bed688/torchmeta/version.py#L1

if it's hard coded why hardcode it in a file?

brando90 commented 2 years ago

I wonder if the reason it worked on my local computer is that I accidentally (my guess) did this:

  1. installed torchmeta
  2. then installed various versions of PyTorch (somehow without conda or pip checking the dependencies of other libraries) until it worked.

I didn't do this deliberately, but it would explain why it worked. I will try something like this on the cluster, probably CPU first.

tristandeleu commented 2 years ago

The reason the PyTorch version is capped is to be conservative: since MetaModule is a carbon copy of PyTorch's modules (with a functional API), we need to be sure the two still match after a PyTorch update, in case nn changed in the new version. I have tried to raise the cap at every minor release of PyTorch (e.g. going from 1.8.x to 1.9.x), but I have not done it for PyTorch 1.9 yet; apologies for that, it will be fixed.

That being said, this is probably a bit too conservative, and PyTorch never really changes the nn module (there hasn't been any breaking change since PyTorch 1.3). You could use --no-dependencies when installing Torchmeta to install it with PyTorch 1.9, and it should work (although there is no strict guarantee).
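To make the "functional API" point concrete: a Torchmeta module accepts an optional `params` argument in its forward pass, so the same module can be evaluated with adapted weights without overwriting its own. The sketch below mimics that idea on plain floats. `ToyLinear` is a hypothetical stand-in written for this illustration, not Torchmeta code, and it leaves out everything specific to PyTorch.

```python
class ToyLinear:
    """Toy layer computing y = weight * x + bias on plain floats."""

    def __init__(self, weight, bias):
        # Parameters owned by the module itself.
        self.params = {"weight": weight, "bias": bias}

    def forward(self, x, params=None):
        # Functional API: use the caller's parameters when they are given,
        # otherwise fall back to the ones stored on the module.
        active = self.params if params is None else params
        return active["weight"] * x + active["bias"]


layer = ToyLinear(weight=2.0, bias=1.0)
adapted = {"weight": 0.5, "bias": 0.0}  # e.g. weights after an inner-loop update

print(layer.forward(3.0))                  # → 7.0
print(layer.forward(3.0, params=adapted))  # → 1.5
print(layer.params["weight"])              # → 2.0
```

Because each real Meta-layer has to reproduce the forward pass of the matching `torch.nn` layer, any upstream change to that forward pass must be mirrored by hand, which is the motivation for the version cap.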

> if it's hard coded why hardcode it in a file?

This is to have a single source of truth for the version of Torchmeta (see point 3). Another convenience is that I have an integration that automatically pushes the package to PyPI any time the version changes.
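A common way to get a single source of truth is for setup.py to execute version.py and read the variable out of the resulting namespace, rather than importing the package (which could fail if its dependencies are not installed yet). The sketch below shows that pattern in isolation. The function name `read_version`, the variable name `VERSION`, and the temporary file are assumptions made for the demo; Torchmeta's actual setup.py may differ in detail.

```python
import os
import tempfile


def read_version(path, name="VERSION"):
    """Execute a small version file and return the variable called `name`."""
    namespace = {}
    with open(path, encoding="utf-8") as fp:
        # Only do this with a file you control: exec runs arbitrary code.
        exec(fp.read(), namespace)
    return namespace[name]


# Demo with a throwaway version file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    version_file = os.path.join(tmp, "version.py")
    with open(version_file, "w", encoding="utf-8") as fp:
        fp.write("VERSION = '1.7.0'\n")
    demo_version = read_version(version_file)

print(demo_version)  # → 1.7.0
```

With this layout, the value is written once in version.py, and both the installed package and setup.py read it from there.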

brando90 commented 2 years ago

OK, I will try installing it with the no-dependencies flag and then manually install its dependencies and the right version of PyTorch. I think this should work for the CPU version, since my local computer works. I will try with GPU in a bit and update here.

For now here is the install script I wrote for this:

## Installation script
# to install do: bash ~/automl-meta-learning/install.sh

#conda update conda

#conda create -y -n metalearning_gpu python=3.9
#conda activate metalearning_gpu
#conda remove --name metalearning_gpu --all

module load cuda-toolkit/11.1
module load gcc/9.2.0

# A40, needs cuda at least 11.0
#conda install -y pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch -c nvidia
#conda install -y pytorch==1.9 torchvision torchaudio cudatoolkit=11.0 -c pytorch -c nvidia
#conda install -y pytorch==1.9 torchvision cudatoolkit=11.0 -c pytorch -c nvidia
#conda install -y pytorch torchvision cudatoolkit=11.0 -c pytorch -c nvidia

conda activate metalearning_gpu
conda install -y pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
#pip3 install torch==1.9.0+cu110 torchvision==0.10.0+cu110 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

#conda activate metalearning_cpu
#conda install pytorch torchvision torchaudio cpuonly -c pytorch
#pip3 install torch==1.9.0+cpu torchvision==0.10.0+cpu torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

# uutils installs
conda install -y dill
# quoted so the shell does not treat ">" as an output redirection
conda install -y "networkx>=2.5"
conda install -y scipy
conda install -y scikit-learn
conda install -y lark-parser -c conda-forge

# due to compatibility with torch=1.7.x, https://stackoverflow.com/questions/65575871/torchtext-importerror-in-colab
#conda install -y torchtext==0.8.0 -c pytorch

conda install -y tensorboard
conda install -y pandas
conda install -y progressbar2
conda install -y transformers
conda install -y requests
conda install -y aiohttp
conda install -y numpy
conda install -y plotly
conda install -y matplotlib

pip install wandb

# for automl
conda install -y pyyml
conda install -y torchviz
#conda install -y graphviz

#pip install tensorflow
#pip install learn2learn

#pip install -U git+https://github.com/brando90/pytorch-meta.git
#pip install --no-deps torchmeta==1.6.1
pip install --no-deps torchmeta==1.7.0
#        'torch>=1.4.0,<1.9.0',
#        'torchvision>=0.5.0,<0.10.0',
#pip install -y numpy
pip install Pillow
pip install h5py
#pip install requests
pip install ordered-set

pip install higher
#    'torch'

#pip install -U git+https://github.com/moskomule/anatome
pip install --no-deps -U git+https://github.com/moskomule/anatome
#    'torch>=1.9.0',
#    'torchvision>=0.10.0',
pip install tqdm

# - using conda develop rather than pip because uutils installs incompatible versions with the vision cluster
## python -c "import sys; [print(p) for p in sys.path]"
conda install conda-build
conda develop ~/ultimate-utils/ultimate-utils-proj-src
conda develop ~/automl-meta-learning/automl-proj-src

# -- extra notes

# local editable installs
# HAL installs, make sure to clone from wmlce 1.7.0 that has h5py ~= 2.9.0 and torch 1.3.1 and torchvision 0.4.2
# pip install torchmeta==1.3.1
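
After running a script like this one, it can help to confirm which versions actually ended up in the environment, since conda and pip may each have installed their own copy of a package (as with pytorch and torch in the local listing earlier in the thread). The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_versions` is made up, and the package list is just the names discussed here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    packages = ["torch", "torchvision", "torchmeta", "higher"]
    for name, ver in installed_versions(packages).items():
        print(f"{name:12} {ver or 'not installed'}")
```

This reports what the current Python interpreter sees, so it should be run inside the activated environment.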

tristandeleu commented 2 years ago

FYI I released a new version of Torchmeta (1.8.0) that is compatible with PyTorch 1.9.

Case in point about being conservative: a couple of things broke with PyTorch 1.9 and needed an update in Torchmeta, specifically ConvNd and MultiheadAttention.

brando90 commented 2 years ago

> FYI I released a new version of Torchmeta (1.8.0) that is compatible with PyTorch 1.9.
>
> Case in point about being conservative: a couple of things broke with PyTorch 1.9 and needed an update in Torchmeta, specifically ConvNd and MultiheadAttention.

Thanks Trist!

Btw, did you install stuff with conda or pip? (to save me time cuz installs in my hpc take ages)

tristandeleu commented 2 years ago

> Btw, did you install stuff with conda or pip? (to save me time cuz installs in my hpc take ages)

I generally use pip, even inside my conda environment on a remote server.

tristandeleu commented 2 years ago

I'm closing this issue since the newest version of Torchmeta (1.8.0) works with PyTorch 1.9.