Package Versions - Githubissues

ErlerPhilipp commented 2 years ago

Hi,

I tried to reproduce your results, but I ran into a possible version mismatch between Pytorch and Pytorch_geometric.

I created my environment with the following commands:

conda create --name poco python=3.7.10
conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=11.1 -c pytorch -c conda-forge
conda install -c conda-forge cython
conda install -c conda-forge tqdm 
conda install -c conda-forge scikit-image 
conda install -c open3d-admin open3d 
conda install -c conda-forge scikit-learn 
conda install -c conda-forge pyyaml 
conda install -c conda-forge addict 
conda install -c conda-forge pandas 
conda install -c conda-forge plyfile 
conda install -c conda-forge pytorch_geometric

Compilation with python setup.py build_ext --inplace seems to work, but

python generate.py --config results/ABC_10k_FKAConv_InterpAttentionKHeadsNet_None/config.yaml --dataset_name DATASET_NAME --dataset_root data/3d_shapes_abc/abc/ --gen_resolution_global 256

results in

OSError: /home/perler/miniconda3/envs/poco/lib/python3.7/site-packages/torch_sparse/_version.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
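
The missing symbol mentions std::__cxx11::basic_string, which hints that torch_sparse was built against a libtorch using the newer libstdc++ string ABI, while the installed PyTorch binary may use the older one. That is an interpretation, not something confirmed in this thread. A minimal Python sketch that checks a mangled symbol for that marker; the helper name uses_cxx11_abi is made up for illustration:

```python
def uses_cxx11_abi(mangled_symbol: str) -> bool:
    """Return True if an Itanium-mangled symbol refers to std::__cxx11 types."""
    # "7__cxx11" is the length-prefixed encoding of the namespace "__cxx11".
    return "7__cxx11" in mangled_symbol


# The symbol from the OSError above, split only to keep the line short.
symbol = ("_ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_string"
          "IcSt11char_traitsIcESaIcEEE")
print(uses_cxx11_abi(symbol))
```

With PyTorch importable, torch.compiled_with_cxx11_abi() reports which ABI the PyTorch binary itself was built with, so the two can be compared.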

Installed versions are:

(poco) perler@BOTTLE:~/repos/poco$ conda list pytorch
# packages in environment at /home/perler/miniconda3/envs/poco:
#
# Name                    Version                   Build  Channel
pytorch                   1.8.1           py3.7_cuda11.1_cudnn8.0.5_0    pytorch
pytorch-cpu               1.1.0               py3.7_cpu_0    pytorch
pytorch_geometric         2.0.3              pyh6c4a22f_0    conda-forge
pytorch_sparse            0.6.4            py37hcae2be3_0    conda-forge

Again, the CPU-version... but that's a different issue.
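
The listing above shows both pytorch and pytorch-cpu in the same environment. A small sketch, assuming the usual four-column conda list layout, that parses such a listing and flags more than one PyTorch core package. The function names and the set of package names treated as "core" are assumptions for illustration:

```python
def parse_conda_list(text):
    """Parse `conda list` output into {name: (version, build, channel)}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split()
        if len(fields) < 3:
            continue  # not a package row
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""
        packages[name] = (version, build, channel)
    return packages


def conflicting_torch_builds(packages):
    """Return the PyTorch core packages if more than one is installed."""
    core_names = ("pytorch", "pytorch-cpu", "pytorch-gpu", "torch")  # assumed set
    core = sorted(name for name in packages if name in core_names)
    return core if len(core) > 1 else []


listing = """
# Name                    Version                   Build  Channel
pytorch                   1.8.1           py3.7_cuda11.1_cudnn8.0.5_0    pytorch
pytorch-cpu               1.1.0               py3.7_cpu_0    pytorch
pytorch_geometric         2.0.3              pyh6c4a22f_0    conda-forge
pytorch_sparse            0.6.4            py37hcae2be3_0    conda-forge
"""
print(conflicting_torch_builds(parse_conda_list(listing)))
```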

AFAIK, they added sparse tensors only recently to Pytorch, so the installed Pytorch-geometric version might be too new. Which version of Pytorch-geometric do I need?

Can you please create a requirements.txt and/or environment.yaml?

ErlerPhilipp commented 2 years ago

After half a day reverse-engineering the requirements, I got this poco.yaml:

name: poco
channels:
  - pytorch
  - pyg
  - open3d-admin
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - python=3.7.10
  - pytorch::pytorch=1.8.1
  - pytorch::torchvision=0.9.1
  - pytorch::torchaudio=0.8.1
  - cudatoolkit=11.1
  - cython
  - tqdm
  - scikit-image
  - open3d
  - scikit-learn
  - pyyaml
  - addict
  - pandas
  - plyfile
  - pyg=2.0.1
  - pip
  - pip:
    - open3d
    - trimesh

Seems to work with setup.py and generate.py. Hope this helps someone.

aboulch commented 2 years ago

Thanks a lot. I have added it in the repo for a conda installation.

ErlerPhilipp commented 2 years ago

@aboulch In my case, the generate script is pretty slow, at ~20 min per object. Is this normal? Should it be multi-threaded by default?

aboulch commented 2 years ago

On scenes, yes, it is quite slow. However, on ShapeNet objects it should take around 10 to 20 s per object, depending on your hardware. In my case, that was with 6 CPU threads on an Intel(R) Xeon(R) CPU E5-2630 and a 2080 Ti GPU.

aboulch commented 2 years ago

I could reproduce the issue with the proposed yml file. I will look into that.

ErlerPhilipp commented 2 years ago

Thanks!

I'm trying to reproduce the ABC dataset first. My GPU is almost idle and only one CPU core is occupied. Could this be related to OpenMP?

Btw, why do you compile pykdtree? It's also available as a simple conda package.
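
To narrow down a "one busy core, idle GPU" situation, it can help to print the settings that usually limit parallelism. A minimal sketch, assuming PyTorch may or may not be importable; the function name thread_report is made up, and whether OpenMP is actually the cause here is not established in this thread:

```python
import os


def thread_report():
    """Collect settings that commonly limit CPU and GPU utilisation."""
    report = {
        "cpu_count": os.cpu_count(),
        # Unset (None) means the OpenMP/MKL runtime picks its own default.
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "MKL_NUM_THREADS": os.environ.get("MKL_NUM_THREADS"),
    }
    try:
        import torch  # optional: only reported when PyTorch is importable
    except ImportError:
        report["torch"] = None
    else:
        report["torch"] = torch.__version__
        report["torch_num_threads"] = torch.get_num_threads()
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in thread_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False, the generate script is probably running on the CPU, which would fit an idle GPU.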

aboulch commented 2 years ago

First install the packages:

apt-get install libgl1-mesa-glx libopenblas-dev

--> the problem may come from OpenBLAS being missing

Create a minimal conda environment (if needed, it only installs python, cudatoolkit and pip):

conda env create -f environment.yml
conda activate poco

Install the dependencies with pip:

pip install -r requirements.txt

Build the compiled library (needed only for evaluation):

python setup.py build_ext --inplace

Note: I will remove the dependency on the compiled pykdtree.

zhaoyuanyuan2011 commented 2 years ago

I copied your steps but get a ninja build issue at python setup.py build_ext --inplace; the same happens when following the steps in the README.

Update: the error above was fixed by modifying the ninja build part in lib/python3.6/site-packages/torch/utils/cpp_extension.py.

aboulch commented 2 years ago

Hello,

Here are the versions installed in my conda environment:

# Name                    Version                        Build    Channel
python                    3.7.10          hf930737_104_cpython    conda-forge
cudatoolkit               11.1.1                   h6406543_10    conda-forge
openssl                   3.0.2                     h166bdaf_1    conda-forge
pip                       20.2.4                        py37_0    anaconda
wheel                     0.35.1                          py_0    anaconda

And the versions installed using the pip requirements file:

cython                    0.29.28                  pypi_0    pypi
numpy                     1.21.5                   pypi_0    pypi
open3d                    0.13.0                   pypi_0    pypi
pandas                    1.3.5                    pypi_0    pypi
plyfile                   0.7.4                    pypi_0    pypi
pykdtree                  1.3.4                    pypi_0    pypi
scikit-image              0.19.2                   pypi_0    pypi
scikit-learn              1.0.2                    pypi_0    pypi
scipy                     1.7.3                    pypi_0    pypi
tensorboard               2.8.0                    pypi_0    pypi
torch                     1.8.1+cu111              pypi_0    pypi
torch-cluster             1.5.9                    pypi_0    pypi
torch-geometric           2.0.4                    pypi_0    pypi
torch-scatter             2.0.8                    pypi_0    pypi
torch-sparse              0.6.12                   pypi_0    pypi
torch-spline-conv         1.2.1                    pypi_0    pypi
torchaudio                0.8.1                    pypi_0    pypi
torchvision               0.9.1+cu111              pypi_0    pypi
tqdm                      4.64.0                   pypi_0    pypi
trimesh                   3.10.7                   pypi_0    pypi
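
A listing like this can be turned into a quick check against the current environment. A minimal sketch that compares a few of the versions above with what is installed; the helper names are made up, and on Python 3.7 the import relies on the importlib_metadata backport being available:

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: needs the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# A subset of the pip versions listed above.
PINS = {
    "torch": "1.8.1+cu111",
    "torch-geometric": "2.0.4",
    "torch-sparse": "0.6.12",
    "torch-scatter": "2.0.8",
    "torch-cluster": "1.5.9",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every package that deviates."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

The lookup parameter exists only so the comparison can be exercised without the packages installed.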

ErlerPhilipp commented 2 years ago

@aboulch Thanks for the update. However, with pip install -r requirements.txt, I get this:

ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

jupyter-packaging 0.12.0 requires setuptools>=60.2.0, but you'll have setuptools 50.3.0.post20201006 which is incompatible.
open3d 0.13.0 requires wheel>=0.36.0, but you'll have wheel 0.35.1 which is incompatible.

Pip still installed the packages.

When I run the generate script, I get a more serious error:

(poco) perler@BOTTLE:~/repos/poco$ python generate.py --config results/ABC_10k_FKAConv_InterpAttentionKHeadsNet_None/config.yaml --dataset_name ABCTest --dataset_root data/3d_shapes_abc/ --gen_resolution_global 128
Traceback (most recent call last):
  File "generate.py", line 11, in <module>
    import torch_geometric.transforms as T
  File "/home/perler/miniconda3/envs/poco/lib/python3.7/site-packages/torch_geometric/__init__.py", line 4, in <module>
    import torch_geometric.data
  File "/home/perler/miniconda3/envs/poco/lib/python3.7/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/perler/miniconda3/envs/poco/lib/python3.7/site-packages/torch_geometric/data/data.py", line 9, in <module>
    from torch_sparse import SparseTensor
  File "/home/perler/miniconda3/envs/poco/lib/python3.7/site-packages/torch_sparse/__init__.py", line 16, in <module>
    f'{library}_{suffix}', [osp.dirname(__file__)]).origin)
  File "/home/perler/miniconda3/envs/poco/lib/python3.7/site-packages/torch/_ops.py", line 104, in load_library
    ctypes.CDLL(path)
  File "/home/perler/miniconda3/envs/poco/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.11: cannot open shared object file: No such file or directory

Looks like this issue: https://github.com/pyg-team/pytorch_geometric/issues/2040

Pip installed:

torch-geometric 2.0.4 pypi_0 pypi

export PATH="~/miniconda3/envs/poco/lib/:$PATH" doesn't help, although libcusparse.so.11 is there.

Any ideas?
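
One thing worth checking: on Linux the dynamic loader consults LD_LIBRARY_PATH (among other locations) for shared libraries, not PATH, and a ~ inside double quotes is not expanded by the shell. A minimal sketch that lists which directories of a search path actually contain a given library file; the helper name dirs_containing is made up for illustration:

```python
import os


def dirs_containing(library, search_path):
    """Return directories from an os.pathsep-separated path that hold `library`."""
    hits = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # ignore empty entries such as a trailing separator
        directory = os.path.expanduser(entry)  # expand a literal "~" ourselves
        if os.path.isfile(os.path.join(directory, library)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print(dirs_containing("libcusparse.so.11", path))
```

If the environment's lib directory is missing from the result, setting LD_LIBRARY_PATH to include $HOME/miniconda3/envs/poco/lib before starting Python is one thing to try. Whether that resolves the error in this setup is not confirmed in this thread.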

aboulch commented 2 years ago

I do not really have an answer on this one.

Is CUDA 11.1 installed on the machine before you create the conda environment? On my side, I start from the Docker image nvidia/cuda:11.1-devel-ubuntu18.04. If you are using a different initial CUDA version, you may want to change the CUDA versions in environment.yaml and requirements.txt.

ErlerPhilipp commented 2 years ago

@aboulch Thanks again! For some reason, neither my WSL 2 nor my native Ubuntu setup worked, but the Docker image does. I guess it's some strange CUDA / PyTorch version mismatch. It looks like CUDA 11.1 is not recommended for PyTorch 1.8.1, although this combination exists for pip (but not conda).

Anyway, the generate script runs now at 10-15s per ABC object. If there are no further problems, I'll close the issue soon.

ErlerPhilipp commented 2 years ago

Seems to work (didn't try training yet). Thanks!

valeoai / POCO

Package Versions #3