zhanglab-aim / cancer-net

Diagnosing cancers using deep learning.
GNU General Public License v2.0

Changed pandas and torch version in environment file #20

Closed Chris-Pedersen closed 1 year ago

Chris-Pedersen commented 1 year ago

A few issues I was having with the old environment file:

  1. Pandas was having issues when generating new datasets with a different edge_tol:
    
    File ~/miniconda3/envs/cancernet/lib/python3.9/site-packages/pandas/core/frame.py:639, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    637     raise ValueError("index cannot be a set")
    638 if columns is not None and isinstance(columns, set):
    --> 639     raise ValueError("columns cannot be a set")
    641 if copy is None:
    642     if isinstance(data, dict):
    643         # retain pre-GH#38939 default behavior

    ValueError: columns cannot be a set


    This was fixed by pinning pandas to version 1.4.1.

  2. Torch and CUDA were incompatible, and I was having issues using GPUs on rusty with fresh environments created from the .yml file:

    (base) [cpedersen@worker1046 cancer-net]$ conda env create -f environment.yml
    ...
    (base) [cpedersen@worker1046 cancer-net]$ conda activate cancerenv
    (cancerenv) [cpedersen@worker1046 cancer-net]$ python3
    Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> torch.cuda.is_available()
    False


  3. Tensorflow (needed to run the old pnet) also had some conflicts with the torch and CUDA versions.

I have fixed all of these with the new yml file and tested it in fresh environments. Note that even without the tensorflow consideration, the old environment wasn't working on GPU for me on a fresh install, so I suggest we just use this for now and drop tensorflow once we have fully validated our torch pnet implementation.

ttesileanu commented 1 year ago
  1. Can you show the full trace for the error? Or let me know where you see the error / provide a minimal example that shows it. At first glance this seems like just new behavior for the DataFrame constructor that might be fixed by converting the set of columns into a list before passing it as an argument. If that's the case, we should do that and keep using the newest version of pandas.

  2. Was this on a GPU node? I tried my environment on a GPU node and it seemed to find cuda. (torch.__version__ is '1.11.0', cudatoolkit version is 11.3.1)

Chris-Pedersen commented 1 year ago
  1. Sure:
    >>> prostate_root = os.path.join("/mnt/home/cpedersen/ceph/Data/data", "prostate")
    >>> dataset = PnetDataSet(
    ...     root=prostate_root,
    ...     name="prostate_graph_humanbase",
    ...     edge_tol=0.49,
    ...     pre_transform=T.Compose(
    ...         [T.GCNNorm(add_self_loops=False), T.ToSparseTensor(remove_edge_index=False)]
    ...     ),
    ... )
    Processing...
    loading gene graph took 141.14 seconds.
    WARNING:root:response in cached_data is being set by '/mnt/home/cpedersen/ceph/Data/data/prostate/response_paper.csv'
    WARNING:root:some genes don't exist in the original data set
    /mnt/home/cpedersen/Codes/cancer-net/cancernet/dataset/pnet_dataset.py:402: FutureWarning: Passing a set as an indexer is deprecated and will raise in a future version. Use a list instead.
    x = x.loc[:, intersect]
    WARNING:root:some genes don't exist in the original data set
    /mnt/home/cpedersen/Codes/cancer-net/cancernet/dataset/pnet_dataset.py:402: FutureWarning: Passing a set as an indexer is deprecated and will raise in a future version. Use a list instead.
    x = x.loc[:, intersect]
    WARNING:root:some genes don't exist in the original data set
    /mnt/home/cpedersen/Codes/cancer-net/cancernet/dataset/pnet_dataset.py:402: FutureWarning: Passing a set as an indexer is deprecated and will raise in a future version. Use a list instead.
    x = x.loc[:, intersect]
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/mnt/home/cpedersen/Codes/cancer-net/cancernet/dataset/pnet_dataset.py", line 45, in __init__
    super().__init__(root, transform, pre_transform)
    File "/mnt/home/cpedersen/miniconda3/envs/cancerenv_main/lib/python3.10/site-packages/torch_geometric/data/in_memory_dataset.py", line 50, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
    File "/mnt/home/cpedersen/miniconda3/envs/cancerenv_main/lib/python3.10/site-packages/torch_geometric/data/dataset.py", line 87, in __init__
    self._process()
    File "/mnt/home/cpedersen/miniconda3/envs/cancerenv_main/lib/python3.10/site-packages/torch_geometric/data/dataset.py", line 170, in _process
    self.process()
    File "/mnt/home/cpedersen/Codes/cancer-net/cancernet/dataset/pnet_dataset.py", line 62, in process
    all_data, response, edge_dict = data_reader(filename_dict=self.raw_file_names)
    File "/mnt/home/cpedersen/Codes/cancer-net/cancernet/dataset/pnet_dataset.py", line 267, in data_reader
    res = combine(
    File "/mnt/home/cpedersen/Codes/cancer-net/cancernet/dataset/pnet_dataset.py", line 453, in combine
    df = pd.DataFrame(x, columns=c, index=r)
    File "/mnt/home/cpedersen/miniconda3/envs/cancerenv_main/lib/python3.10/site-packages/pandas/core/frame.py", line 638, in __init__
    raise ValueError("columns cannot be a set")
    ValueError: columns cannot be a set

I came across https://github.com/facebook/Ax/issues/1153, which suggested downgrading pandas; doing so fixed it.

  2. Oops, yes, this was on CPU, but I had tested it on GPU before. I just ran it again:
    (cancerenv_main) [cpedersen@workergpu094 ~]$ python3
    Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> torch.cuda.is_available()
    False
    >>> torch.__version__
    '1.12.1'

    My cuda version is:

    (cancerenv_main) [cpedersen@workergpu094 Codes]$ conda list cudatoolkit
    # packages in environment at /mnt/home/cpedersen/miniconda3/envs/cancerenv_main:
    #
    # Name                    Version                   Build  Channel
    cudatoolkit               11.7.0              hd8887f6_10    conda-forge

This is what I get when running the current environment.yml file on main, so I guess we were somehow getting different torch and CUDA versions. Have you tried checking these versions after creating a fresh environment?
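One way to compare versions across fresh environments is a small script like the sketch below. The helper name `report_torch_cuda` is hypothetical and not part of cancer-net; it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it assumes that a missing torch install should be reported rather than raise.

```python
import importlib


def report_torch_cuda():
    """Collect the torch / CUDA details discussed in this thread.

    Hypothetical helper for illustration; not part of cancer-net.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed in this environment
        return {"torch_version": None, "cuda_build": None, "cuda_available": False}
    return {
        # version of the installed torch package
        "torch_version": torch.__version__,
        # CUDA version torch was built against (None for CPU-only builds)
        "cuda_build": torch.version.cuda,
        # whether torch can actually see a usable GPU on this node
        "cuda_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    for key, value in report_torch_cuda().items():
        print(f"{key}: {value}")
```

Running this on a GPU node in each environment, alongside `conda list cudatoolkit`, should make it clear whether the two setups really resolve to different torch and cudatoolkit versions.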

ttesileanu commented 1 year ago

Ok, the first issue is fixed by turning a set into a list -- I pushed a fix to main.
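For readers following along, the set-to-list change can be sketched as below. This is only an illustration with made-up gene names and sample labels, not the actual fix pushed to main; the variable names `x`, `intersect`, `c`, and `r` merely echo the lines shown in the traceback.

```python
import pandas as pd

# Toy stand-in for the per-sample gene matrix (hypothetical data).
x = pd.DataFrame(
    {"TP53": [1, 0], "AR": [0, 1], "PTEN": [1, 1]},
    index=["s1", "s2"],
)

# A set intersection, similar in spirit to `intersect` in the traceback.
wanted = {"AR", "TP53", "BRCA2"}
intersect = set(x.columns) & wanted

# Convert the set to a list before indexing; sorting gives a stable order.
c = sorted(intersect)
x = x.loc[:, c]

# Pass lists, not sets, as `columns` and `index` to the constructor.
r = list(x.index)
df = pd.DataFrame(x.to_numpy(), columns=c, index=r)
```

Using `sorted(...)` rather than `list(...)` is a design choice made here so that the column order does not depend on set iteration order; either avoids both the `FutureWarning` from `.loc` and the `ValueError` from the constructor.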

For the second issue: I think I see the problem now. The newest version of cudatoolkit seems to have problems with pytorch versions below 1.13, but pyg currently only supports pytorch up to 1.12. So our environment installs pytorch 1.12 and cudatoolkit 11.7, and CUDA doesn't work with this combination. For now, the best solution does seem to be forcing the cudatoolkit version to 11.6.

I made a slightly more restricted set of changes compared to what was included in your PR (only lowered cudatoolkit version and added tensorflow) and pushed to main.