Notes regarding setup

Given the rate at which Python modules are updated, I understand it's difficult to keep things "current". I recently created a new Google Compute Engine instance with a GPU: machine type a2-highgpu-1g (12 vCPUs, 85 GB RAM) with one A100 40GB GPU. I was told that the OS configuration did not support GPUs and was asked whether I wanted to switch to a supported version. I answered yes and ended up with what I presume is the "default" configuration, a Debian 10 version. Subsequent testing showed that the Debian 10 environment has ptxas version 11.0.221, which is

 "older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors."

It appeared necessary to select an OS version other than the default, in this case "Debian 11 based Deep Learning VM with M109 and CUDA 11.3", to get a suitable environment. If this is now a requirement, it would be good to note it in the setup instructions and suggest running "ptxas --version" to check. The output should look something like this:

$ ptxas --version
  built Mon_May__3_19:14:31_PDT_2021
  release 11.3, V11.3.109
  cuda_11.3.r11.3/compiler.29920130_0
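To script the same check, here is a minimal Python sketch. It assumes that the ptxas found on PATH is the one XLA will use and that the version output contains a "release X.Y" line like the one above. The 11.1 threshold comes from the warning quoted earlier; the function names are hypothetical and not part of SpeedPPI or JAX.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Minimum ptxas release, taken from the XLA warning about pre-11.1 miscompiles.
MIN_PTXAS = (11, 1)


def parse_ptxas_release(output: str) -> Tuple[int, int]:
    """Extract (major, minor) from text such as 'release 11.3, V11.3.109'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'release X.Y' found in ptxas output")
    return int(match.group(1)), int(match.group(2))


def ptxas_is_recent_enough(output: str) -> bool:
    """True if the reported release is at least MIN_PTXAS."""
    return parse_ptxas_release(output) >= MIN_PTXAS


def installed_ptxas_output() -> Optional[str]:
    """Run 'ptxas --version' from PATH; return None if ptxas is not found."""
    exe = shutil.which("ptxas")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    out = installed_ptxas_output()
    if out is None:
        print("ptxas not found on PATH")
    elif ptxas_is_recent_enough(out):
        print("ptxas is 11.1 or newer")
    else:
        print("ptxas is older than 11.1; consider a newer CUDA image")
```

On the Debian 10 image described above (ptxas 11.0.221), the last branch would be expected to fire.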

Next, the supplied "speed_ppi.yml" created a conda environment with the following package versions (the constraints from the yml are shown in parentheses):

  python          3.9.17
  jax             0.3.25
  ml-collections  0.1.1
  dm-haiku        0.0.9
  pandas          1.4.4
  biopython       1.79
  chex            0.0.7
  dm-tree         0.1.8     (>=0.1.6)
  immutabledict   3.0.0     (>=2.0.0)
  numpy           1.24.3    (>=1.19.3)
  scipy           1.11.1    (>=1.9.0)
  tensorflow-cpu  2.13.0    (>=2.12.0)

This resulted in the following error when running some_vs_some:

AttributeError: module 'numpy' has no attribute 'int' ...
  The aliases was originally deprecated in NumPy 1.20
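For context: NumPy deprecated the np.int alias (along with similar ones such as np.float and np.bool) in 1.20 and removed it in 1.24. That would explain why numpy 1.24.3 fails here while 1.22.4 still runs. The code-level fix is to replace np.int with the builtin int, as the full NumPy error message itself suggests. The sketch below only predicts, from a version string, whether the alias is still present; the helper names are hypothetical.

```python
import re
from typing import Tuple

# np.int and friends were deprecated in NumPy 1.20 and removed in NumPy 1.24.
ALIAS_REMOVED_IN = (1, 24)


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '1.24.3'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_np_int_alias(numpy_version: str) -> bool:
    """True if np.int still exists (possibly with a DeprecationWarning)."""
    return parse_version(numpy_version) < ALIAS_REMOVED_IN


if __name__ == "__main__":
    for v in ("1.22.4", "1.24.3"):
        print(v, "has np.int" if has_np_int_alias(v) else "np.int removed")
```

Pinning numpy below 1.24 is therefore one workaround; patching the code to use int is the other.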

Through trial and error and seat-of-the-pants guessing, I tweaked speed_ppi.yml to create the following environment, which seemed to work:

  python          3.9.17
  jax             0.3.25
  ml-collections  0.1.1
  dm-haiku        0.0.9
  pandas          1.4.4
  biopython       1.79
  chex            0.0.7
  dm-tree         0.1.8     (>=0.1.6)
  immutabledict   2.0.0     (==2.0.0)
  numpy           1.22.4    (==1.22.4)
  scipy           1.11.1    (>=1.9.0)
  tensorflow-cpu  2.12.0    (==2.12.0)

It appears that, at least for some modules, ">=" allows incompatible combinations: numpy in particular, and possibly immutabledict and tensorflow-cpu. (I did not try every combination of numpy, immutabledict and tensorflow-cpu; I stopped once I found the combination above that worked.)
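One way to confirm that an environment matches the combination that worked is to compare installed distribution versions against exact pins. This is a hypothetical helper, not part of SpeedPPI. It checks only the three packages whose ">=" constraints were tightened and reads versions with importlib.metadata from the standard library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List

# Exact pins from the environment that worked (PyPI distribution names).
WORKING_PINS: Dict[str, str] = {
    "numpy": "1.22.4",
    "immutabledict": "2.0.0",
    "tensorflow-cpu": "2.12.0",
}


def find_mismatches(installed: Dict[str, str], pins: Dict[str, str]) -> List[str]:
    """Describe every pinned package that is missing or at a different version."""
    problems = []
    for name, wanted in pins.items():
        got = installed.get(name)
        if got is None:
            problems.append(f"{name}: not installed (want =={wanted})")
        elif got != wanted:
            problems.append(f"{name}: {got} installed (want =={wanted})")
    return problems


def installed_versions(names: Iterable[str]) -> Dict[str, str]:
    """Look up installed versions, silently skipping packages that are absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    mismatches = find_mismatches(installed_versions(WORKING_PINS), WORKING_PINS)
    for line in mismatches or ["all pins match"]:
        print(line)
```

Exact pins trade flexibility for reproducibility; an upper bound such as "numpy<1.24" would be a looser alternative, but only the exact combination above was reported to work.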

patrickbryant1 / SpeedPPI

Notes regarding setup #10