kokkos / pykokkos

Performance portable parallel programming in Python.

Native conda installation #42

Open ma-sadeghi opened 2 years ago

ma-sadeghi commented 2 years ago

I wonder if you could publish pykokkos on conda-forge. Since conda supports cross-language dependencies, publishing it there would make it easy for users to install pykokkos without worrying about how to install its dependencies, the most important of which is Kokkos itself.

petsc4py (which I assume you're probably aware of) is an example of a Python package with external C dependencies that has been successfully published on conda-forge. Here's a link to its conda-forge feedstock in case you need a template.

Anyway, would greatly appreciate it if you could publish it on conda-forge.

tylerjereddy commented 2 years ago

I think @namehta4 did some early experimenting here: https://anaconda.org/neilmehta87/pykokkos

I'm sure feedback on the initial attempts would be appreciated.

ma-sadeghi commented 2 years ago

Great, thank you for pointing that out. I just installed pykokkos using mamba install -c neilmehta87 pykokkos in a fresh virtualenv, then tried importing pykokkos, but I got ModuleNotFoundError:

>>> import pykokkos as pk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pykokkos'
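
One way to narrow down a failure like this is to ask the running interpreter whether it can locate the module at all, since a package can show up in the environment's package list without being importable by the Python that is actually in use. The following is a minimal diagnostic sketch using only the standard library. It is not part of pykokkos, and the helper name `describe_module` is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Report whether the top-level module `name` is importable here."""
    # find_spec returns None when a top-level module is not on sys.path.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found by {sys.executable}"
    return f"{name}: found at {spec.origin}"


if __name__ == "__main__":
    print(describe_module("pykokkos"))
    # The search path shows which site-packages directory is really in use.
    for entry in sys.path:
        print("  search path:", entry)
```

If the module is reported as not found, comparing the interpreter path in the message with the environment shown by mamba list can indicate whether the package was installed for a different Python.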

tylerjereddy commented 2 years ago

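@ma-sadeghi Just confirming that I can reproduce the issue with a slightly different workflow as well:

  • conda create -n pykokkos_test python=3.9
  • conda activate pykokkos_test
  • mamba install -c neilmehta87 pykokkos
  • now mamba list shows pykokkos in the current environment but it cannot be imported

Perhaps you can live with source installs for a while until we have more mature packaging options here. We'll probably want the usual feedstock repo, as you suggest, if we don't have one yet.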

namehta4 commented 2 years ago

I think this may be somewhat my fault. Could you please try either mamba install -c neilmehta87 pykokkos-cori (if using an NVIDIA GPU) or mamba install -c neilmehta87 pykokkos-hip (if using an AMD GPU)? Thanks!
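
The choice between the two packages can be sketched in a few lines of Python. This is only an illustration: the channel and package names are the ones quoted in this comment, while the vendor detection (checking whether nvidia-smi or rocm-smi is on PATH) is an assumed heuristic, and the helper names are hypothetical.

```python
import shutil
from typing import Optional

# Channel and package names as quoted in this thread.
CHANNEL = "neilmehta87"
PACKAGES = {"nvidia": "pykokkos-cori", "amd": "pykokkos-hip"}


def detect_vendor() -> Optional[str]:
    """Guess the GPU vendor from which vendor tool is on PATH (heuristic)."""
    if shutil.which("nvidia-smi"):
        return "nvidia"
    if shutil.which("rocm-smi"):
        return "amd"
    return None


def install_command(vendor: Optional[str]) -> Optional[str]:
    """Build the mamba command for a vendor, or None if no package applies."""
    package = PACKAGES.get(vendor) if vendor else None
    if package is None:
        return None
    return f"mamba install -c {CHANNEL} {package}"


if __name__ == "__main__":
    command = install_command(detect_vendor())
    print(command or "No GPU vendor tool found on PATH.")
```

The script only prints the command rather than running it, so the suggestion can be reviewed before anything is installed.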

On Tue, Jul 5, 2022 at 12:13 PM Tyler Reddy wrote:

@ma-sadeghi Just confirming that I can reproduce the issue with a slightly different workflow as well:

  • conda create -n pykokkos_test python=3.9
  • conda activate pykokkos_test
  • mamba install -c neilmehta87 pykokkos
  • now mamba list shows pykokkos in the current environment but it cannot be imported

Perhaps you can live with source installs for a while until we have more mature packaging options here--we'll probably want the usual feedstock repo as you suggest if we don't have one yet.


-- Thanks and regards, Neil Mehta Performance Engineer, NERSC Lawrence Berkeley National Laboratory

tylerjereddy commented 2 years ago

And for OpenMP?

namehta4 commented 2 years ago

I haven't built a package for OpenMP yet. I will get to it this week and ping back.


ma-sadeghi commented 2 years ago

Hi @namehta4, I tried again using the method you suggested:

>>> import pykokkos
Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability 7.0 on device with compute capability 7.5 , this will likely reduce potential performance.

Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
                                  without setting CUDA_LAUNCH_BLOCKING=1.
                                  The code must call Cuda().fence() after each kernel
                                  or will likely crash when accessing data on the host.

Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
                                  without setting CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 or
                                  setting CUDA_VISIBLE_DEVICES.
                                  This could on multi GPU systems lead to severe performance"
                                  penalties.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anaconda/mambaforge/envs/kokkos/lib/python3.10/site-packages/pykokkos/__init__.py", line 17, in <module>
    defaults: Optional[CompilationDefaults] = runtime_singleton.runtime.compiler.read_defaults()
  File "/home/anaconda/mambaforge/envs/kokkos/lib/python3.10/site-packages/pykokkos/core/compiler.py", line 260, in read_defaults
    main: Path = self.get_main_path()
  File "/home/anaconda/mambaforge/envs/kokkos/lib/python3.10/site-packages/pykokkos/core/compiler.py", line 162, in get_main_path
    return Path(self.console_main)
AttributeError: 'Compiler' object has no attribute 'console_main'

NaderAlAwar commented 2 years ago

@ma-sadeghi This issue occurs when you import pykokkos from the Python console. Could you try running one of the examples in the repo? Something like:

cd examples/kokkos-tutorials/functor
python 02.py --fill
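
The difference between the console and a script can be seen with the standard library alone. When Python runs a file, the `__main__` module has a `__file__` attribute. In an interactive session it does not. The sketch below shows that check. It is an assumption that this is related to why get_main_path fails in the traceback above, and the helper `main_script_path` is hypothetical, not pykokkos code.

```python
import sys
from pathlib import Path
from types import ModuleType
from typing import Optional


def main_script_path(main_module: Optional[ModuleType] = None) -> Optional[Path]:
    """Return the running script's path, or None in an interactive session."""
    if main_module is None:
        main_module = sys.modules["__main__"]
    # `python file.py` sets __main__.__file__; the interactive console does not.
    script = getattr(main_module, "__file__", None)
    return Path(script) if script else None


if __name__ == "__main__":
    path = main_script_path()
    print("interactive session" if path is None else f"running script: {path}")
```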

ma-sadeghi commented 2 years ago

@NaderAlAwar Here's the traceback when running from terminal:

(kokkos) pykokkos$ python examples/kokkos-tutorials/functor/02.py --fill
Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability 7.0 on device with compute capability 7.5 , this will likely reduce potential performance.

Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
                                  without setting CUDA_LAUNCH_BLOCKING=1.
                                  The code must call Cuda().fence() after each kernel
                                  or will likely crash when accessing data on the host.

Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
                                  without setting CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 or
                                  setting CUDA_VISIBLE_DEVICES.
                                  This could on multi GPU systems lead to severe performance"
                                  penalties.
  Total size S = 262144 N = 256 M = 1024 E = 1024
Total size S = 262144 N = 256 M = 1024
INFO:root:translation 0.001797311007976532
/home/anaconda/mambaforge/envs/kokkos/lib/python3.10/site-packages/pykokkos_base-0.0.7-py3.10-linux-x86_64.egg/lib64/../bin/nvcc_wrapper: line 562: nvcc: command not found
/home/anaconda/mambaforge/envs/kokkos/lib/python3.10/site-packages/pykokkos_base-0.0.7-py3.10-linux-x86_64.egg/lib64/../bin/nvcc_wrapper: line 562: nvcc: command not found

C++ compilation in pk_cpp/home/anaconda/Code/pykokkos/examples/kokkos-tutorials/functor/02/02_Workload/OpenMP failed
terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos allocation "" is being deallocated after Kokkos::finalize was called

Traceback functionality not available

Aborted (core dumped)

NaderAlAwar commented 2 years ago

The first warning is from Kokkos and is probably due to the pykokkos-base install being compiled for a different compute capability. If you do observe a reduction in performance, you can get rid of the warning by compiling pykokkos-base locally. Let me know if you need help with that.

The other two warnings are also from Kokkos. You can get rid of them by setting the environment variables named in the warning messages.
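
As a rough sketch, the variables could be set from Python before pykokkos is imported, so that they are already in the environment when Kokkos initializes. The variable names come from the warning text. The helper `apply_kokkos_cuda_env` is hypothetical, and setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so it may cost some performance.

```python
import os

# Variable names are taken from the Kokkos warning text in this thread.
KOKKOS_CUDA_ENV = {
    "CUDA_LAUNCH_BLOCKING": "1",
    "CUDA_MANAGED_FORCE_DEVICE_ALLOC": "1",
}


def apply_kokkos_cuda_env(env=os.environ) -> dict:
    """Set each variable unless the user already chose a value for it."""
    applied = {}
    for key, value in KOKKOS_CUDA_ENV.items():
        if key not in env:
            env[key] = value
            applied[key] = value
    return applied


if __name__ == "__main__":
    print("set:", apply_kokkos_cuda_env())
    # Import pykokkos only after this point, for example:
    # import pykokkos as pk
```

Exporting the same variables in the shell before starting Python has the same effect and avoids any ordering concerns inside the script.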

The actual error is due to nvcc not being found in your PATH. Is it installed on your system?
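
A quick way to answer that question from within the same environment is to look the compiler up on PATH. This is a minimal sketch with the standard library, and `find_compiler` is a made-up helper name.

```python
import shutil
from typing import Optional


def find_compiler(name: str = "nvcc") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_compiler()
    if location is None:
        print("nvcc is not on PATH in this environment.")
    else:
        print(f"nvcc found at {location}")
```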

NaderAlAwar commented 2 years ago

@ma-sadeghi were you able to resolve the error?

ma-sadeghi commented 2 years ago

I had to install cudatoolkit-dev to make nvcc available to my conda environment. Here's the new traceback:

Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability 7.0 on device with compute capability 7.5 , this will likely reduce potential performance.

Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
                                  without setting CUDA_LAUNCH_BLOCKING=1.
                                  The code must call Cuda().fence() after each kernel
                                  or will likely crash when accessing data on the host.

Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
                                  without setting CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 or
                                  setting CUDA_VISIBLE_DEVICES.
                                  This could on multi GPU systems lead to severe performance"
                                  penalties.
  Total size S = 262144 N = 256 M = 1024 E = 1024
Total size S = 262144 N = 256 M = 1024
INFO:root:translation 0.0025273360079154372
INFO:root:compilation 11.882021105004242
Computed result for 256 x 1024 is 262144.0
N(256) M(1024) nrepeat(100) problem(MB) time(12.026733778009657) bandwidth(GB/s)
terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos allocation "" is being deallocated after Kokkos::finalize was called

Traceback functionality not available

Aborted (core dumped)

namehta4 commented 2 years ago

Hi Amin,

I don't quite understand why, but this is the expected output. To verify that the installation was successful, please try running the 02.py example from the tutorial directory. Thanks!


ma-sadeghi commented 2 years ago

Thank you for your reply. I'm swamped at the moment, but I'll report back when I get the chance.