ledatelescope / bifrost

A stream processing framework for high-throughput applications.
BSD 3-Clause "New" or "Revised" License

Python control of CUDA streams #167

Closed jaycedowell closed 1 year ago

jaycedowell commented 2 years ago

This PR exposes the bfStreamGet and bfStreamSet methods through the Python API. This can help with interfacing Bifrost with other CUDA-aware packages, such as cupy. For example:

import bifrost as bf
from bifrost.device import get_stream
import cupy as cp
import numpy as np

data = np.random.rand(100,1000)
data = data.astype(np.float32)
orig_data = data

stream = get_stream()
with cp.cuda.ExternalStream(stream):
    data = bf.ndarray(data, space='cuda')
    bf.map('a = a + 2', {'a': data})
    data = data.as_cupy()
    data *= 4
    data = cp.asnumpy(data)
print(data[0,:10])
print(orig_data[0,:10])
np.testing.assert_allclose(data, (orig_data+2)*4)

This PR also adds set_stream and get_stream methods to BifrostObjects that support them.

codecov-commenter commented 2 years ago

Codecov Report

Merging #167 (96d99ac) into master (f82ae97) will increase coverage by 0.24%. The diff coverage is 68.75%.

@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
+ Coverage   66.76%   67.00%   +0.24%     
==========================================
  Files          69       69              
  Lines        7416     7464      +48     
==========================================
+ Hits         4951     5001      +50     
+ Misses       2465     2463       -2     
Impacted Files                   Coverage Δ
python/bifrost/libbifrost.py     71.11% <23.07%> (-8.11%) ↓
python/bifrost/device.py         81.63% <85.71%> (+10.20%) ↑
python/bifrost/ndarray.py        89.60% <0.00%> (+6.09%) ↑


league commented 2 years ago

Interesting. Is it worthwhile to code something like the cupy example shown above as a test? Not that we need to add cupy as a test dependency… maybe can mock something. Just so as not to cede the coverage score. :) I'll play a little.

jaycedowell commented 2 years ago

I was thinking about adding some cupy tests yesterday. It would be good from a coverage perspective to test that we can make bifrost.ndarrays from cupy.ndarrays and vice versa. This bifrost/cupy interoperability would also be an interesting test case since it would double as an example of how to do this. The biggest question I see with the interoperability test is checking that the stream binding actually works. I was testing with nvprof/nvvp yesterday but that is way too heavy to put in. The other thing to keep in mind is that our coverage should improve once we have the dedicated test machine and can run (and report) all of the CUDA-specific tests and paths.
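A sketch of what such a test might look like, guarded so cupy never becomes a hard test dependency. The bf.ndarray(cupy_array) constructor path and the copy(space='system') call are assumptions based on this discussion, not confirmed API:

```python
import unittest
import numpy as np

# Optional dependencies: skip cleanly when bifrost/cupy (or a GPU) are
# absent, so cupy is not forced into the test requirements.
try:
    import bifrost as bf
    import cupy as cp
    HAVE_GPU = True
except ImportError:
    HAVE_GPU = False

@unittest.skipUnless(HAVE_GPU, "bifrost/cupy not available")
class TestCupyInterop(unittest.TestCase):
    def test_roundtrip(self):
        a = np.random.rand(4, 8).astype(np.float32)
        b = bf.ndarray(a, space='cuda')    # numpy -> bifrost (GPU)
        c = b.as_cupy()                    # bifrost -> cupy
        c *= 2                             # touch the data on the cupy side
        d = bf.ndarray(c, space='cuda')    # cupy -> bifrost (assumed path)
        np.testing.assert_allclose(np.array(d.copy(space='system')), a * 2)
```

Checking that the stream binding itself is honored is the part this sketch does not cover; that still needs something like a profiler trace.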

jaycedowell commented 2 years ago

Sort of related to this, if you look in bifrost.ndarray we also have support for pycuda in there. I used to use pycuda but I've really shifted to cupy recently. It might be worth thinking about how important it is to provide interoperability with both or if one is better/easier/sufficient.

jaycedowell commented 2 years ago

I did some reading on pycuda vs cupy today and I think the answer to my previous interoperability question is "both". They are targeted at two different audiences: cupy is an easy way to move onto the GPU if you have a lot of numpy-based code, while pycuda is more about building things from scratch in a very CUDA way. In that regard, interfacing pycuda with Bifrost is limited to moving data between the array formats, since I can't find an interface in pycuda that allows you to use an external stream. cupy, on the other hand, does have that feature and I use it in the example.

league commented 2 years ago

Assuming the tests pass, this all looks great to me. (I can probably try it later today.) And it makes a great case for interoperability with other frameworks. Will maybe want to tout that in release notes and documentation next time.

Could it be worthwhile to put BifrostStreamManager into the library, rather than just the test file? You're providing get_stream and set_stream which can be thought of as a lower-level interface, but in most cases of Python programming it seems like calling them within a context manager is The Right Thing.

league commented 2 years ago

(Maybe if it's in the library, BifrostStreamManager wants a more precise name? Stream substituter? Stream replacer? StreamRedirect?)

jaycedowell commented 2 years ago

orig_stream and BifrostStreamManager were reactions to problems I was having in test_stream. When another framework creates a stream, Bifrost can use it but we are subject to whatever lifetime was chosen for that stream. When I wasn't being careful I would not save the original Bifrost stream and running this test would make all other tests after it fail because the stream was gone.

Maybe the thing to do is use the terminology of cupy and call it something like ExternalStream or UseExternalStream?
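The save-and-restore behavior described above fits a context manager naturally. A minimal sketch, using stand-ins for bifrost.device.get_stream/set_stream so it runs without a GPU (the name use_external_stream is just one of the suggestions from this thread):

```python
from contextlib import contextmanager

# Stand-ins for bifrost.device.get_stream/set_stream so this sketch runs
# without a GPU; in practice you would import the real functions instead.
_stream = [0]

def get_stream():
    return _stream[0]

def set_stream(stream):
    _stream[0] = stream

@contextmanager
def use_external_stream(stream):
    """Bind Bifrost to an externally created stream for the duration of
    a block, then restore the original stream afterwards."""
    orig = get_stream()   # save the original Bifrost stream first
    set_stream(stream)
    try:
        yield stream
    finally:
        set_stream(orig)  # restore, so later code keeps a live stream
```

The finally clause is what prevents the failure mode described above: even if the body raises, the original Bifrost stream is restored before the external stream's owner tears it down.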

jaycedowell commented 2 years ago

I'm holding off on this to see if I can get a self-hosted runner going on the test machine. It sort of works but there are many mysteries.

jaycedowell commented 2 years ago

Those really slow builds on self-hosted/Python3.[68] are all for this.

league commented 2 years ago

Those really slow builds on self-hosted/Python3.[68] are all for this.

Yeah I noticed that. It's just the build, not especially the tests that use it, right? Possible/desirable to pip-install it into the docker container ahead of time? Or is the runner python a totally different environment from the docker-image python?

jaycedowell commented 2 years ago

I think it's the PyCUDA/CuPy install (mainly CuPy) that is the problem. I don't know that there is a way to pre-install it since we are using actions/setup-python@v2 to set the Python environment that we will use for the build.

jaycedowell commented 2 years ago

One thing we could do is run two self-hosted runners at once, one per GPU.

league commented 2 years ago

Also maybe it's not necessary to include the CuPy tests for both 3.6 and 3.8?

jaycedowell commented 2 years ago

The cupy-cuda112 wheel is helping. It dropped the "Software Install - Python, part 2" time down to ~1 min from ~12 min. If we keep using the wheel then we need to make sure we are using the correct one for whatever version of CUDA we have in the Dockerfile.

jaycedowell commented 2 years ago

We should be running two runners now, one on each GPU. I should test that.

jaycedowell commented 2 years ago

Two runners get it down to ~30 minutes. Another way to save time would be to drop the Py2.7 self-hosted test, though that would mean giving up Py2.7 testing for the GPU-related code.

jaycedowell commented 2 years ago

I think I'm happy with what is in here now that there is the "per-thread default stream" flag in configure. The only thing I'm tempted to keep working on is this seemingly random test_matmul_aa_ci8 failure that we sometimes get. The full error is:

ERROR: test_matmul_aa_ci8 (test_linalg.TestLinAlg)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/docker/actions-runner/_work/bifrost/bifrost/test/test_linalg.py", line 302, in test_matmul_aa_ci8
    self.run_test_matmul_aa_ci8_transpose(True)
  File "/home/docker/actions-runner/_work/bifrost/bifrost/test/test_linalg.py", line 292, in run_test_matmul_aa_ci8_transpose
    self.run_test_matmul_aa_ci8_shape((99+kp,3+kp),          transpose=transpose)
  File "/home/docker/actions-runner/_work/bifrost/bifrost/test/test_linalg.py", line 72, in run_test_matmul_aa_ci8_shape
    self.linalg.matmul(1, a, None, 0, c)
  File "/home/docker/actions-runner/_work/_tool/Python/2.7.18/x64/lib/python2.7/site-packages/bifrost/linalg.py", line 66, in matmul
    c_array))
  File "/home/docker/actions-runner/_work/_tool/Python/2.7.18/x64/lib/python2.7/site-packages/bifrost/libbifrost.py", line 139, in _check
    raise RuntimeError(status_str)
RuntimeError: BF_STATUS_UNSUPPORTED_SHAPE

jaycedowell commented 2 years ago

I haven't been able to reproduce that failure. Oh well.

jaycedowell commented 1 year ago

@league do you think this one is ready to merge? I can't think of anything else I'd like to see as part of it.

league commented 1 year ago

With bbd4b67, I thought it might be useful for the config summary to show what happened with the stream-model setting, so it's not just buried in the scroll.

I thought about adding it to python -m bifrost.version --config too, but for the moment it doesn't seem that it's directly passed into Python. (It's not in config.h). But maybe that's fine.

Trying to test on cetus, I discovered another thing: permission denied errors on /dev/shm/bifrost because it's owned by you. The top level of it is globally writable so it may work for my process just to create new stuff underneath, but it gets stuck on the permissions before trying that, I think. I wonder if we should use a UUID under there rather than the same global name for every user?

This happens on every CUDA test:

test_serialize_with_name_no_ringlets (test_serialize.SerializeTest) ... proclog.cpp:176 internal error:
 filesystem error: cannot set permissions: Operation not permitted [/dev/shm/bifrost]                  
ERROR
▶ ls -ld /dev/shm/bifrost
drwxrwxrwx 3 jdowell jdowell 60 Aug  3 09:59 /dev/shm/bifrost/
jaycedowell commented 1 year ago

This problem seems familiar (as in something I've fixed in the past), but looking back through the history the only thing I see is f2691ff. That's a different problem of cleaning up stale contents, but it has similar "stuff from other users" roots. I'm kind of surprised that it is even trying to change the permissions, since world-writable is what we want. Maybe we should only try to make /dev/shm/bifrost world-writable if it is not already.

The UUID thing is interesting. What are you thinking with this? That there would be a per-user /dev/shm/bifrost-XXXXXX directory?
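The chmod-only-when-needed idea above can be sketched in Python (the actual fix lives in the C++ fileutils code; this is just an illustration of the proposed logic, with a hypothetical helper name):

```python
import os

def ensure_world_writable(path, mode=0o777):
    # Hypothetical sketch: create the directory if it is missing, but only
    # chmod when the permissions are wrong AND we actually own it, so an
    # already world-writable directory owned by another user never
    # triggers an EPERM.
    if not os.path.isdir(path):
        os.makedirs(path)
    st = os.stat(path)
    if (st.st_mode & 0o777) != mode and st.st_uid == os.getuid():
        os.chmod(path, mode)
```

With this shape, a second user hitting an existing /dev/shm/bifrost would skip the chmod entirely and just create their own subdirectories underneath.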

league commented 1 year ago

Regarding /dev/shm/bifrost-UUID, yeah I think so. At first I was thinking we could even reuse the telemetry ID if it's always loaded anyway. But I think any routines to load the telemetry state file are from the Python side, and this might be initiated from C++ side.

What's the number right beneath /dev/shm/bifrost, is it a PID? What's left there now is 1329696. If that's a PID and it's meant to be cleaned up (except when it isn't), then it's probably best to keep /dev/shm/bifrost world-writable (and not choke on chmod if it's already world-writable but owned by someone else), but to continue using user-owned PID subdirs beneath it?

league commented 1 year ago

Ooh! Exciting security-hole opportunity, haha. proclog.cpp uses make_dir from fileutils.cpp so I reread my comment near the top of that file.

/* NOTE: For convenience on systems with compilers without C++17 support, these
   functions build a shell command and pass it to system(). The PATH argument is
   not shell-quoted or otherwise sanitized, so only use with program constants,
   not with data from command line or config files. */

Combine that with the fact that I was just proposing pulling an identifier out of a config file and using it in the filename passed to make_dir. 🧨 🔥

jaycedowell commented 1 year ago

Yeah, that number is the PID. Having everything in one place makes it easier to at least attempt to clean up after a process that died before it could do so itself. I guess that would also be true of /dev/shm/bifrost-UUID, but then you are limited to cleaning up only your own dead processes.

league commented 1 year ago

Looks like the permission denied issue is limited to HAVE_CXX_FILESYSTEM. On the command line, this succeeds:

▶ mkdir -p -m 777 /dev/shm/bifrost

but std::filesystem::permissions throws an exception.

league commented 1 year ago

Trying this, which is probably closer to what mkdir -p -m does anyway.

  // Create any missing directories, then set the permissions only on a
  // directory we actually created; an existing directory (possibly owned
  // by another user) is left untouched.
  bool created = std::filesystem::create_directories(path);
  if(created) {
    std::filesystem::permissions(path, (std::filesystem::perms) perms,
                                 std::filesystem::perm_options::replace);
  }

league commented 1 year ago

Using sudo to nuke your /dev/shm/bifrost so I can make sure the fix creates it correctly for me.