scientific-python / spatch


Spatch requirements for reuse in other libraries #1

Open vyasr opened 3 months ago

vyasr commented 3 months ago

I was discussing with @jookuma, @rlratzel, and @lagru how the NetworkX dispatching could be generalized to support reuse inside scikit-image for dispatching to cucim. @jookuma put together a prototype in scikit-image/scikit-image#7466. Some of the thoughts that came up in this discussion (some of this functionality already seems to be supported, some of it is new; just collecting all of it here for completeness):

lagru commented 3 months ago

To add a few other aspects that we discussed:

stefanv commented 3 months ago

Some general principles I'd personally like to see implemented:

betatim commented 3 months ago

There has to be a way to make a backend selection decision that takes into account more than just the types of the arguments. For example, when passing a large image as a numpy array it can be worth transferring it to the GPU, doing the work there, and transferring the result back. Whether or not the conversion is worth it may depend on the particular operation/function/algorithm, though.

Similarly, it should be possible to pass "small arguments" as Python lists, instead of having to make them arrays of the same type as "the thing" you want to operate on. This is mostly a UX/convenience thing for interactive use, so you need a way to pick the backend while ignoring those arguments.

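Roughly, a size-aware selection hook could look like the sketch below; the `can_handle` name, the threshold, and the treatment of small list arguments are all made up for illustration, not an existing API:

```python
import numpy as np

# Illustrative threshold; a real backend would tune or measure this.
MIN_PIXELS_FOR_GPU = 1_000_000


class HypotheticalGPUBackend:
    def can_handle(self, func_name, *args, **kwargs):
        """Decide whether dispatching to the GPU is likely to pay off."""
        arrays = [a for a in args if isinstance(a, np.ndarray)]
        if not arrays:
            return False
        # Small "convenience" arguments passed as plain Python lists are simply
        # ignored when picking the backend; only the large array matters.
        return max(a.size for a in arrays) >= MIN_PIXELS_FOR_GPU
```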
For something like scikit-learn you need to be able to do dispatching for stateful classes. The call to fit might select the backend, and then other methods (that are dispatched) need to use the backend selected during fit.

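A minimal sketch of what pinning the backend at `fit` time could look like; `can_handle`, `DefaultBackend`, and the method layout are hypothetical, not scikit-learn API:

```python
class DefaultBackend:
    """Stands in for the implementation that ships with the library itself."""

    def fit(self, X, y=None):
        return {"fitted": True}  # placeholder state

    def predict(self, state, X):
        return X  # placeholder result


class DispatchedEstimator:
    def __init__(self, backends):
        self._backends = list(backends)  # candidate backends, in preference order
        self._backend = None             # pinned during fit()

    def fit(self, X, y=None):
        # Select a backend once, based on the training data.
        self._backend = next(
            (b for b in self._backends if b.can_handle("fit", X, y)),
            DefaultBackend(),
        )
        self._state = self._backend.fit(X, y)
        return self

    def predict(self, X):
        # No re-selection: reuse whatever backend fit() picked.
        return self._backend.predict(self._state, X)
```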
Another interesting question is how to make sure that input validation is the same for different backends: if you pass some invalid combination of arguments (or arguments plus constructor arguments, in the case of a class), you should get the same exception regardless of backend. Having to duplicate the validation code is probably not great, so we need some ideas on how to do this.

You can probably take this one step further and say that no matter what the backend, the possible exceptions and when they are raised should always be the same. Enforce this via testing?!

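One way to avoid duplicating that code is to keep validation in the dispatching layer, so every backend sees the same checks and raises the same exceptions; the stronger cross-backend exception guarantee could then be enforced with a parametrized test that runs the same invalid inputs against every backend. A sketch with made-up names (`_validate_args`, `_select_backend`, and the wrapped `threshold_otsu` signature are illustrative):

```python
def _validate_args(image, nbins):
    # Backend-independent checks live in one place.
    if nbins <= 0:
        raise ValueError(f"nbins must be positive, got {nbins}")
    if getattr(image, "ndim", 0) < 1:
        raise ValueError("expected an array with at least one dimension")


def _select_backend(image, backend):
    """Hypothetical selection step; details are out of scope for this sketch."""
    raise NotImplementedError


def threshold_otsu(image, nbins=256, *, backend=None):
    _validate_args(image, nbins)           # same exceptions for every backend
    impl = _select_backend(image, backend)
    return impl.threshold_otsu(image, nbins=nbins)  # backend only does the math
```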
vyasr commented 3 months ago

> I'd prefer for this "logging" to be on by default, with the option to suppress it (via env variable e.g.).

I agree with logging, but I think I'd prefer off by default. In my experience users typically ignore loud outputs unless they're requested.

> A snippet of code, executed on different machines / in different environments, should, within practical limits, yield the same numerical results

Just want to note that due to floating-point non-associativity, GPU backends are unlikely to ever produce numerically identical results, but they should agree within some tolerance. More generally, there may be APIs in the frontend library that have historically provided results with some implicit property (e.g. consistent/stable ordering) that was never actually promised as part of the API but that users have come to rely on. Dispatching will potentially force frontends to become more explicit about what exactly is being promised about results.

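In practice that usually means cross-backend tests compare against a reference result with a tolerance rather than exact equality, e.g. (the tolerance here is only illustrative):

```python
import numpy as np

# A different summation order (as a GPU reduction might use) can change the
# low-order bits; compare with a tolerance instead of exact equality.
values = np.float32([0.1, 0.2, 0.3])
reference = (values[0] + values[1]) + values[2]
reordered = values[0] + (values[1] + values[2])
np.testing.assert_allclose(reordered, reference, rtol=1e-6)
```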
I think most of what @betatim suggested sounds good and is consistent with the other proposals discussed above.

seberg commented 3 months ago

> There has to be a way to make a backend selection decision that takes into account more than just the types of the arguments.

To some degree, I think this is the elephant in the room, together with the point of reproducibility. For now, I would be happy to defer the decision of how this happens to scikit-learn; it could be any or all of:

All of the above make sense. What currently bothers me most is that there are two possible "cucim" backend meanings:

But when you think of `function(numpy_array, backend="cucim")`, both would behave differently. The first might error or return a cupy array, while the second would still return a NumPy array. I am not sure how to resolve this. Maybe a single backend can understand both, or we split it into a cucim and a cucim[cupy] backend with some layer to make that convenient.

> I agree with logging, but I think I'd prefer off by default.

I see logging as a very solvable issue. For example, we could keep counters about which backend was called for which function, so that the issue template can say: run your code and then give us the information from `skimage.backends.gather_callstats()`.

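A minimal sketch of such counters; `gather_callstats` is the hypothetical name floated above, not an existing scikit-image function:

```python
from collections import Counter

# Incremented by the dispatching layer every time a backend implementation runs.
_call_counts = Counter()


def _record_dispatch(func_name, backend_name):
    _call_counts[(func_name, backend_name)] += 1


def gather_callstats():
    """Return something like {('gaussian', 'cucim'): 12, ('gaussian', 'numpy'): 3}."""
    return dict(_call_counts)
```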
Verbose print-out on dispatching by default seems strange to me. But I think it can make sense in a scenario where `pip install cucim` or `pip install cucim[autouse]` activates the cucim (for numpy array) backend automatically (last bullet above). The user might never realize what is going on there, so printing out:

> Auto-activating "cucim" backend for scikit-image; to hide this warning set env variable XY or `import cucim.skimage_backend` before `skimage`.

could make sense to me and should be the choice of scikit-image (even if it is probably hard to enforce that choice).

> Separation: Preferably, there will be no code in the library that deals with specific details/workarounds for any given backend.

The only reason I can see for this would be to help users by having a stub backend that says: "Oh, you passed a cupy array, maybe you want to install a backend for that?" But you can probably hook that into the failure of the array-conversion step if wanted.

betatim commented 3 months ago

> For now, I would be happy to defer the decision of how this happens to scikit-learn; it could be any or all of:

What do you think of an additional option of "if cucim is installed it is used as a backend"? Maybe at first the user would have to activate it (or dispatching in general) via something like `skimage.set_config(dispatching="cucim")`. Or is that covered by one of your options?

In scikit-learn the current thinking is that users will likely install only one additional backend, though possibly more than one. For now users have to opt in to dispatching. As part of opting in they can specify their preference for the order in which backends are tried. Each backend is asked "do you want to handle this?". The first one to say "yes" gets it. If none say yes, the default backend handles it (aka the code that is currently in scikit-learn).

This way a backend can say yes/no to work based on the input type, input size and algorithm hyper-parameters.

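That loop might look roughly like the sketch below; `can_handle`, `call`, and the explicit backend list are placeholders for whatever the opt-in mechanism ends up looking like:

```python
def dispatch(func_name, args, kwargs, backends, default_impl):
    """Ask each backend, in the user's preferred order, whether it wants the call."""
    for backend in backends:
        if backend.can_handle(func_name, *args, **kwargs):
            return backend.call(func_name, *args, **kwargs)
    # Nobody volunteered: run the implementation that currently ships with the library.
    return default_impl(*args, **kwargs)
```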
An open question is how to determine the order in which backends are tried in the case where the user opts in to dispatching without specifying one or in the future when it is on by default.

> what currently bothers me most is that there are two possible "cucim" backend meanings:

I think it would be confusing for a user if there were two cucim backends. Why do you need that? One backend could handle both cases you described, no? But it would have to keep track of the input type so that it can do the "back to numpy" conversion if needed.

grlee77 commented 3 months ago

> But when you think of `function(numpy_array, backend="cucim")`, both would behave differently. The first might error or return a cupy array, while the second would still return a NumPy array.

cuCIM follows CuPy's policy of not automatically casting from NumPy arrays. An error will be raised on NumPy inputs to cuCIM functions.

seberg commented 3 months ago

"if cucim is installed it is used as a backend" [...] Or is that covered by one of your options?

That was supposed to be the last point, but it was not too clear :). I would see it as scikit-image's job to set guidelines, and I think it could go either way based on how much confidence we have in the mechanism (and confidence can grow over time).

Note that I think these worries are unnecessary when we are dealing with, e.g., CuPy inputs, which scikit-image currently doesn't support (or at most in niche places). I do think such a backend should just auto-activate always (via entry points).

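Auto-activation via entry points could be as simple as the sketch below; the group name `"skimage_backends"` is made up for illustration, and selecting `entry_points` by group needs Python 3.10+ (or the `importlib_metadata` backport):

```python
from importlib.metadata import entry_points


def load_backends():
    """Discover and load every backend that installed packages have registered."""
    backends = []
    for ep in entry_points(group="skimage_backends"):  # hypothetical group name
        backends.append(ep.load())  # the entry point exposes the backend object
    return backends
```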
> This way a backend can say yes/no to work based on the input type, input size and algorithm hyper-parameters.

In an ideal world (but happy to not aim too high if it starts getting in the way), I might make it a two-stage process (a rough sketch follows after the list):

  1. Just use types (and maybe hyper params?). Types could be hardcoded into the backend system (by listing types you want to/can handle) or just a "match/can-do" function. That has some potential advantages:
    • You can cache which backend is used, speeding up later dispatching.
    • If you know the types, you ensure that another backend won't think it should handle your type, no matter the "order" of backends.
      • This may just make it easier to not be sloppy about type matching.
      • When you think about type-dispatching, type information could be used to infer a backend order automatically. I am not sure how important it is, since it might be easier to just hardcode such order (i.e. my "cucim was slow for XY" backend knows it wants to have higher priority than "cucim" if it ever comes to that).
  2. A backend should be allowed to defer even if it matches in the first step:
    • Allow deferring for small problem sizes
    • Allow deferring when an implementation is incomplete.
    • (A recursive approach might work here as well, but deferring seems pretty nice to me and probably even more straight forward.)

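A sketch of what that two-stage protocol could look like on the backend side; `matches_types`, `wants`, the `implemented` set, and the size cutoff are all hypothetical:

```python
import numpy as np


class ExampleBackend:
    # Stage 1: declared types make matching unambiguous (regardless of backend
    # order) and let the dispatcher cache its decision per type combination.
    handled_types = (np.ndarray,)          # a GPU backend would list cupy.ndarray

    implemented = {"gaussian", "sobel"}    # illustrative subset of functions

    def matches_types(self, *args):
        return any(isinstance(a, self.handled_types) for a in args)

    # Stage 2: a backend that matches on types may still defer.
    def wants(self, func_name, *args, **kwargs):
        if func_name not in self.implemented:
            return False                   # incomplete implementation -> defer
        sizes = [a.size for a in args if isinstance(a, self.handled_types)]
        if sizes and max(sizes) < 10_000:
            return False                   # too small to be worth dispatching
        return True
```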
I don't think starting with a type-based approach makes implementation significantly harder. The question below remains identical.

> cuCIM follows CuPy's policy of not automatically casting from NumPy arrays. An error will be raised on NumPy inputs to cuCIM functions.

> I think it would be confusing for a user if there were two cucim backends. Why do you need that? One backend could handle both cases you described, no? But it would have to keep track of the input type so that it can do the "back to numpy" conversion if needed.

A backend can keep track of the input type, i.e. I could write a cucim backend that returns NumPy arrays for NumPy array inputs. We may even help cucim a bit to do that with the backend system.

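A sketch of such a "convert in, convert back out" wrapper, using CuPy's real `cupy.asarray`/`cupy.asnumpy` conversions around a hypothetical `_cucim_call` that does the actual GPU work:

```python
import numpy as np
import cupy as cp


def numpy_roundtrip_call(_cucim_call, image, **kwargs):
    """Run a cuCIM-style function but hand NumPy back for NumPy inputs."""
    came_from_numpy = isinstance(image, np.ndarray)
    result = _cucim_call(cp.asarray(image), **kwargs)  # move to GPU, compute there
    if came_from_numpy:
        return cp.asnumpy(result)  # transfer back so the caller sees NumPy again
    return result
```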
So it is not much of a problem to have a cuCIM backend (and maybe that is enough):

But if you look at cuGraph, for example, `func(nx_graph, backend="cugraph")` will return a cugraph object. (This is less problematic in NetworkX, because they duck-type pretty well I think, and not a lot of functions return graphs.)

But the point is, you can't do both without some user choice, and that choice could look very different. If you go a bit "all-in" on type dispatching, you could even do:

```
func(in_type, backend="cucim") -> in_type
func.invoke(out_type)(any_type, backend="any backend") -> out_type or error
```

using a separate mechanism in case you ever care about types rather than backends (the latter is what some type-dispatching libraries do, IIRC).