Generate C++ entry points

dcrc2 commented 3 years ago

~Draft PR: this would currently break the embedded C++ examples added in #877. I'd prefer to merge #877 first and do the fixes here, unless there is still much more work to do on #877.~ Now merged with #877.

This PR reimplements the functionality currently provided by the function with_ks_allocator. For each ks function that we wish to provide Python bindings for, we produce a second C++ function which wraps it, and this wrapper is the one that actually gets used by the pybind module. Previously the wrapper was defined using C++ template trickery in with_ks_allocator; this PR instead generates the wrapper's code in Python.
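To make the difference between the two approaches concrete, here is a minimal sketch. It does not reproduce the real with_ks_allocator, whose implementation is not shown in this thread. SketchAllocator, g_sketch_alloc, twice, bind_allocator and twice_entry are all hypothetical names invented for illustration.

```cpp
#include <functional>

// Hypothetical stand-in for the ks allocator; the real type is not shown here.
struct SketchAllocator {
    int calls = 0;  // counts how often a wrapped function received the allocator
};

// Plays the role of the global g_alloc used by the generated entry points.
static SketchAllocator g_sketch_alloc;

// A function shaped like Cgen output: the allocator pointer comes first.
double twice(SketchAllocator* alloc, double x) {
    ++alloc->calls;
    return 2.0 * x;
}

// Template approach: deduce the remaining argument types from the function
// pointer and bind the allocator automatically.
template <typename Ret, typename... Args>
std::function<Ret(Args...)> bind_allocator(Ret (*fn)(SketchAllocator*, Args...)) {
    return [fn](Args... args) { return fn(&g_sketch_alloc, args...); };
}

// Generated approach: an ordinary function with the signature spelled out,
// which is the kind of text a code generator would emit.
double twice_entry(double x) {
    return twice(&g_sketch_alloc, x);
}
```

Both bind_allocator(&twice) and twice_entry can be called with a single double. The second form is plain code that can be read in the generated file and extended (for example with logging) without further template machinery.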

Example code produced for relu3:

#include "knossos-entry-points.h"

namespace ks {
namespace entry_points {
namespace generated {

ks::tensor<1, double> entry(ks::tensor<1, double> arg0) {
    if (g_logging) {
        std::cerr << "vrelu3$aT1f(" << arg0 << ") =" << std::endl;
        auto ret = ks::vrelu3$aT1f(&g_alloc, arg0);
        std::cerr << ret << std::endl;
        return ret;
    } else {
        return ks::vrelu3$aT1f(&g_alloc, arg0);
    }
}

ks::tensor<1, double> entry_vjp(ks::tensor<1, double> arg0, ks::tensor<1, double> arg1) {
    if (g_logging) {
        std::cerr << "sufrev$vrelu3$aT1f(" << arg0 << ", "  << arg1 << ") =" << std::endl;
        auto ret = ks::sufrev$vrelu3$aT1f(&g_alloc, arg0, arg1);
        std::cerr << ret << std::endl;
        return ret;
    } else {
        return ks::sufrev$vrelu3$aT1f(&g_alloc, arg0, arg1);
    }
}

}
}
}

#include "knossos-pybind.h"

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("reset_allocator", &ks::entry_points::reset_allocator);
    m.def("allocator_top", &ks::entry_points::allocator_top);
    m.def("allocator_peak", &ks::entry_points::allocator_peak);
    m.def("logging", &ks::entry_points::logging);

    declare_tensor_1<double>(m, "Tensor_1_Float");
    declare_tensor_2<double>(m, "Tensor_2_Float");
    declare_tensor_2<int>(m, "Tensor_2_Integer");

        m.def("entry", &ks::entry_points::generated::entry);

        m.def("entry_vjp", &ks::entry_points::generated::entry_vjp);

}

#include "knossos-entry-points.cpp"
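The generator in this PR is written in Python and is not reproduced here. As a rough sketch of the idea only, the following shows how a wrapper like the ones above could be assembled from a function's signature, written in C++ to match the surrounding code. FunctionSpec and generate_entry_point are hypothetical names, and the logging branch is omitted for brevity.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one function to wrap.
struct FunctionSpec {
    std::string entry_name;              // name of the wrapper, e.g. "entry"
    std::string ks_name;                 // name of the wrapped ks function
    std::string return_type;             // C++ return type as text
    std::vector<std::string> arg_types;  // C++ argument types as text
};

// Builds the source text of a wrapper that forwards arg0, arg1, ... to the
// ks function, passing &g_alloc as the first argument.
std::string generate_entry_point(const FunctionSpec& spec) {
    std::ostringstream params;
    std::ostringstream args;
    for (std::size_t i = 0; i < spec.arg_types.size(); ++i) {
        if (i > 0) {
            params << ", ";
        }
        params << spec.arg_types[i] << " arg" << i;
        args << ", arg" << i;
    }
    std::ostringstream out;
    out << spec.return_type << " " << spec.entry_name << "(" << params.str() << ") {\n"
        << "    return ks::" << spec.ks_name << "(&g_alloc" << args.str() << ");\n"
        << "}\n";
    return out.str();
}
```

The sketch needs the return type and argument types as input, which is why the generator has to know each function's signature.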

dcrc2 commented 3 years ago

(The aim is to switch these generated wrappers to use torch::tensor rather than ks::tensor, but I would prefer to make that a separate change.)

dcrc2 commented 3 years ago

(I've rebased this onto master in order to use ks::Float from #924)

dcrc2 commented 3 years ago

In order to merge with #877, I've included the following code in each of the embedded C++ examples:

ks::tensor<1, ks::Float> entry(ks::tensor<1, ks::Float> t) {
    return ks::vrelu3(&ks::entry_points::g_alloc, t);
}
ks::tensor<1, ks::Float> entry_vjp(ks::tensor<1, ks::Float> t, ks::tensor<1, ks::Float> dret) {
    return ks::sufrev_vrelu3(&ks::entry_points::g_alloc, t, dret);
}

It isn't really practical to generate this code automatically for the embedded examples (replicating what this PR does when compiling a ks file), because the wrappers cannot be generated without knowing the function signatures. If we need to reproduce what these wrapper functions should look like, we can always run one of the ks benchmarks and copy the wrapper functions from the generated C++ file.

dcrc2 commented 3 years ago

(Rebased onto #944 and #947)

dcrc2 commented 3 years ago

This seems fine. Generating the wrappers ourselves seems to afford more flexibility than doing so with C++ template trickery.

On the other hand there's a bunch of code that duplicates Cgen (see the N.B. comment). An alternative would be to have Cgen output a ks::without_allocator::... function for every function in the module. For example, if my .ks file defines f : Integer -> Integer then as well as Cgen generating ks::Integer ks::f(allocator *, ks::Integer) it could also generate ks::Integer ks::without_allocator::f(ks::Integer). That way we don't need any special logic on the Python side.
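Spelled out for the f : Integer -> Integer case, the suggestion might look roughly like the sketch below. This is only an illustration: it uses a separate ks_sketch namespace with stand-in types, the body of f is invented, and it assumes the allocator-free overload would pick up a shared global allocator, which the suggestion itself does not specify.

```cpp
namespace ks_sketch {

// Hypothetical stand-ins for ks::allocator and ks::Integer.
struct allocator {};
using Integer = int;

// Assumption: a shared allocator that the allocator-free overload can use.
static allocator g_alloc;

// The function as emitted today, with the allocator first
// (the body is invented for illustration).
Integer f(allocator* /*alloc*/, Integer x) {
    return x + 1;
}

namespace without_allocator {

// The proposed additional function: same name, allocator parameter removed.
Integer f(Integer x) {
    return ks_sketch::f(&g_alloc, x);
}

}  // namespace without_allocator

}  // namespace ks_sketch
```

With this layout the Python side would only need to bind ks_sketch::without_allocator::f and would not have to construct the forwarding call itself.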

What do you think?

As things stand in this PR, I think you're right that this would fit better in ksc.

I'm not sure whether this will remain true when we add more functionality to the wrappers. The next PR (#931, currently draft) changes the wrappers to take torch::Tensor arguments, handling the conversion between torch types and ks types. So if this were part of ksc, it would mean ksc taking on some of the responsibility of knowing how we will call into ks code from PyTorch. The declaration of the pybind11 module is generated in Python code, so this would create some overlap in responsibilities between ksc and the Python code.
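A rough sketch of the shape such a converting wrapper could take, using stand-in types only. It does not use the real torch or ks APIs and does not claim to match #931: FrameworkTensor plays the role of torch::Tensor, KsVec the role of ks::tensor<1, ks::Float>, and to_ks, from_ks and clamp_negative are hypothetical.

```cpp
#include <vector>

// Hypothetical stand-ins; neither type reflects the real library layouts.
struct FrameworkTensor { std::vector<double> values; };  // role of torch::Tensor
struct KsVec { std::vector<double> elems; };             // role of ks::tensor<1, ks::Float>

struct SketchAllocator {};
static SketchAllocator g_sketch_alloc;

// A ks-style function taking the allocator first (invented for illustration):
// replaces negative elements with zero.
KsVec clamp_negative(SketchAllocator* /*alloc*/, const KsVec& v) {
    KsVec out = v;
    for (double& e : out.elems) {
        if (e < 0.0) {
            e = 0.0;
        }
    }
    return out;
}

// Conversions between the binding-side type and the ks-side type.
KsVec to_ks(const FrameworkTensor& t) { return KsVec{t.values}; }
FrameworkTensor from_ks(const KsVec& v) { return FrameworkTensor{v.elems}; }

// The wrapper owns the knowledge of both type systems: convert the argument,
// call the ks function with the allocator, then convert the result back.
FrameworkTensor entry(const FrameworkTensor& arg0) {
    return from_ks(clamp_negative(&g_sketch_alloc, to_ks(arg0)));
}
```

The conversion calls are the part that depends on the host framework, so whichever component generates the wrapper has to know about that framework's types.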

If we wanted ksc to handle all of the binding code (including the declaration of the pybind11 module), that seems more consistent to me. But then we're probably committing to having ksc know how to generate bindings for all of the languages that we support (PyTorch, pure Python, and others that might be added in future).

I'd suggest we wait to see what these wrappers look like in their final version, as there are plenty of issues left to resolve. I don't think the amount of duplication with ksc is going to grow much further.

toelli-msft commented 3 years ago

> I'd suggest we wait to see what these wrappers look like in their final version, as there are plenty of issues left to resolve. I don't think the amount of duplication with ksc is going to grow much further.

Makes sense, thanks!

microsoft / knossos-ksc

Generate C++ entry points #925