Improving external C++ integration

WardBrian commented 1 year ago

Introduction

After #1277, the only required use for forward declarations of functions is for using external C++ code. This feature as currently implemented is sub-par for a number of reasons:

We need to globally disable the typechecker pass which verifies that forward declarations all eventually have definitions. (--allow-undefined)
~We generate a C++ declaration for these functions, which means users must use our complicated templates and backward compatibility breaks any time we change our code generation.~
At least in CmdStan, the external C++ is outside the generated model's namespace and so it needs to have knowledge of what the model's namespace is.

Proposed change:

A new syntax, which looks like

extern "myfile.hpp" real foo(real a);

(note, extern is already a reserved word in Stan).

This syntax solves each of the 3 problems above:

The typechecker knows this is for external C++ and not just a normal forward declaration, so it can locally disable that check
Similarly, the backend knows this is for external C++ and can generate no bespoke code of its own. The user can provide any C++ which satisfies the call sites, and this should be much more stable between versions of Stan.
Finally, we can paste the contents of myfile.hpp directly into the generated C++ (more likely we will just do a C++-level #include), so that this code will live inside the namespace.

Considerations

If we ever had a non-C++ backend, one could imagine a extern "foo.py" ... style which does the same thing. Using the filename rather than something like extern "C++" is to allow for item 3 above.
It might be useful to also allow some code before the model namespace, e.g. if the user's C++ needs its own #includes, we may want those outside the namespace. Something like this seems like it could be handled on the command line, rather than in the language.

Previous discussions:

Soliciting opinions @bob-carpenter @mitzimorris @rok-cesnovar @nhuurre

bob-carpenter commented 1 year ago

@WardBrian and I chatted about this in person. My main comment was that I don't like something that changes language behavior being defined as an attribute. That's when Brian proposed the alternative,

extern "myfile.hpp" real foo(real a);

which I can live with.

andrjohns commented 1 year ago

This sounds great to me, simpler external c++ support would be fantastic! The only possible conflict I see is that we end up with two different approaches for external code (c++ vs stan):

extern "myfile.hpp" real foo(real a);

and

#include foo.stan

Which could lead to some user confusion. I don't have any good suggestions/resolutions, but just flagging

nhuurre commented 1 year ago

The user may want to import multiple functions from the same file so I'd suggest extern blocks rather than attaching the filename to each function.

Is there a reason to include user code directly in the model namespace instead of letting the user choose their own namespace and then having using ...; declarations in the model namespace? I think it'd be cleaner to always have the #include at toplevel.

So here's how it could look like:

extern "usersource.hpp" namespace "whatever::name" {
  real foo(real x, data array[] real z);
  array[] int bar(array[] int y);
}
functions { }
model { }

transpiles to

#include <stan/model/model_header.hpp>
#include <usersource.hpp>

namespace model_namespace {

using whatever::name::foo;
using whatever::name::bar;

class model {
  // ...
}

}

It might be helpful to have a template of what Stan expects the C++ to look like. Stanc3 could have --generate-cpp-header option that outputs the declarations it would generate for the external functions, if they weren't external. So for the above something like

#include <stan/math.hpp>

namespace whatever {
namespace name {

template <typename T0__,
          stan::require_all_t<stan::is_stan_scalar<T0__>>* = nullptr>
stan::promote_args_t<T0__>
foo(const T0__& x, const std::vector<double>& z, std::ostream* pstream__);

std::vector<int> bar(const std::vector<int>& y, std::ostream* pstream__);

}
}

Finally, @andrjohns bring up a good point: importing functions from .hpp file should look similar to importing from .stanfunctions file. In other words, the design for external function support should go hand-in-hand with more structured imports replacing the current "preprocessor macro" #include. However, there's a significant difference between the two: C++ functions must be declared in the importing Stan file (for type checking) but an imported Stan function does not need a re-declaration (because the transpiler parses the source code anyway).

I think a reasonable option is an extra functions block with a filename and a list of imported names.

functions "filename.stanfunctions" {
  foo, bar, baz
}
functions { }
model { }

WardBrian commented 1 year ago

The user may want to import multiple functions from the same file so I'd suggest extern blocks rather than attaching the filename to each function.

This is true, but IMO I prefer to have the file name attached to each one and then just de-duplicated by the compiler. No need for a new block, and it's always clear which file to look into if you want to find a certain function.

extern "foo.hpp" void bar();
extern "baz.hpp" void frob();
extern "foo.hpp" void glop(); // same file as above

Is there a reason to include user code directly in the model namespace instead of letting the user choose their own namespace and then having using ...; declarations in the model namespace? I think it'd be cleaner to always have the #include at toplevel.

The only reason not to do this is I could not think of a non-clunky way to have the user communicate what namespace they used in their code. If we can agree on one then I think we should do the code generation as you suggest. The extra block does make this a bit easier, but I'm still not sold on adding a 8th block to the language for this (relatively obscure) feature

It might be helpful to have a template of what Stan expects the C++ to look like. Stanc3 could have --generate-cpp-header option that outputs the declarations it would generate for the external functions, if they weren't external.

This would be easy to add, but I'd argue it wouldn't really be that helpful. The way we generate user defined functions at the moment is very specific to both the overload logic we employ and because there are no autodiff specific specializations for UDFs. The signatures such a flag would generate would be useful for things like the Eigen templates, but if you're trying to write a function and a specialization for its gradient (which I think is a major use case) the signature needs to look more like those in the math library itself, not what we currently generate.

To both your final point and @andrjohns, I think a more general mechanism than #include is definitely something we want to start designing, but IMO it doesn't need to be considered specifically here. To me it's not a problem if these end up looking different from each other - they are different, really. Something like @nhuurre's suggested syntax for this is interesting, but to me it seems like we'd ultimately need to parse the entire file anyway (same as we do with #includes now) to find those functions, so the only difference is what ends up included in the AST.

stan-dev / stanc3