root-project / cling

The cling C++ interpreter
Other
3.53k stars 269 forks source link

Explicit Shadowing of Functions/Classes/Vars #259

Closed ax3l closed 4 years ago

ax3l commented 6 years ago

As mentioned at the ROOT User's Workshop 2018, many people using Cling or Cling in Jupyter for interactive workflows struggle with the following problem. When interactively defining and improving functions, classes and variables, the state of the interpreter is in the way of fast iteration of prototypes.

Let's take the following example:

double foo(double a)
{
    return a * a;
}

double x = foo(7.);

If a user now wants to change foo(), they will just add it again, e.g. as

double foo(double a)
{
    return a * a * a;
}

You can also imagine this in a Notebook: a users just scrolls up, changes and re-evaluates the cell.

Of course that is not working right now, and throws something like:

input_line_12:1:8: error: redefinition of 'foo'
double foo(double a)
       ^
input_line_7:1:8: note: previous definition is here
double foo(double a)

Possible Approach: Active Shadowing


#pragma cling allow_shadowing
double foo(double a);

double foo(double a)
{
    return a * a;
}
// now a sqrt

double foo(double a)
{
    return a * a * a;
}
// now a third potency

int foo(int a)
{
    return a * a * a;
}
// now a third potency with changed signature

Implementation?

I know that this is asking for a dynamic type system, but only for explicitly selected functions/classes/vars, in Cling C++. I have absolutely no idea, how one could efficiently implement such re-definitions but think it would be an important workflow for the code in cling's main() wrapper. I am aware that problems, like indirect calls from bar that is not explicitly re-defined when foo() is shadowed, can be very challenging, e.g. if foo() was inlined into bar.

If such an implementation is only possible under side-conditions, e.g. that main()-defined and allow_shadowing attributed functions can not be inlined, that is probably fine as well. As a user, I don't expect the most efficient code from my control logic that I am interactively typing (between notebook cells), but from the code that is used from external imports.

Alternatives

For functions represented as lambdas and variables one could probably establish work-around workflows by pushing them into a C++17 std::any variables. For re-defining classes I have no good idea at the moment besides resetting the state of the interpreter / restarting the jupyter kernel.

egpbos commented 6 years ago

At the risk of exposing my large lack of understanding, I just want to share here that according to a certain prominent LLVM developer, trying to get around the One Definition Rule inevitably ends up with getting sucked into a singularity. However, in a REPL, there should be a way to implement redefinition behavior by making everything a compound statement. If I understand correctly, that would transform something like

int a = 0;
void hi() { std::cout << "hi" << std::endl; };

into

{
  int a = 0;
  {
    void hi() { std::cout << "hi" << std::endl; };
    // following statements nest here
  }
}

Which means that upon subsequent redefinition of hi, a separate new "thread" could be constructed, so that the following:

int a = 0;
void hi() { std::cout << "hi" << std::endl; };
void hi() { std::cout << "hi again" << std::endl; };

would be translated in cling to:

{
  int a = 0;
  {
    void hi() { std::cout << "hi" << std::endl; };
  }

  {
    void hi() { std::cout << "hi again" << std::endl; };
    // NOW following statements nest here instead
  }
}

Would this be a feasible way of implementing this behavior in cling? It would clearly not be maximally performant with a lot of statements, but at least allowing such behavior optionally, it would allow people to experiment with it.

chandlerc commented 6 years ago

I'll own up to my terrible ideas. ;]

I don't use Cling, but I'll hypothesize a REPL interface by showing lines after > as entered into the REPL prompt.

I was focusing on non global variables which may or may not be what you want to focus on, but would suggest:

> int a = 1;
> std::cout << a << "\n";
> int a = 2;
> std::cout << a << "\n";

transforms into:

{
  int a = 1;
  {
    std::cout << a << "\n";
    {
      int a = 2;
      {
        std::cout << a << "\n";
      }
    }
  }
}

This allows shadowing naturally.

However, it assumes you are operating at block scope... And that in turn means you probably aren't defining functions. And even if you are, shadowing may not work as well for that as those go into the enclosing namespace, not the block. If you are working at block scope, and you define lambdas instead of actual functions, then this technique works fine.

Let's consider other scopes briefly....

If you are working at global scope or namespace scope, then inline namespaces might be used to allow this kind of chained shadowing:

inline namespace n0001 {
    int f() { return 1; }
    int test1 = f();
    inline namespace n0002 {
        int f() { return 2; }
        int test2 = f();
        inline namespace n0003 {
            int f() { return 3; }
            int test3 = f();
        }
    }
}

(Compiler Explorer link: https://godbolt.org/z/VhCnX4)

I can't come up with any useful semantics for allowing shadowing within a class body's scope, nor is there any syntax I can think of to do this.

chandlerc commented 6 years ago

Also worth noting that you can just use nested anonymous namespaces for the namespace scope:

namespace {
  int f() { return 1; }
  namespace {
    int test1 = f();
    namespace {
      int f() { return 2; }
      namespace {
        int test2 = f();
      }
    }
  }
}

But it may make some things harder (and some things easier) due to not having a reliable mangled name.

The key is that by using increasing layers of nesting you can shadow things "safely".

ax3l commented 6 years ago

Thank you very much for your ideas. This issue is really about finding suitable workflows for an iterative programming style as people are used to, e.g. in Jupyter. Maybe we can already do it, maybe we need to extend cling, well actually C++ for a REPL, for it.

Anonymous and inline namespaces are indeed a way that looks promising. What I am afraid of is that they need to be nested, which means if you execute subsequent REPL input block (Jupyter "cells") you have to treat them as a single cell all the time, if I am not missing the point? Or are you suggesting to perform the input wrapping in exactly that way when cling treats input?

This would then be similar to rolling back the state of cling, which one could already to already. But this is then still "linear" programming (evaluating all input instead of the latest added block) contrary to the nice "jumping back and forth" users love with Jupyter cells.

ax3l commented 6 years ago

@egpbos if I understand your examples correctly, all declarations are scoped and could not be called (functions) / used (vars) from the underlying "global" scope. But the drafted namespaces help here, besides being nested :)

chandlerc commented 6 years ago

I'm suggesting that the linear sequence of things in the notebook is mapped into the nested structure.

You'll always want to evaluate from top to bottom, but you can trigger that as users jump around the notebook and make changes.

The idea is to use a chain of nested scopes to represent the sequence in either the interpreter or the notebook. This essentially is just an implementation strategy for shadowing.

ax3l commented 6 years ago

cc @SimeonEhrig any thoughts on this? :)

NobodyXu commented 6 years ago

I definitely know nothing about how cling works, but if the arguments of the function are always the same, why not just define a function pointer?

For example:

double (*foo)(double);

foo = static_cast<double (*)(double)>([](double d){ return 1.0; });
foo(1.0) // returns 1.0

foo = static_cast<double (*)(double)>([](double d){ return 2.0; });
foo(1.0) // return 2.0

If the function signature is different, then it won't be a problem, right?

For redefining classes, maybe it is a good idea to use macros??

// First definition of A
#define A underlying_A1
class A { ... };
// Refine A
#undef A
#define A underlying_A2
class A { ... };

// For any class, using directive, function that relies on class A,
// turn them into a template.
// A class depend on A
template <class _A = A>
class ... { ... };
ax3l commented 6 years ago

Thanks for your thoughts!

Function pointers: yes, a modern C++ way instead of function pointers that can be re-defined at runtime would likely be lambdas and std::function. I wrote them as alternatives already in the last section of the issue description ;-) they solve part of the problem, if one wants to change signatures as well they need to be combined again with e.g. std::any to allow changing the type signatures.

#include <any>
#include <functional>
#include <iostream>

using namespace std;

int main()
{
    any f = function<bool (double)>([]( double x ) { return x < 5.0; });
    cout << any_cast<function<bool (double)> >(f)( 3.0 ) << " ";

    f = function<int (int)>([]( int x ) { return x * 2.0; });
    cout << any_cast<function<int (int)> >(f)( 42 );
    return 0;
}

// This C++17 example allows to store a changing function signature in the same variable name:
// bool function and int function
// "1 84" (True, 84)
// One would write in a Jupyter/Cling cell the assignment to that std::any variable,
// which can be called multiple times. I wonder if we can force inlining with cling here.

Macros & Classes: I don't fully understand how the shown macros might help here, but deriving as you suggest CRTP-style as a wrapper and constantly increasing the template argument, maybe in combination with the __COUNTER__-macro might be cool!

Too bad we don't have inline, anonymous class definitions for template arguments.

NobodyXu commented 6 years ago

@ax3l I don't think std::function is needed. A global function won't have any reference to local variables except global one, so lambda is always convertible to a raw function pointer, there will be no need for std::function because global function won't have any data. Using it will only be proven to be an unnecessary burden.

However, I do think a special callable class with a variadic function call overload that wraps all the overloaded version of the function should be written to enable overload.

A possible implementation will be using a hash table that uses std::type_info::hash_code(or probably std::type_info::name) of all arguments to produce a hash and stores the function pointers.

Whenever a function call is made, a search is done to find whether that overload is available or not. If it i available, then it is called; otherwise, an exception is thrown.

This is the best way I can think of since I don't know anything about the compiler. Maybe somebody who understands llvm well can produce a far more easier to implement and more efficient solution than mine.

NobodyXu commented 6 years ago

@ax3l About overloading the class, maybe it can be done by defining a macro that will be substituted with the real class name, and when it is overloaded, #undef and #define it. This is probably not a good solution.

But regardless of how it is implemented, all classes/functions/using that uses that class will have to be templatized like this template </* previous template argument if there is any */, class _A = A>, and all subsequent use of A will be substituted by _A.

NobodyXu commented 6 years ago

Sorry guys, just forget about the post before the previous post I posted. That method I mentioned can't deduce the return type, so it is probably better to introduce compiler instrinct in cling that allows users to delete a defined symbol and redefine it.

Axel-Naumann commented 6 years ago

Awesome idea, @chandlerc - we never came up with nested unnamed namespaces!

Our problem is people doing

struct Mine {int i = 12;};
Mine m;
file->Write("m", &m);

Here, Mine needs to be ::Mine, else we cannot read this back in a meaningful way. That's an example of the mangling issues Chandler mentioned; there are others, for instance overloading:

namespace {
    int f(int);
    namespace {
        const char* f(const char*);
        namespace {
            int i = f(42); // ERROR: doesn't see f(int).
        }
    }
}

Currently, our only known and theoretically possible solution is to rip declarations out of the AST and CodeGen, and to make the corresponding symbols inaccessible to RuntimeDyld's symbol lookup. That's super awkward because it's nothing a compiler would ever want to do. It works okay for some, and does not work for other declarations (here with ROOT ):

[cling]$ int i = 42;
[cling]$ .undo 1
[cling]$ int i = 43;
[cling]$ i
(int) 43

For now this is what cell-replay / -unloading should be based on.

My main concern is that this approach is an uphill battle with any future clang changes, needs a compiler expert for maintenance and is thus a sustainability issue. But barring any better approach we will have to invest dev time to fix the remaining issues, or at least the major ones.

SimeonEhrig commented 6 years ago

I agree with @Axel-Naumann . At the moment, I see a verification problem at the C++ statements, which should to be used to allow redefinition. I'm not sure, if anybody can verify, that the semantic of the source code isn't changed beside the deliberately redefinition.

I think manipulating the AST is most safety way, to keep the semantic. In this case, I also agree with Axel, it is a really basic feature, which needs a lot of modification of cling, clang and llvm code and for this, we need a compiler expert.

chandlerc commented 6 years ago

Still not sure why you can't map this to nested namespaces to model the incremental additions....

You can even inject using namespace <nested name> into the global namespace after each nested namespace is created for that bit of cling, so that referencing ::foo would still work, but understand that it would reference the first foo name defined. You would say foo to get the latest, and ::foo to get the first.

You could even put the current iteration into the cling prompt, and tie that to the nested namespace allowing people to reference specific foos defined earlier through a namespace.

Consider something like:

[cling:001]$ int i = 42;
[cling:002]$ int i = 43;
[cling:003]$ i
(int) 43
[cling:004]$ ::step001::i
(int) 42

This could be implemented by synthesizing the following C++:

namespace __impl_step001 {
  int i = 42;
}
namespace step001 = __impl_step001;
namespace __impl_step001 {
  namespace __impl_step002 {
    int i = 43;
  }
}
namespace step002 = __impl_step001::__impl_step002;
Axel-Naumann commented 6 years ago

Thank you, @chandlerc and @SimeonEhrig !

When writing out a value we ask "what's your type", and the answer would be __impl_step001::__impl_step002::Mine. That's certainly not what users would expect; reading this back e.g. in a compiled program would fail, because we will fail to identify __impl_step001::__impl_step002::Mine with ::Mine. We can just work around that in the I/O layer - solved, conceptually.

Regarding the overload issue:

namespace __impl_step001 {
    int f(int); //<- user input
}
namespace __impl_step002 {
    using namespace __impl_step001;
    const char* f(const char*); //<- user input
}
namespace __impl_step003 {
    using namespace __impl_step002;
    int i = f(42); //<- user input
}

would do.

Now we're left with other mangling-style issues, e.g. explicit specializations:

...
namespace __impl_step003 {
    using namespace __impl_step002;
    template <> struct X<Spec> {}; // error: spec in different namespace!
}

For this one we could try to detect specializations, through pre-parsing. And then what? :-)

In general the fundamental difference between prompt-parsed AST nodes and code injected through files is worrisome, to me. We certainly cannot move file-based code into __impl_step namespaces; this would break all shared library use cases, where the header declares symbols that are implemented elsewhere. It might just be an interface question.

Given that this is such a pivotal point for the future / maintainability of cling I will discuss this with a couple of experienced interpreter people (@vgvassilev and @pcanal for instance) and come back with a potentially more cohesive reply.

Thanks everyone for sharing your ideas!

ax3l commented 6 years ago

Just to add to the example above, could we be auto-using all previous declarations? Or on new declarations, check if one in :: (or the special namespaces one builds up) exists with the same name already?

namespace {
    int f(int) { return 1; }  // <- user input
    namespace {
        // let cling pull in all known declarations automatically:
        using ::f; // <- automatic "reflection" of :: or previous namespace?

        // or just pull it in here, if an `f` in `::` exists already:
        const char* f(const char*); // <- user input

        // update: overwrites don't work when using is always done...
        // int f(int) { return 2; }  // <- user input
        namespace {
            int i = f(42); // sees f(int) now as well // <- user input
        }
    }
}

Update: hm, seems that's basically what Chandler said ;-)

chandlerc commented 6 years ago

Just to add to the example above, could we auto-using all previously declarations?

No, you shouldn't do that IMO.

The user needs to decide whether they want to define a new f, or add an overload. I would suggest explicitly adding a using declaration as a means to add an overload (instead of hiding and introducing a new named entity).

namespace {
    int f(int);
    namespace {
        // let cling pull in all known defines automatically:
        using ::f;

        const char* f(const char*);
        namespace {
            int i = f(42); // ERROR: doesn't see f(int).
        }
    }
}
NobodyXu commented 6 years ago

I just have a question with this solution.

Consider the following case:

inline int F()
{/* ... */}

// F2 depends on F
// and it is beneficial to inline the call
void F2()
{
    auto ret = F();
    // ...
}

If the cling JIT compiles this code and inline int F(), how to reJIT compile it?

NobodyXu commented 6 years ago

And also consider class dependency.

class A {};

class B: A {};

template <class A_template_arg = A>
class C: A_template_arg {};

// Redefinition of A
class A {/**/};

C can work but can B?

NobodyXu commented 6 years ago

And how to handle if the user creates an object to them?

A a;

int (*fp)() = &F;

If a remains to be the same, then the one-definition rule is violated, ex, these old values have different type_info but same return value of name(); When overloading, should them share the same overload?

If not, how to change them?

Axel-Naumann commented 6 years ago

@NobodyXu If something uses a previous definition then subsequent re-definitions will not change that. Very similar to

int f(double);
int val = f(42);
int f(int);

val will be set by the first f.

I don't find that surprising behavior.

@chandlerc I don't think our users will appreciate the need to explicitly pull in previous declarations to create overloads. But as I said - we'll discuss, I'll come back with something more substantial than my immediate reactions here.

chandlerc commented 6 years ago

Thank you, @chandlerc and @SimeonEhrig !

When writing out a value we ask "what's your type", and the answer would be __impl_step001::__impl_step002::Mine. That's certainly not what users would expect; reading this back e.g. in a compiled program would fail, because we will fail to identify __impl_step001::__impl_step002::Mine with ::Mine. We can just work around that in the I/O layer - solved, conceptually.

The way I would phrase this is slightly different.

When you get asked "what's your type", I think the question should be "what is the shortest name I could use to refer to that type".

Because you conceptually never close one of these nested namespaces, these prefixes aren't necessary.

I would suggest using just the base name unless the name has been shadowed (which you can query out of the AST). When this fails, I would suggest using the namespace alias directly to the step: ::step002::Mine which seems like a reasonable disambiguation.

Put differently, we can think of this as a nested-namespace rewrite step to pick a minimal way to unambiguously reference the type from the current nest level, with added use of the namespace aliases introduced in the process.

Regarding the overload issue:

namespace __impl_step001 {
    int f(int); //<- user input
}
namespace __impl_step002 {
    using namespace __impl_step001;
    const char* f(const char*); //<- user input
}
namespace __impl_step003 {
    using namespace __impl_step002;
    int i = f(42); //<- user input
}

would do.

Now we're left with other mangling-style issues, e.g. explicit specializations:

...
namespace __impl_step003 {
    using namespace __impl_step002;
    template <> struct X<Spec> {}; // error: spec in different namespace!
}

For this one we could try to detect specializations, through pre-parsing. And then what? :-)

Through out this, you seem to be using using namespace directives rather than actually nesting... I really am specifically suggesting nesting. I'm not at all sure that the directive approach works.

Anyways, for explicit specializations, I don't think there is anything you can do w/o changing the language rules. I'm also not sure this is a bad thing, as it seems like a real edge case.

What I would suggest is that you make the fact that these aren't the same namespace a bit more explicit in the model of the interpreter and let the user override it to handle cases where you have to use a particular namespace.

As an example, you could have special repl syntax that escapes the nesting sequence and instead allows the user to add some code to an explicitly specified namespace, potentially a prior one, thereby allowing them to explicitly specialize a thing.

I also think it might make sense to make it explicit in the REPL when you "step" to a fresh scope, allowing folks to easily write more than one component in a single scope within the REPL. You can reflect this cleanly in the prompt so users are aware of their context and have the control needed to complex things while the simple/easy things remain simple and easy.

The other nice thing is that it lets you explain and document all of this in terms of the language without having to invent and document special rules.

In general the fundamental difference between prompt-parsed AST nodes and code injected through files is worrisome, to me. We certainly cannot move file-based code into __impl_step namespaces; this would break all shared library use cases, where the header declares symbols that are implemented elsewhere. It might just be an interface question.

Given that this is such a pivotal point for the future / maintainability of cling I will discuss this with a couple of experienced interpreter people (@vgvassilev and @pcanal for instance) and come back with a potentially more cohesive reply.

Thanks everyone for sharing your ideas!

chandlerc commented 6 years ago

Oh, one thing that the explicit model of the scope-transition in the repl would help with is overload sets. It becomes easy to build those up w/o tons of using declarations, you just use "shift enter" or whatever the interactive facility is to put multiple things into a single step's scope before returning to the fresh-scope-each-time step mode, letting you accumulate an overload set, etc.

egpbos commented 4 years ago

Seems like this has now been successfully developed, really awesome work by @jalopezg-uc3m et al.! https://dl.acm.org/doi/abs/10.1145/3377555.3377901 So probably this issue can be closed?

ax3l commented 4 years ago

Absolutely, added in https://github.com/root-project/root/pull/4214 and it's amazing! Thank you all for your input and @jalopezg-uc3m in particular for the implementation! Congrats on the paper!