At the risk of exposing my large lack of understanding, I just want to share here that according to a certain prominent LLVM developer, trying to get around the One Definition Rule inevitably ends up with getting sucked into a singularity. However, in a REPL, there should be a way to implement redefinition behavior by making everything a compound statement. If I understand correctly, that would transform something like
int a = 0;
void hi() { std::cout << "hi" << std::endl; };
into
{
    int a = 0;
    {
        void hi() { std::cout << "hi" << std::endl; };
        // following statements nest here
    }
}
Which means that upon subsequent redefinition of hi, a separate new "thread" could be constructed, so that the following:
int a = 0;
void hi() { std::cout << "hi" << std::endl; };
void hi() { std::cout << "hi again" << std::endl; };
would be translated in cling to:
{
    int a = 0;
    {
        void hi() { std::cout << "hi" << std::endl; };
    }
    {
        void hi() { std::cout << "hi again" << std::endl; };
        // NOW following statements nest here instead
    }
}
Would this be a feasible way of implementing this behavior in cling? It would clearly not be maximally performant with a lot of statements, but allowing such behavior, at least optionally, would let people experiment with it.
I'll own up to my terrible ideas. ;]
I don't use Cling, but I'll hypothesize a REPL interface by showing lines after > as entered into the REPL prompt.
I was focusing on non-global variables, which may or may not be what you want to focus on, but I would suggest:
> int a = 1;
> std::cout << a << "\n";
> int a = 2;
> std::cout << a << "\n";
transforms into:
{
    int a = 1;
    {
        std::cout << a << "\n";
        {
            int a = 2;
            {
                std::cout << a << "\n";
            }
        }
    }
}
This allows shadowing naturally.
However, it assumes you are operating at block scope... And that in turn means you probably aren't defining functions. And even if you are, shadowing may not work as well for that as those go into the enclosing namespace, not the block. If you are working at block scope, and you define lambdas instead of actual functions, then this technique works fine.
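For illustration, here is a minimal, hypothetical sketch of that lambda variant (assuming the REPL wraps each entry in a new nested block, as in the examples above); redefining the callable is then just shadowing a local variable, so later nested blocks see the newest one:
#include <iostream>

int main() {
    // each REPL entry opens a fresh nested block; a lambda stands in for a function
    {
        auto hi = [] { std::cout << "hi" << std::endl; };
        hi();  // prints "hi"
        {
            auto hi = [] { std::cout << "hi again" << std::endl; };  // shadows the previous hi
            hi();  // prints "hi again"
            // following statements nest here and see the newest hi
        }
    }
}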
Let's consider other scopes briefly....
If you are working at global scope or namespace scope, then inline namespaces might be used to allow this kind of chained shadowing:
inline namespace n0001 {
    int f() { return 1; }
    int test1 = f();
    inline namespace n0002 {
        int f() { return 2; }
        int test2 = f();
        inline namespace n0003 {
            int f() { return 3; }
            int test3 = f();
        }
    }
}
(Compiler Explorer link: https://godbolt.org/z/VhCnX4)
I can't come up with any useful semantics for allowing shadowing within a class body's scope, nor is there any syntax I can think of to do this.
Also worth noting that you can just use nested anonymous namespaces for the namespace scope:
namespace {
    int f() { return 1; }
    namespace {
        int test1 = f();
        namespace {
            int f() { return 2; }
            namespace {
                int test2 = f();
            }
        }
    }
}
But it may make some things harder (and some things easier) due to not having a reliable mangled name.
The key is that by using increasing layers of nesting you can shadow things "safely".
Thank you very much for your ideas. This issue is really about finding suitable workflows for the iterative programming style people are used to, e.g. in Jupyter. Maybe we can already do it, or maybe we need to extend cling (well, actually C++ for a REPL) for it.
Anonymous and inline namespaces are indeed a way that looks promising. What I am afraid of is that they need to be nested, which means that if you execute subsequent REPL input blocks (Jupyter "cells") you have to treat them as a single cell all the time, if I am not missing the point? Or are you suggesting to perform the input wrapping in exactly that way when cling processes input?
This would then be similar to rolling back the state of cling, which one can already do. But that is still "linear" programming (evaluating all input instead of only the latest added block), contrary to the nice "jumping back and forth" users love with Jupyter cells.
@egpbos if I understand your examples correctly, all declarations are scoped and could not be called (functions) / used (vars) from the underlying "global" scope. But the drafted namespaces help here, besides being nested :)
I'm suggesting that the linear sequence of things in the notebook is mapped into the nested structure.
You'll always want to evaluate from top to bottom, but you can trigger that as users jump around the notebook and make changes.
The idea is to use a chain of nested scopes to represent the sequence in either the interpreter or the notebook. This essentially is just an implementation strategy for shadowing.
cc @SimeonEhrig any thoughts on this? :)
I definitely know nothing about how cling works, but if the arguments of the function are always the same, why not just define a function pointer?
For example:
double (*foo)(double);
foo = static_cast<double (*)(double)>([](double d){ return 1.0; });
foo(1.0) // returns 1.0
foo = static_cast<double (*)(double)>([](double d){ return 2.0; });
foo(1.0) // return 2.0
If the function signature is different, then it won't be a problem, right?
For redefining classes, maybe it is a good idea to use macros??
// First definition of A
#define A underlying_A1
class A { ... };
// Redefine A
#undef A
#define A underlying_A2
class A { ... };
// For any class, using-directive, or function that relies on class A,
// turn it into a template.
// A class depending on A:
template <class _A = A>
class ... { ... };
Thanks for your thoughts!
Function pointers: yes, a modern C++ alternative to function pointers that can be re-defined at runtime would likely be lambdas and std::function. I wrote them as alternatives already in the last section of the issue description ;-) They solve part of the problem; if one wants to change signatures as well, they need to be combined with e.g. std::any to allow changing the type signatures.
#include <any>
#include <functional>
#include <iostream>
using namespace std;

int main()
{
    any f = function<bool (double)>([]( double x ) { return x < 5.0; });
    cout << any_cast<function<bool (double)> >(f)( 3.0 ) << " ";
    f = function<int (int)>([]( int x ) { return x * 2.0; });
    cout << any_cast<function<int (int)> >(f)( 42 );
    return 0;
}
// This C++17 example allows one to store functions with changing signatures under the same variable name:
// a bool function and an int function.
// Output: "1 84" (true, 84)
// One would write in a Jupyter/Cling cell the assignment to that std::any variable,
// which can be called multiple times. I wonder if we can force inlining with cling here.
Macros & Classes: I don't fully understand how the shown macros might help here, but deriving, as you suggest, CRTP-style in a wrapper and constantly increasing the template argument, maybe in combination with the __COUNTER__ macro, might be cool!
Too bad we don't have inline, anonymous class definitions for template arguments.
@ax3l I don't think std::function is needed. A global function won't have any reference to local variables, only global ones, so a lambda is always convertible to a raw function pointer; there is no need for std::function because a global function won't have any data. Using it would only be an unnecessary burden.
However, I do think a special callable class with a variadic function-call operator that wraps all the overloaded versions of the function should be written to enable overloading. A possible implementation would use a hash table keyed on the std::type_info::hash_code (or possibly std::type_info::name) of all arguments to produce a hash, and store the function pointers. Whenever a function call is made, a search is done to find whether that overload is available or not. If it is available, it is called; otherwise, an exception is thrown.
This is the best way I can think of since I don't know anything about the compiler. Maybe somebody who understands llvm well can come up with a far easier to implement and more efficient solution than mine.
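A minimal sketch of that dispatcher idea, with all names invented for illustration (note that the return type still has to be spelled out at the call site, which is exactly the return-type-deduction limitation mentioned a bit further down):
#include <any>
#include <cstddef>
#include <stdexcept>
#include <typeinfo>
#include <unordered_map>

// Stores one function pointer per argument-type combination, keyed by the
// combined hash of the arguments' typeid()s; lookups that find no match throw.
class OverloadSet {
    std::unordered_map<std::size_t, std::any> overloads_;

    template <typename... Args>
    static std::size_t key() {
        std::size_t h = 0;
        ((h = h * 31 + typeid(Args).hash_code()), ...);  // order-dependent combine
        return h;
    }

public:
    template <typename R, typename... Args>
    void add(R (*fn)(Args...)) {
        overloads_[key<Args...>()] = fn;
    }

    template <typename R, typename... Args>
    R call(Args... args) const {
        auto it = overloads_.find(key<Args...>());
        if (it == overloads_.end())
            throw std::runtime_error("no matching overload");
        return std::any_cast<R (*)(Args...)>(it->second)(args...);
    }
};

// Usage sketch:
//   OverloadSet f;
//   f.add(+[](int x) { return x + 1; });    // an "f(int)" overload
//   f.add(+[](const char*) { return 0; });  // an "f(const char*)" overload
//   int r = f.call<int>(41);                // 42; f.call<int>("x") would pick the other one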
@ax3l About overloading the class, maybe it can be done by defining a macro that will be substituted with the real class name and, when it is overloaded, #undef and #define it. This is probably not a good solution.
But regardless of how it is implemented, all classes/functions/using-declarations that use that class will have to be templatized like this: template </* previous template arguments, if any */, class _A = A>, and all subsequent uses of A will be substituted by _A.
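A rough, hypothetical sketch of that templatization idea (the underlying_A* names and the use_a helper are made up; a real implementation would have to re-instantiate all dependent code after each redefinition):
#include <iostream>

// Each "redefinition" of A gets a fresh underlying name; the macro A always
// expands to the newest one.
class underlying_A1 { public: int value() const { return 1; } };
class underlying_A2 { public: int value() const { return 2; } };
#define A underlying_A2   // after the second definition, A means underlying_A2

// Code depending on A is templatized on it, so re-parsing it after the
// redefinition picks up the newest class as the default argument.
template <class _A = A>
int use_a() { _A a; return a.value(); }

int main() {
    std::cout << use_a<>() << "\n";               // 2: the "latest" A
    std::cout << use_a<underlying_A1>() << "\n";  // 1: the old definition is still reachable
}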
Sorry guys, just forget about the post before the previous post I posted. That method I mentioned can't deduce the return type, so it is probably better to introduce a compiler intrinsic in cling that allows users to delete a defined symbol and redefine it.
Awesome idea, @chandlerc - we never came up with nested unnamed namespaces!
Our problem is people doing
struct Mine {int i = 12;};
Mine m;
file->Write("m", &m);
Here, Mine needs to be ::Mine, else we cannot read this back in a meaningful way. That's an example of the mangling issues Chandler mentioned; there are others, for instance overloading:
namespace {
    int f(int);
    namespace {
        const char* f(const char*);
        namespace {
            int i = f(42); // ERROR: doesn't see f(int).
        }
    }
}
Currently, our only known and theoretically possible solution is to rip declarations out of the AST and CodeGen, and to make the corresponding symbols inaccessible to RuntimeDyld's symbol lookup. That's super awkward because it's nothing a compiler would ever want to do. It works okay for some declarations, and does not work for others (here with ROOT):
[cling]$ int i = 42;
[cling]$ .undo 1
[cling]$ int i = 43;
[cling]$ i
(int) 43
For now this is what cell-replay / -unloading should be based on.
My main concern is that this approach is an uphill battle with any future clang changes, needs a compiler expert for maintenance and is thus a sustainability issue. But barring any better approach we will have to invest dev time to fix the remaining issues, or at least the major ones.
I agree with @Axel-Naumann. At the moment, I see a verification problem with the C++ constructs that would be used to allow redefinition: I'm not sure anybody can verify that the semantics of the source code aren't changed beyond the deliberate redefinition.
I think manipulating the AST is the safest way to keep the semantics. Here, too, I agree with Axel: it is a really fundamental feature which needs a lot of modification of cling, clang and llvm code, and for that we need a compiler expert.
Still not sure why you can't map this to nested namespaces to model the incremental additions....
You can even inject using namespace <nested name> into the global namespace after each nested namespace is created for that bit of cling, so that referencing ::foo would still work, but understand that it would reference the first foo name defined. You would say foo to get the latest, and ::foo to get the first.
You could even put the current iteration into the cling prompt, and tie that to the nested namespace, allowing people to reference specific foos defined earlier through a namespace.
Consider something like:
[cling:001]$ int i = 42;
[cling:002]$ int i = 43;
[cling:003]$ i
(int) 43
[cling:004]$ ::step001::i
(int) 42
This could be implemented by synthesizing the following C++:
namespace __impl_step001 {
    int i = 42;
}
namespace step001 = __impl_step001;

namespace __impl_step001 {
    namespace __impl_step002 {
        int i = 43;
    }
}
namespace step002 = __impl_step001::__impl_step002;
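Continuing that synthesized code with my own extrapolation (assuming later prompt input keeps nesting inside __impl_step002), a read of i could be lowered so that unqualified lookup finds the latest definition while the step aliases still reach the earlier ones:
namespace __impl_step001 {
    namespace __impl_step002 {
        namespace __impl_step003 {
            int latest = i;            // 43: unqualified lookup finds __impl_step002::i
            int first  = ::step001::i; // 42: the step alias still reaches the first definition
        }
    }
}
namespace step003 = __impl_step001::__impl_step002::__impl_step003;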
Thank you, @chandlerc and @SimeonEhrig!
When writing out a value we ask "what's your type", and the answer would be __impl_step001::__impl_step002::Mine. That's certainly not what users would expect; reading this back e.g. in a compiled program would fail, because we will fail to identify __impl_step001::__impl_step002::Mine with ::Mine. We can just work around that in the I/O layer - solved, conceptually.
Regarding the overload issue:
namespace __impl_step001 {
    int f(int); //<- user input
}
namespace __impl_step002 {
    using namespace __impl_step001;
    const char* f(const char*); //<- user input
}
namespace __impl_step003 {
    using namespace __impl_step002;
    int i = f(42); //<- user input
}
would do.
Now we're left with other mangling-style issues, e.g. explicit specializations:
...
namespace __impl_step003 {
    using namespace __impl_step002;
    template <> struct X<Spec> {}; // error: spec in different namespace!
}
For this one we could try to detect specializations, through pre-parsing. And then what? :-)
In general the fundamental difference between prompt-parsed AST nodes and code injected through files is worrisome, to me. We certainly cannot move file-based code into __impl_step namespaces; this would break all shared library use cases, where the header declares symbols that are implemented elsewhere. It might just be an interface question.
Given that this is such a pivotal point for the future / maintainability of cling I will discuss this with a couple of experienced interpreter people (@vgvassilev and @pcanal for instance) and come back with a potentially more cohesive reply.
Thanks everyone for sharing your ideas!
Just to add to the example above, could we be auto-using all previous declarations? Or, on new declarations, check whether one with the same name already exists in :: (or in the special namespaces one builds up)?
namespace {
    int f(int) { return 1; } // <- user input
    namespace {
        // let cling pull in all known declarations automatically:
        using ::f; // <- automatic "reflection" of :: or previous namespace?
        // or just pull it in here, if an `f` in `::` exists already:
        const char* f(const char*); // <- user input
        // update: overwrites don't work when using is always done...
        // int f(int) { return 2; } // <- user input
        namespace {
            int i = f(42); // sees f(int) now as well // <- user input
        }
    }
}
Update: hm, seems that's basically what Chandler said ;-)
Just to add to the example above, could we be auto-using all previous declarations?
No, you shouldn't do that IMO.
The user needs to decide whether they want to define a new f, or add an overload. I would suggest explicitly adding a using declaration as a means to add an overload (instead of hiding and introducing a new named entity).
namespace {
    int f(int);
    namespace {
        // let cling pull in all known defines automatically:
        using ::f;
        const char* f(const char*);
        namespace {
            int i = f(42); // ERROR: doesn't see f(int).
        }
    }
}
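A minimal sketch of that opt-in, using the hypothetical step namespaces from earlier; the using-declaration is what keeps the old overload visible:
namespace __impl_step001 {
    int f(int) { return 1; }
}
namespace __impl_step001 {
    namespace __impl_step002 {
        using __impl_step001::f;                      // explicit opt-in: keep f(int) in the overload set
        const char* f(const char*) { return "new"; }
        namespace __impl_step003 {
            int i = f(42);                            // OK: finds f(int) via the using-declaration
        }
    }
}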
I just have a question with this solution.
Consider the following case:
inline int F()
{ /* ... */ }

// F2 depends on F
// and it is beneficial to inline the call
void F2()
{
    auto ret = F();
    // ...
}
If the cling JIT compiles this code and inlines int F(), how do you re-JIT-compile it?
And also consider class dependency.
class A {};
class B: A {};
template <class A_template_arg = A>
class C: A_template_arg {};
// Redefinition of A
class A {/**/};
C can work, but can B?
And how to handle it if the user creates an object of them?
A a;
int (*fp)() = &F;
If a remains the same, then the one-definition rule is violated, e.g. these old values have different type_info but the same return value of name(). When overloading, should they share the same overload set? If not, how to change them?
@NobodyXu If something uses a previous definition then subsequent re-definitions will not change that. Very similar to
int f(double);
int val = f(42);
int f(int);
val will be set by the first f.
I don't find that surprising behavior.
@chandlerc I don't think our users will appreciate the need to explicitly pull in previous declarations to create overloads. But as I said - we'll discuss, I'll come back with something more substantial than my immediate reactions here.
Thank you, @chandlerc and @SimeonEhrig!
When writing out a value we ask "what's your type", and the answer would be __impl_step001::__impl_step002::Mine. That's certainly not what users would expect; reading this back e.g. in a compiled program would fail, because we will fail to identify __impl_step001::__impl_step002::Mine with ::Mine. We can just work around that in the I/O layer - solved, conceptually.
The way I would phrase this is slightly different.
When you get asked "what's your type", I think the question should be "what is the shortest name I could use to refer to that type".
Because you conceptually never close one of these nested namespaces, these prefixes aren't necessary.
I would suggest using just the base name unless the name has been shadowed (which you can query out of the AST). When this fails, I would suggest using the namespace alias directly to the step: ::step002::Mine, which seems like a reasonable disambiguation.
Put differently, we can think of this as a nested-namespace rewrite step to pick a minimal way to unambiguously reference the type from the current nest level, with added use of the namespace aliases introduced in the process.
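A tiny, made-up illustration of that "shortest unambiguous name" idea with the step aliases from earlier:
namespace __impl_step001 { struct Mine { int i = 12; }; }
namespace step001 = __impl_step001;
namespace __impl_step001 { namespace __impl_step002 { struct Mine { int i = 13; }; } }
namespace step002 = __impl_step001::__impl_step002;

// From the current nest level, plain "Mine" would name the latest definition; once
// shadowing is detected, the step alias is the shortest unambiguous spelling:
step002::Mine m_latest;  // the redefined Mine
step001::Mine m_first;   // the original, shadowed Mine is still reachable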
Regarding the overload issue:
namespace __impl_step001 {
    int f(int); //<- user input
}
namespace __impl_step002 {
    using namespace __impl_step001;
    const char* f(const char*); //<- user input
}
namespace __impl_step003 {
    using namespace __impl_step002;
    int i = f(42); //<- user input
}
would do.
Now we're left with other mangling-style issues, e.g. explicit specializations:
...
namespace __impl_step003 {
    using namespace __impl_step002;
    template <> struct X<Spec> {}; // error: spec in different namespace!
}
For this one we could try to detect specializations, through pre-parsing. And then what? :-)
Throughout this, you seem to be using using namespace directives rather than actually nesting... I really am specifically suggesting nesting. I'm not at all sure that the directive approach works.
Anyways, for explicit specializations, I don't think there is anything you can do w/o changing the language rules. I'm also not sure this is a bad thing, as it seems like a real edge case.
What I would suggest is that you make the fact that these aren't the same namespace a bit more explicit in the model of the interpreter and let the user override it to handle cases where you have to use a particular namespace.
As an example, you could have special repl syntax that escapes the nesting sequence and instead allows the user to add some code to an explicitly specified namespace, potentially a prior one, thereby allowing them to explicitly specialize a thing.
I also think it might make sense to make it explicit in the REPL when you "step" to a fresh scope, allowing folks to easily write more than one component in a single scope within the REPL. You can reflect this cleanly in the prompt so users are aware of their context and have the control needed to do complex things, while the simple/easy things remain simple and easy.
The other nice thing is that it lets you explain and document all of this in terms of the language without having to invent and document special rules.
In general the fundamental difference between prompt-parsed AST nodes and code injected through files is worrisome, to me. We certainly cannot move file-based code into __impl_step namespaces; this would break all shared library use cases, where the header declares symbols that are implemented elsewhere. It might just be an interface question.
Given that this is such a pivotal point for the future / maintainability of cling I will discuss this with a couple of experienced interpreter people (@vgvassilev and @pcanal for instance) and come back with a potentially more cohesive reply.
Thanks everyone for sharing your ideas!
Oh, one thing that the explicit model of the scope transition in the repl would help with is overload sets. It becomes easy to build those up without tons of using declarations: you just use "shift enter" or whatever the interactive facility is to put multiple things into a single step's scope before returning to the fresh-scope-each-time step mode, letting you accumulate an overload set, etc.
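For example, a hypothetical single step holding two prompt entries could synthesize to the following; the two functions form an ordinary overload set without any using-declarations (the step number and functions are invented):
namespace __impl_step004 {
    int    area(int side)           { return side * side; }
    double area(double w, double h) { return w * h; }
}
namespace step004 = __impl_step004;

int    a1 = step004::area(3);        // 9
double a2 = step004::area(2.0, 4.5); // 9.0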
Seems like this has now been successfully developed, really awesome work by @jalopezg-uc3m et al.! https://dl.acm.org/doi/abs/10.1145/3377555.3377901 So probably this issue can be closed?
Absolutely, added in https://github.com/root-project/root/pull/4214 and it's amazing! Thank you all for your input and @jalopezg-uc3m in particular for the implementation! Congrats on the paper!
As mentioned at the ROOT User's Workshop 2018, many people using Cling or Cling in Jupyter for interactive workflows struggle with the following problem. When interactively defining and improving functions, classes and variables, the state of the interpreter is in the way of fast iteration of prototypes.
Let's take the following example:
If a user now wants to change foo(), they will just add it again. You can also imagine this in a Notebook: a user just scrolls up, changes and re-evaluates the cell.
Of course that is not working right now, and throws something like:
Possible Approach: Active Shadowing
Implementation?
I know that this is asking for a dynamic type system, but only for explicitly selected functions/classes/vars, in Cling C++. I have absolutely no idea how one could efficiently implement such re-definitions, but I think it would be an important workflow for the code in cling's main() wrapper. I am aware that problems like indirect calls from bar, which is not explicitly re-defined when foo() is shadowed, can be very challenging, e.g. if foo() was inlined into bar.
If such an implementation is only possible under side-conditions, e.g. that main()-defined and allow_shadowing-attributed functions can not be inlined, that is probably fine as well. As a user, I don't expect the most efficient code from my control logic that I am interactively typing (between notebook cells), but from the code that is used from external imports.
Alternatives
For functions represented as lambdas, and for variables, one could probably establish work-around workflows by pushing them into C++17 std::any variables. For re-defining classes I have no good idea at the moment besides resetting the state of the interpreter / restarting the Jupyter kernel.