jafingerhut / p4-namespaces

A public repo for discussion of adding namespaces to the P4 programming language
Apache License 2.0

A simple design #21

Open mihaibudiu opened 2 years ago

mihaibudiu commented 2 years ago

Here is a simple possible design based on Dan Talayco's description, where we synthesize the implementation from some primitive building blocks:

With these constructs, import works as follows: import file as x is equivalent to namespace x { import file }. When a module is imported for the first time, each imported declaration is given a fresh internal name that the user cannot type. Let's assume that these names look like $0, $1, $2, and that they are global, not namespaced. Importing a file for the first time creates these declarations. Every import of the file (first or subsequent) additionally creates an alias $1 name; statement, which gives a new name to the internal object.

So importing a module containing const bit<32> z = 0 with import mod as X will create the following program:

const bit<32> $0 = 0;  // declarations in mod are always top-level
namespace X {
   alias $0 z;
}

This is pretty much the whole definition: it reduces import to two additional statements, namespace and alias. So if we define the semantics of namespace and alias, we have a complete specification of import. The interactions with the preprocessor are independent of this behavior.

If you import a module twice you can get the same objects with fresh names:

import mod as X;
import mod as Y;

gives:

const bit<32> $0 = 0;
namespace X {
   alias $0 z;
}
// $0 is not created again; it already exists.
namespace Y {
   alias $0 z;
}

which is equivalent to:

const bit<32> $0 = 0;
alias $0 X.z;
alias $0 Y.z;
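
The desugaring described above can be modelled in a few lines of Python. This is only a sketch of one reading of the proposal, not P4 and not compiler code: the `Program` class, its fields, and the dictionary representation of a module are all hypothetical names introduced for illustration.

```python
# Toy model of the proposed import desugaring. Everything here (Program,
# import_as, lookup, the dict-based module format) is hypothetical.
from itertools import count


class Program:
    def __init__(self, modules):
        self.modules = modules    # module name -> {declared name: value}
        self.fresh = count()      # source of internal names $0, $1, ...
        self.globals = {}         # internal name -> value (top level)
        self.internal = {}        # (module, declared name) -> internal name
        self.namespaces = {}      # namespace -> {alias: internal name}

    def import_as(self, module, namespace):
        for name, value in self.modules[module].items():
            key = (module, name)
            if key not in self.internal:
                # First import only: create the top-level declaration $n.
                internal = f"${next(self.fresh)}"
                self.internal[key] = internal
                self.globals[internal] = value
            # Every import: add `alias $n name;` inside the namespace.
            self.namespaces.setdefault(namespace, {})[name] = self.internal[key]

    def lookup(self, namespace, name):
        return self.globals[self.namespaces[namespace][name]]


prog = Program({"mod": {"z": 0}})
prog.import_as("mod", "X")
prog.import_as("mod", "Y")
```

After both imports, `prog.globals` holds a single `$0` entry, and `X.z` and `Y.z` are two aliases for it, which matches the expanded program shown above.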

mihaibudiu commented 2 years ago

Notice that the semantics of namespace and alias has nothing to do with files or modules; we can just add these to P4 independently.

jfingerh commented 2 years ago

Lots of discussion on the meaning of this proposal and its consequences during a 2022-Feb-24 meeting, not trying to capture it all here.

Do we think "." is a good separator for namespace.name, or is "::" better somehow? Good to make a list of pros/cons here.

If we used this approach, would we expose "namespace" and "alias" to P4 developers, or reserve it as an implementation detail of "import"?

Mihai: To prototype this would require re-introducing hierarchical identifiers, which was once in P4 but since removed (years ago).

AI Andy: No promises, but will try to find time to flesh this out from a P4 developer's perspective, and see whether it makes sense to make core.p4 and/or architecture definition files like v1model.p4, psa.p4, etc. into modules.

jfingerh commented 2 years ago

AI Andy: Schedule same time slot every 2 weeks from now for, say, 4 more occurrences, in hopes that might be enough to reach a satisfactory conclusion.

det-intel commented 2 years ago

I like this a lot. I have one minor concern.

During the meeting, I believe Mihai said that "import mod" (rather than "import mod as X") has the effect of bringing all the names of objects in mod into the global scope. Now, the actual names are not there as they are replaced by the compiler-generated unique identifiers. However, they are aliased to the names as given in the module. I think this still introduces the possibility that someone can change the code in mod and introduce name conflicts with code that imports mod.

The original proposal was that "import mod" is equivalent to "import mod as mod" (or some well defined mapping of the filename). The intent is to disallow the use of import to pull names into the importing file's namespace. It necessarily introduces a sub-namespace. I still prefer this alternative.

You can certainly argue that if mod changes then any code importing it should be reviewed. However, these conflicts could occur because of internal implementation changes to mod, even if interfaces to mod don't change. One might try to address that by introducing "public" and "private" semantics for import, but it doesn't solve the problem completely and it would probably cause more complexity than benefit.

mihaibudiu commented 2 years ago

In general importing without providing a name will be strongly discouraged, and perhaps only supported for the legacy modules core.p4 and v1model.p4. If we want to be able to import (and not include) core.p4 we need something equivalent to "import core as " so that packet_in does not need to be prefixed with core::packet_in.

det-intel commented 2 years ago

Fair point about "import core". std:: is always annoying in languages that require that.

Is the implication that "import core.p4" will replace "#include core.p4"?

det-intel commented 2 years ago

It was mentioned that multiple namespace declarations with the same name-id contribute to the same logical namespace. Three problems with this: It can make it hard to find definitions; it opens the question again of the order of definitions and references; but more importantly, it makes it easier to introduce name clashes.

Related, can I do this:

import mod1 as X;
import mod2 as X;

Now we're back in the situation where you can introduce name clashes between mod1 and mod2 without knowing it.

If we did go this direction, "into" might be more descriptive than "as".
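
To make the clash scenario concrete, here is a small Python sketch of a check for names that two different modules contribute to the same namespace. It assumes, as mentioned in the meeting, that repeated imports into one namespace merge into a single logical namespace. The function name `find_clashes` and the data layout are hypothetical, and the proposal does not say how a compiler would actually report such a clash.

```python
# Hypothetical clash check: which names are contributed to one namespace
# by more than one module?
from collections import defaultdict


def find_clashes(modules, imports):
    """modules: module name -> set of declared names.
    imports: list of (module, namespace) pairs, in program order.
    Returns {(namespace, name): [modules...]} for clashing names only."""
    providers = defaultdict(list)
    for module, namespace in imports:
        for name in modules[module]:
            # Importing the same module twice into one namespace is not a
            # clash: both aliases refer to the same internal object.
            if module not in providers[(namespace, name)]:
                providers[(namespace, name)].append(module)
    return {key: mods for key, mods in providers.items() if len(mods) > 1}


modules = {"mod1": {"a", "shared"}, "mod2": {"b", "shared"}}
same_ns = find_clashes(modules, [("mod1", "X"), ("mod2", "X")])
separate_ns = find_clashes(modules, [("mod1", "X"), ("mod2", "Y")])
```

With both modules imported as X, the name `shared` clashes; importing them as X and Y leaves nothing to report.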

mihaibudiu commented 2 years ago

I think the desire is to make #include unnecessary. But it will continue to be supported for backwards compatibility. Note that this proposal does not describe how #include works.

mihaibudiu commented 2 years ago

Yes, users can always introduce name clashes. But the important thing is that they can always avoid name clashes by importing into namespaces of their choice.

jfingerh commented 2 years ago

As I have said before in this group, it might be the goal of some people to make #include unnecessary. That is not my personal goal. I would strongly prefer that we have a module/namespace proposal where the C preprocessor is explicitly run on each module/namespace independently of the CPP runs on all other modules/namespaces, including independently of the top level program namespace/module. Hence this issue: https://github.com/jafingerhut/p4-namespaces/issues/1

Of course, my personal goals should not determine the outcome here -- I can easily accept that my desires might have properties that conflict with the desires of others, and other ideas could win out here. But I contend that the C preprocessor has useful features like #ifdef, and if we are not getting rid of those (please let us not), then I don't see any harm in allowing #include in the future after namespaces/modules are added, too. Yes, we might want to write some guidelines for people on when one is preferable to the other, but I think that advice might simply amount to "use #include when you want textual inclusion, and explicit sharing of #define symbols between different source files. Use namespaces/modules when you want to keep those things separate, and be able to allow the possibility of independent authors creating top level names that might otherwise conflict with each other."

det-intel commented 2 years ago

Regarding import vs include, I'm guessing we'll need to discuss compilation units and linking to resolve that. The include operation won't impinge on that, but import semantics might. Could we have pre-compiled modules that could be imported? (I may be ignorant of existing user-exposed linkers that allow the combination of pre-compiled linkable P4 modules -- I've been assuming that doesn't generally exist.)

If we decide that the compilation process will always be "monolithic" -- that is, you can consider any compile operation as being fed the result of a single file which results from the processing of include and import as we've been discussing -- then maybe you could replace include (with the loss of some macro substitution functionality depending on how alias gets exposed).

jfingerh commented 2 years ago

I am not sure if I understand Mihai's proposal in enough detail to answer the following question, so I will ask it here.

Suppose we have two tiny modules, M1, and M2, that each contain only a single definition:

// Contents of module M1
const bit<8> A = 1;

// Contents of module M2
const bit<8> A = 2;

Can I write a third module M3 like this?

// Contents of module M3
const bit<8> B = X.A;

And if I can, does the meaning of X.A depend upon previous import statements in a top level module where all of M1, M2, and M3 are imported, as shown by the examples below?

// top level module T1
import M1 as X;
import M2 as Y;
import M3 as Z;
// Is Z.B equal to 1 here?

// top level module T2
import M1 as Y;
import M2 as X;
import M3 as Z;
// Is Z.B equal to 2 here?

If the answer to the questions in the comments above is "no", why is it "no"?

If the answer to the questions is "yes", then that seems to me like a disadvantage of this approach, i.e. that the meaning of the code in module M3 depends upon code outside of its definition, i.e. on what other import statements and definitions occurred before it was imported by T1 or T2.
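
The "yes" case can be illustrated with a toy Python model. It assumes one possible reading of the proposal, in which a reference such as X.A inside an imported module is resolved against the namespaces of the importing program at the time of import. The `import_as` function and the dictionary encoding of modules are hypothetical and exist only for this sketch.

```python
# Toy model, assuming free references are resolved in the importer's scope.
def import_as(env, module, namespace):
    """env: namespace -> {name: value} for the importing program.
    module: name -> int literal, or a string reference like "X.A"."""
    resolved = {}
    for name, expr in module.items():
        if isinstance(expr, str):
            # Free reference: looked up in the importer, not in the module.
            ns, ref = expr.split(".")
            resolved[name] = env[ns][ref]
        else:
            resolved[name] = expr
    env[namespace] = resolved


M1 = {"A": 1}
M2 = {"A": 2}
M3 = {"B": "X.A"}   # X is not defined anywhere inside M3

t1 = {}
import_as(t1, M1, "X")
import_as(t1, M2, "Y")
import_as(t1, M3, "Z")

t2 = {}
import_as(t2, M1, "Y")
import_as(t2, M2, "X")
import_as(t2, M3, "Z")
```

Under this reading, Z.B is 1 in T1 and 2 in T2, even though the text of M3 is identical in both programs.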

det-intel commented 2 years ago

Quoting Andy:

Suppose we have two tiny modules, M1, and M2, that each contain only a single definition:

// Contents of module M1
const bit<8> A = 1;

// Contents of module M2
const bit<8> A = 2;

Can I write a third module M3 like this?

// Contents of module M3
const bit<8> B = X.A;

This is a great example. I hadn't realized it, but it's why I was arguing that the importing module should be responsible for the namespace names and scope should be determined strictly by the contents of a given file.

One way to look at this is that the module M3 cannot be compiled independently to the point of resolving objects to the names it uses. It references X.A but that is "external" to M3. This is why I think the question relates to compilation units and external linkage conventions.

It is also the "closure" idea I mentioned in the meeting, but spread across files making it even more clear. The reference to X.A in M3 requires "looking outside" the scope of M3. I think that should not be allowed. This has significant impacts on language semantics that I only have an intuitive sense of. Maybe Nate and Mihai can characterize it better and we can discuss whether it curtails the usefulness of P4.

Just to illustrate what I was referring to as "closure", I mean being allowed to reference identifiers in a "super" namespace like Mihai's original:

const bit<4> $0 = 1;
namespace {
    alias $0 X;
}

I eventually saw this as an implementation approach for how to resolve identifiers in modules, but I would be troubled by the broader implications of the semantics for exactly the reasons Andy has pointed out.

jfingerh commented 2 years ago

I suppose another plain-language description that Dan and I are hoping for (I do not know the language semantics formal term for it) is that the meaning of a module M can be determined solely from the contents of the file(s) that define M, plus any modules that are imported by M.

If a module X imports M, the meaning of M should be independent of any other import statements in X, and independent of any other code in X at all.

mihaibudiu commented 2 years ago

Not necessarily: previous declarations in X may influence the meaning of declarations in M. But if you import a module using a new name then the meaning should indeed be independent. You can distinguish between "closed" and "open" modules. A closed module has no references to identifiers undefined in the module. The "meaning" of such a module cannot be changed no matter how you import it. But even such a module could lead to an uncompilable program if importing it causes some declarations to be duplicated.

mihaibudiu commented 2 years ago

See also https://en.wikipedia.org/wiki/Free_variables_and_bound_variables. We can formally define what it means for an identifier to be free or bound in a module. But there is no reason to disallow modules with "free" identifiers, they could be useful.
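
As a rough illustration of the closed/open distinction, the free identifiers of a module can be computed as the identifiers it references minus the ones it declares. The Python sketch below uses a deliberately simplified, hypothetical representation (each declared name maps to the set of identifiers its definition mentions) and ignores nested scopes and imports inside the module.

```python
# Hypothetical, simplified module representation:
# declared name -> set of identifiers referenced by its definition.
def free_identifiers(module):
    """Return the identifiers referenced but not declared in the module."""
    declared = set(module)
    referenced = set().union(*module.values())
    return referenced - declared


def is_closed(module):
    """A module is closed when it has no free identifiers."""
    return not free_identifiers(module)


closed_module = {"A": set(), "B": {"A"}}   # B refers only to A, declared here
open_module = {"B": {"X.A"}}               # X.A is not declared in the module

closed_free = free_identifiers(closed_module)
open_free = free_identifiers(open_module)
```

Here `closed_module` has no free identifiers, while `open_module` has exactly one, `X.A`.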

jfingerh commented 2 years ago

I agree that a module definition with free identifiers could be defined to have a clear, well-defined meaning. I agree it might be useful.

Part of language design is sometimes saying "no" to things that might be useful, though, and I am wondering if this might be such a case. In this case the extra power it gives also appears to be a source of confusing behavior for those not wanting this extra power.

If others discussing this proposal would like that extra power, could we at least consider making it the default that free identifiers in a module cause a compile-time error, with a syntax to "opt in" to allowing free identifiers in a module's definition?
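
The opt-in idea could look roughly like the following Python sketch: free identifiers are rejected by default and accepted only when the module asks for it. The `allow_free` flag stands in for whatever opt-in syntax might be chosen; it, `check_module`, and `FreeIdentifierError` are hypothetical names, not part of any proposal text.

```python
# Hypothetical policy check: free identifiers are an error unless the
# module explicitly opts in.
class FreeIdentifierError(Exception):
    pass


def check_module(declared, referenced, allow_free=False):
    """Return the sorted free identifiers, or raise if they are not allowed."""
    free = sorted(set(referenced) - set(declared))
    if free and not allow_free:
        raise FreeIdentifierError(
            f"free identifiers not allowed: {', '.join(free)}"
        )
    return free


# Module M3 from the earlier example: declares B, refers to X.A.
opted_in = check_module({"B"}, {"X.A"}, allow_free=True)

try:
    check_module({"B"}, {"X.A"})
    default_result = "accepted"
except FreeIdentifierError as err:
    default_result = str(err)
```

With the default setting the module is rejected with a message naming `X.A`; with the opt-in it is accepted and the free identifiers are simply reported.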