twinbasic / lang-design

Language Design for twinBASIC
MIT License

Discussion: Namespaces #21

Open mansellan opened 3 years ago

mansellan commented 3 years ago

Is your feature request related to a problem? In VB6 / VBA, all referenced types are imported into a single global namespace. Collisions can be resolved by prefixing the library name (Excel.Range vs Word.Range), and methods in standard modules can be disambiguated using the module name. But that's as much as you get. This leads to difficulty in using common words for types, and huge intellisense lists.

Describe the solution you'd like Ideally, namespaces similar to VB.Net, but it's... complicated. The current behaviour is deeply ingrained, part of the DNA of the language. All the standard library functions, and any referenced library, are not going to be namespace-aware no matter what happens with twinBASIC. At the same time, if the hope is to evolve the language such that greenfield projects can be written more expressively than was possible in the past, true namespace support would be a huge win. Naming things is hard, doubly so if you only have one namespace.

This issue is very much intended as a discussion, I'm really not sure whether a solution exists. I don't believe this is an urgent problem, VB6 managed without namespaces. It is something to consider for the future.

Some things that a candidate solution should consider:

  1. It should not require additional cognitive load when using or writing in the VB6 style (reference a lib, use its types).
  2. It must be compatible with COM and DLL exports (see twinbasic/twinbasic#22)
  3. What impact would any proposed solution have on type resolution? What happens if a namespace level collides, e.g. MyApp.VBA?
  4. What level should it act upon? Component/File/Project?

Describe alternatives you've considered Some have taken to simulating namespaces using predeclared classes and/or type hierarchies, but this is an imperfect compromise and adds bloat to a codebase.

Additional context An initial proposal, I'm sure there are problems with it:

  1. Add an Option Namespace - without this a component uses existing behaviour (opt-in). If set:
  2. Library functions are NOT automatically imported (either from VBA or any referenced libs)
  3. Namespaces can be defined: Namespace MyApp.Models.Shipping
  4. Namespaces can be imported: Import VBA, Import MSForms, MyApp.Models.Shipping
  5. Namespaces can be aliased: Import MSForms As Forms

Not sure what this would mean for exported types, although in other issues the use of attributes is being considered (see twinbasic/twinbasic#22). I think this might help with the issue of existing libraries though - I could envisage writing a "wrapper library" for e.g. MSForms that exposed its types in a series of smaller namespaces.

WaynePhillipsEA commented 3 years ago

I think this is a good long-term goal. FWIW, the twinBASIC compiler/resolver is actually already namespace-aware, but just has the one implicit 'global' unnamed namespace defined.

WindowStations commented 3 years ago

https://www.youtube.com/watch?v=N9QAgIpwS5A The video I posted above shows vb.net namespaces consumed by VB6 as nested class buckets (a term used by MS before NET). See the last minute of the video for namespaces.

Greedquest commented 3 years ago

An initial proposal, I'm sure there are problems with it:

  1. Add an Option Namespace - without this a component uses existing behaviour (opt-in). If set:
  2. Library functions are NOT automatically imported (either from VBA or any referenced libs)
  3. Namespaces can be defined: Namespace MyApp.Models.Shipping
  4. Namespaces can be imported: Import VBA, Import MSForms, MyApp.Models.Shipping
  5. Namespaces can be aliased: Import MSForms As Forms

Not sure what this would mean for exported types, although in other issues the use of attributes is being considered (see twinbasic/twinbasic#22). I think this might help with the issue of existing libraries though - I could envisage writing a "wrapper library" for e.g. MSForms that exposed its types in a series of smaller namespaces.

I've been having a think about this and in light of the new multi-file support I think I'd like to suggest a slightly different approach.

If a namespace is just a list of names that refer to things that can be accessed (without qualification) from a certain scope, then VB* already has 3 hierarchical levels of namespace [1]:

(most enclosing)

  1. Project namespace - true global, defines the set of names accessible from all contexts in a project, think Public Subs in a module, class names or names imported from referenced libraries.
  2. Component namespace - within a module or class there are methods and data that may only be accessible to code running inside that component - e.g. private member variables of a class.
  3. Local namespace - names accessible to code inside a function like locally Dimmed variables

(most enclosed)

Let me explain a little more:

Scopes and Namespaces in VB*

When I say that namespaces "enclose" one another/can be "enclosed", I mean, for example, the Component namespace - list of names defined for code in that component - is a superset of the enclosing Project namespace, so all code defined in a component can use names listed in both the Component and Project namespaces. Or in other words, the component namespace of MyClass effectively contains all the methods, variables, types of MyClass as well as VBA.CLngPtr or Collection or global variables etc., but not local variables inside one of MyClass' methods.

It's also worth mentioning that methods of a Class with the Public access modifier are not in the Project namespace even if that class is; e.g. I can Dim x As MyClass in any location in my project, so MyClass is in the Project namespace, however the Foo method in a call like x.Foo is qualified by "x.", so Foo is not in the project namespace (even if it is declared with a project-wide scope so can be called from anywhere). Contrast this with Foo being declared with the Public access modifier in a standard module MyModule. Now both MyModule and Foo are in the Project namespace, since MyModule.Foo and Foo on its own are both valid. In this way the Public access modifier has 2 jobs:

  1. In a class or standard module, it declares a name with project-wide scope
  2. Only in a standard module, Public Sub Foo also exports the name Foo to the enclosing project namespace whereas for a class it does not.

In general in VB*,

Add reference will dump the exported names from the referenced lib into project namespace with shadowed member resolution.

[1] There is a "0th" namespace, maybe call it the Exposed namespace, which in VBA is populated with PublicNotCreatable classes, predeclared classes and public module members without Option Private Module, basically the stuff visible to the outside world. I'm gonna ignore it for now.


Ok, definitions aside, what I would propose for twinBASIC is a 4th namespace in between the Project and Component namespaces; a new File namespace.

What I mean by this is that rather than declaring Modules and Class names in the project namespace by default, and rather than exporting Public Subs defined in a module to the project namespace, all the information in a .twin file goes into an intermediate File namespace.

So before in VB* the following code:

'''scratchpad.bas
Private Const MODULE_NAME As String = "scratchpad.bas" 'component scope, visible to all members of this module

Private Function CallMe() 'this method is component scope too, as other methods in the "scratchpad.bas" module can see it
    Static callCount As Long 'Declared with Local Scope, not visible outside this function
    Debug.Print MODULE_NAME; "was called - tot calls ="; callCount
    callCount = callCount + 1 'VB6/VBA has no += operator
End Function

Public Sub RunCode() 'the Public keyword *exports* something from component to file/project namespace
    CallMe 'CallMe is private sure, but this local scope can access enclosing component namespace as well as local namespace
End Sub

'''MyClass.cls
Public Sub Foo()
    ...
End Sub

... would define the following additions to/new namespaces:

ProjectNamespace(ExampleProject) += [scratchpad, RunCode, MyClass] 'and by extension scratchpad.RunCode & MyClass.Foo
    ComponentNamespace(scratchpad.bas) = [MODULE_NAME, CallMe, RunCode]
        LocalNamespace(Sub CallMe) = [callCount]
        LocalNamespace(Sub RunCode) = []

    ComponentNamespace(MyClass.cls) = [Foo]
        LocalNamespace(Sub Foo) = [...]

    '...

note we're adding to ProjectNamespace, polluting it


tB meanwhile would have

'file1.twin
Module scratchpad
      Private Const MODULE_NAME As String = "scratchpad.bas"

      Private Function CallMe()
          Static callCount As Long
          Debug.Print MODULE_NAME; "was called - tot calls ="; callCount
          callCount += 1
      End Function

      Public Sub RunCode()
          CallMe
      End Sub
End Module

'''file2.twin
'file scope - python can have code here but VB* can't, modules suffice
Module scratchpad 'no name clash as this is file2's file namespace
    'component scope - in VB* we can't write code that executes here I don't think
    Sub test()
        'local scope - VB* always runs code in local scope, but it can access members of enclosing scopes
        Debug.Print CLngPtr(12) 'fine, VBA lib added in project references, meaning it is in project namespace
                                'and local scope can access project namespace
        RunCode 'Error, name not defined, parallel file1.twin file namespace inaccessible from here
    End Sub
End Module

... which defines:

ProjectNamespace(ExampleProject) += [] 'still populated with VBA.Collection, Scripting, Excel etc, but no user code... (yet)
    FileNamespace(file1.twin) = [scratchpad, RunCode] 'this would have gone into true global ProjectNamespace 
        ComponentNamespace(Module scratchpad) = [MODULE_NAME, CallMe, RunCode] 'below as before
            LocalNamespace(Sub CallMe) = [callCount]
            LocalNamespace(Sub RunCode) = []

    FileNamespace(file2.twin) = [scratchpad, Test] 'each file has its own FileNamespace so name collisions in user code are not an issue
        ComponentNamespace(Module scratchpad) = [Test]
            LocalNamespace(Sub Test) = [] 'but again, although LocalNamespace = [], code running in local scope has access
                                          ' to the sum of all enclosing namespaces, i.e.:
                                          ' AccessibleNamespace = ProjectNamespace + FileNamespace + ComponentNamespace + LocalNamespace

Each file has its own namespace which is not visible to the other file, meaning Public Sub RunCode in file1 is not project-wide and so not accessible from file2 - similarly classes would not be instantiable from other .twin files in a project. To get around this there would be 2 options:

  1. .bas and .cls files would be supported which allow for the existing behaviour by being added directly to the Project namespace. Combined with the fact that project references are in the project namespace, this would be identical to existing behaviour.
  2. For .twin files meanwhile, a new import statement can explicitly bring names from another file's File namespace into any enclosing namespace [2]:
'''file1.twin
from file2 import Foo 'into file namespace

Module m
   from file2 import Bar 'into component namespace
   Sub t()
       import file2.Baz 'into local namespace
       'sees Baz, Bar & Foo
   End Sub
   Function f()
       'can see Bar & Foo but not Baz
   End Function
End Module

Class c
    Sub x()
        'can see Foo in file namespace but not Bar or Baz
    End Sub
End Class

This behaviour is very similar to python (except python lets you run code in any scope and can have functions within functions so is a bit more complicated).
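Since the comparison is with Python anyway, the file-level isolation and the explicit import can be modelled in a few lines of Python. This is only a rough sketch under assumptions: each namespace is a plain dict, the names `accessible` and `import_name` are made up for illustration, and VB's case-insensitive matching is ignored. It is not how the tB resolver works.

```python
from collections import ChainMap

# Hypothetical model: every namespace is a plain dict of name -> target.
project_ns = {"CLngPtr": "VBA.CLngPtr"}  # filled by project references
file1_ns = {"scratchpad": "file1.scratchpad", "RunCode": "file1.scratchpad.RunCode"}
file2_ns = {"scratchpad": "file2.scratchpad", "test": "file2.scratchpad.test"}
file2_component_ns = {"test": "file2.scratchpad.test"}


def accessible(local_ns, component_ns, file_ns):
    """Names visible from a local scope, innermost namespace first."""
    return ChainMap(local_ns, component_ns, file_ns, project_ns)


def import_name(target_ns, source_ns, name):
    """Rough model of `from <file> import <name>`: copy one name across."""
    target_ns[name] = source_ns[name]


# Code running inside Sub test() in file2.twin:
scope = accessible({}, file2_component_ns, file2_ns)
before = "RunCode" in scope   # False: file1's file namespace is not visible
import_name(file2_ns, file1_ns, "RunCode")
after = "RunCode" in scope    # True: the name now lives in file2's file namespace
```

Both files can define a module called scratchpad without clashing, because `scope["scratchpad"]` only ever sees the entry from file2's own file namespace.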

To start with import file2 can put everything from file2's file namespace (or exposed namespace?) into file1's file namespace

[2] Well, not the full file namespace; I think Public/Private on a component could dictate whether it is importable by other files in the project. This would hopefully extend to other projects and allow import from other tB code; compiled dlls are another issue.


Summary

I think, as @mansellan says:

The current behaviour is deeply ingrained, part of the DNA of the language

and because of that I respectfully disagree with the suggested proposal:

  2. Library functions are NOT automatically imported (either from VBA or any referenced libs)

I think the project references should work as they always have by bringing the exposed members into the project namespace where they are accessible unqualified from all scopes using shadowed member resolution, I think this is unavoidable for backwards compatibility "the DNA". On top of this, users should be able to add [Legacy]StandardModule and [Legacy]ClassModule files to their project (they could appear as .bas & .cls in the virtual filesystem although based on how you handle file metadata like attributes maybe you want a different file extension - see twinbasic/twinbasic#162). These Legacy components also populate the project namespace like in VB*.

Consequently I also disagree with:

  1. Add an Option Namespace - without this a component uses existing behaviour (opt-in).

... as this will be accounted for by supporting legacy files for the standard behaviour, and:

  3. Namespaces can be defined: Namespace MyApp.Models.Shipping

Namespace should not be a keyword - I think the file-level boundary is sufficient to delineate where these new namespaces fit in. Namespaces will be automatically named based on filename and directory structure, e.g.

/Sources/Models/Shipping.twin 'in MyApp

could define the namespace MyApp.Models.Shipping. Relative imports could also be supported: if 2 .twin files are adjacent in the directory structure, they would not need the MyApp.Models. prefix to refer to each other.
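As a sketch of that mapping (in Python, since the idea is borrowed from it), something like the following would do. The function names `namespace_for` and `relative_name`, the `Sources` root folder default, and the sibling file `Billing` are all assumptions for illustration; nothing here is an existing tB feature.

```python
from pathlib import PurePosixPath


def namespace_for(project, twin_path, source_root="Sources"):
    """Derive a dotted namespace from a .twin file's path inside the project."""
    parts = [p for p in PurePosixPath(twin_path).parts if p != "/"]
    if parts and parts[0] == source_root:
        parts = parts[1:]                         # the source root adds no level
    parts[-1] = PurePosixPath(parts[-1]).stem     # drop the .twin extension
    return ".".join([project, *parts])


def relative_name(current_ns, target_ns):
    """Shorten target_ns when it shares current_ns's parent namespace."""
    parent = current_ns.rsplit(".", 1)[0] + "."
    return target_ns[len(parent):] if target_ns.startswith(parent) else target_ns


shipping = namespace_for("MyApp", "/Sources/Models/Shipping.twin")
# shipping == "MyApp.Models.Shipping"
sibling = relative_name(shipping, "MyApp.Models.Billing")
# sibling == "Billing"
```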

As for the solution criteria:

  1. It should not require additional cognitive load when using or writing in the VB6 style (reference a lib, use its types).

It could in fact be identical

  3. What impact would any proposed solution have on type resolution? What happens if a namespace level collides, e.g. MyApp.VBA?

Existing namespace resolution goes Local > Component > Project > References in order - with the exception of language keywords - so I think just slip File before Project and all is well.
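A minimal Python sketch of that lookup order, with File slipped in before Project. The `resolve` helper and the example names are hypothetical, and the language-keyword exception is not modelled; matching is case-insensitive because VB identifiers are.

```python
# Innermost level first; "File" is the proposed new level.
RESOLUTION_ORDER = ("Local", "Component", "File", "Project", "References")


def resolve(name, namespaces):
    """Return the first level that defines `name`, or None if nothing does."""
    wanted = name.lower()
    for level in RESOLUTION_ORDER:
        if any(wanted == known.lower() for known in namespaces.get(level, ())):
            return level
    return None


example = {
    "Local": {"i"},
    "Component": {"MODULE_NAME"},
    "File": {"VBA"},                        # a user-defined VBA in this file
    "Project": {"Main"},
    "References": {"VBA", "Collection"},    # the referenced VBA library
}

vba_level = resolve("vba", example)                 # "File" shadows "References"
collection_level = resolve("Collection", example)   # "References"
```

So in the MyApp.VBA case the file-level name would simply shadow the reference, the same way a local variable shadows a module-level one today.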

  4. What level should it act upon? Component/File/Project?

Namespaces already exist, but the new one would act at a file level

  2. It must be compatible with COM and DLL exports (see twinbasic/twinbasic#22)

The Project namespace is already not the same thing as what other projects would see - an Exposed namespace. That does not include all the stuff imported into the project namespace by adding references, or things like Option Private Module etc.

Would require a separate issue to discuss probably. One drawback with this approach is that it does not allow you to use namespaces directly with referenced dlls since the add reference behaviour is project-wide (so no Import VBA, Import MSForms). You could write a thin tB wrapper to namespace-ify a reference by creating a number of .twin files which each exposes some subset of the reference's functionality.

However compiling tB code to COM compatible would nuke the structure (though not break anything, since the compiler is aware of what x might be referring to), unless there's a way in COM to support namespaces - maybe it's just the technology. The workaround is not compiling to COM dll - if you could reference other tB source code like an uncompiled/half-compiled twinproj. In one possible implementation you could basically simulate having one project's source in an adjacent directory in the virtual filesystem. VBA doesn't need to name its namespaces because they are all implicitly exported to one another rather than implicitly imported by one another, so encapsulation is achieved through access modifiers.

TL;DR


Hope that's clear, let me know if I can clarify bits, much inspired by Python's approach

mansellan commented 3 years ago

@Greedquest most insightful. I need to re-read it a few times to make sure I follow everything, and should probably read up on what Python does too. I'm very heavily .Net biased at the moment, and especially C#.

From what I understand though, it does seem like you might be conflating namespaces with scopes. The hierarchy you describe is what I understand to be scopes - the symbols that are available at a particular level. That's not quite the same as a namespace. IIUC, VBx doesn't really understand namespaces, but it does have scoping rules. If I reference a library, it imports to the global (only) namespace. If there are naming clashes, I have limited ways of fixing it to get the code to compile - I can prefix the library name, or with user code I can prefix the module name if it's a standard module.

Adding an additional scoping level would certainly help to avoid clashes, but it seems like a partial solution. In .Net, namespaces can be chosen explicitly [1], allowing ultimate freedom. For example, at work I have some libraries which import into the System.Linq namespace. This is a deliberate (some would say reckless!) choice, which means that as soon as someone references the library, any time they use the (built-in) System.Linq namespace, they also get all of the overloads I added to it. That level of flexibility doesn't seem possible with your proposal?

[1] In practice, they're often not quite so carefully chosen. The default behaviour for Visual Studio when adding a new class is to put it into a namespace based on its location. Many developers don't bother to think about changing it.

WindowStations commented 3 years ago

Mansellan, I ripped a bunch of net class clones with reflection and put them into namespaces on the NET side, making them comvisible, but I don't seem to have any real bloat like NET. Although it's fair to say that all the enums/classes can be seen and instanced, I don't think that's the real issue with vba/vb6 namespace awareness. I like having the extra features available in intellisense, but the class buckets don't actually expose anything outside of the hierarchy. The problem that I see is that NET classes can be used as an instance, but sometimes used directly (imported) like this: Process.Start. Instead of the sample, https://bytecomb.com/organizing-code-with-namespaces-in-vba/ , use a creatable class in VB6 and simply change the module declaration to an empty property returning the class or New Class as needed.

Change this: Public IO As New nsIO

To this: Public Property Get IO() As nsIO End Property

That exposes the same NET functionality to VB6 depending on the constructors used on the NET side.

bclothier commented 3 years ago

FWIW, any discussion around scoping, of which namespaces are a part, must be in compliance with the MS-OAUT and MS-VBAL specifications, at least for publicly exposed entities.

For references:

MS-OAUT: 2.2.49.2 IDL Automation Scope
MS-OAUT: 2.2.49.10 Referencing External Types
MS-VBAL: 4 VBA Program Organization
MS-VBAL: 5.6.10 Simple Name Expressions

For me personally, I look at namespaces as logical containers. They need not correspond to the filesystem structure, though it is common for them to do so. I'm a bit ambivalent about using C-style includes, which the Python sample above strikes me as being similar to. One reason is that this hinders the ability to refactor the code. If the namespace is not a code element, then it's hard to refactor without breaking the code. Consider that refactoring could be:

  1. moving a member from one type or module to another type/module
  2. moving a type's place in the namespace (e.g. add a new subfolder)
  3. moving a type into a new library altogether

And a good refactor would take into account all the call sites & usages of the moved type so that the code is not left in a broken state. If not making namespace a code element makes that hard to achieve, then I would be disinclined towards any solution that doesn't make namespace a code element.

Though the goal is full compatibility, we need to recognize that, given how type libraries are designed, there's a perverse incentive to make type libraries huge & bloated. One big factor is the reluctance to add several references, and in VBA projects the reference's full path is used, which also causes problems for caching & porting to other computers. If the namespaces are to be visible to non-tB consumers, then the design needs to somehow solve that problem. One possible route would be via use of importlib, of which the Access object library does a fair amount, but that still requires that the VBA project add references for the imported type libraries to get early binding for those imported types. Another route is to embed multiple type libraries into the same output file. The downside is that you'd have to use syntax like ...\foo\bar\myTBProject.dll\3 -- what is 3? How is the 3 different from the 2? Or the default one for that matter?

For that reason, there's a lot of motivation toward dumping all the types into one giant type library so as to avoid having to reference each dependency. The ideal solution should help keep that from causing problems, especially when we raise the level of abstraction in a tB project.

If we choose to make namespaces logical organization units, for use within tB, independent from the type libraries that are generated, then we have considerably more freedom in defining how they should work. The downside is that the author then has to maintain 2 parallel organizations: the type library representation and the namespace organization.

I can see nested class buckets working as long as we keep to the convention that the bucket is not creatable & cannot be referenced. However, that requires it to be a module rather than a class, and as far as I know a module cannot contain another module, so that limits the nesting to only one level. Furthermore, it would feel weird to see a "class" in the object browser that is nothing like an actual class.

If we cannot avoid the limitations in the scoping for the type libraries as defined in the MS-OAUT & MS-VBAL, my vote would be to make the namespace a tB-only construct for aid in logical organization. We should easily control the type library's shape via use of attributes. The extra responsibility of maintaining 2 different scoping hierarchies is an acceptable trade-off for me. That is because the feature of being able to control what is in the global namespace and how to disambiguate the types is more important to me especially for private implementation details.

WindowStations commented 3 years ago

In a larger namespace setup, you can't have multiple references without adding a fair amount of difficulty. Many classes rely on other classes already which makes it difficult to model/manage both for the developer and end-developer. But, remember that you MUST also integrate these class hierarchies with the UI Control set you're designing for Windows. These things work together and will be designed together when they overlap. Otherwise you need extra code to handle missing references etc.

Also, note the property I posted above can be declared as private in individual standard/class modules. Then, other standard/class modules won't see IO.

Would TB be able to offer the same range of quantity/quality content available through com-visible NET namespaces? Would TB's namespaces be built by scratch, or are they exposed in the model?

WindowStations commented 3 years ago

Global naming collisions don't really happen if you explicitly instance the classes.

Dim p1 As New VBCtl.Process 'Namespace class from comvisible NET dll
Dim p2 As New Project1.Process 'Project class .cls
Dim p3 As New RefencencedLib.Process 'referenced class dll

These are different instances of classes with the same name. How is this a limitation again?

bclothier commented 2 years ago

Just to take the discussion from #634 and put it where it belongs. I think that is relevant here because this touches on shadowing which overlaps the problem domain that the namespaces solves.

There seems to be a consensus that we should have the ability to:

  1. Unreference the default VBA (and other related libraries) altogether
  2. Put some other library as higher precedence over the VBA.

Allowing either would then allow anybody to effectively create their own standard library and even shadow the common functions with their custom implementation.

In VBx, it is not possible to write a project that shadows the existing functions and use it in another project. One consequence is that it encourages copy pasta, especially in VBA, which then gets exacerbated when the copied code is altered, so we now have 10 different versions of shadowing functions. We want to make it easy to publish a library and use it in other projects. The current VBx way adds the reference to the global namespace and uses precedence to resolve ambiguity. In a complex project, that can result in a very polluted intellisense list full of unrelated functions that are not relevant to what you are coding at the moment. Appobjects are another big headache; adding references to type libraries that contain appobjects with similar methods can cause problems, especially if the access is unqualified. This is a real problem that needs a solution.

We need to consider whether shadowing is a good idea, especially for common standard functions. I have less problem with providing a custom implementation of an interface than I do with a global function because the latter violates the principle of least astonishment. It would make for a frustrating experience if I was working in someone else's project and it was not clear to me that the StrConv was shadowed by a custom implementation, especially because VBx does not have using or import statement. Moreover, it really does not address the separate problem of the global namespace pollution.

I'm inclined to think that hierarchical namespaces are still a good idea with or without the ability to shadow the standard libraries. Just to look at what other languages have done, they do similar things. Python has import. So does Rust, using the mod syntax. Looking at Rust's example, I could see doing something like this:

Public Module MyBigGnarlyStringLibrary
  Public Module Unicode
     ...
  End Module

  Public Module ANSI
    ...
  End Module
End Module

The problem is that it would not be COM-compatible because modules cannot be nested the way we can with interfaces. It might be acceptable for use within tB coding, but if we allow that, then we need to then deal with how to export them in the published type library.

Greedquest commented 1 year ago

We need to move this forward, as starting from twinBASIC v1, I believe the consensus is that internal backwards compatibility - that is, backwards compatibility not just with VBA/VB6 but also with previous versions of twinBASIC - MUST be maintained. That means no more breaking changes and this will probably be one. So it needs to be resolved while still in Beta. Even if that's not the case, we should make a decision sooner rather than later, as I think namespaces will impact almost all tB code, and the longer we wait the more disruptive any change will be, both to the ecosystem of tB code and the compiler/IDE.

Quick catchup of the discussion so far:


In my opinion, either philosophy (a namespace block as in C#, VB.Net or TypeScript, or using the filesystem as the namespace structure as in Python, Go or Rust) will be able to support the same kind of functionality. The complexity for VBx is blending well with COM.

A tB project has as many as 4(/5) views on its structure:

  1. Standard DLL - We know this is a flat list of all the [DllExport] functions & methods - e.g. kernel32.dll
  2. COM DLL - Basically flat collection of COM Classes and Interfaces (+ AppObject functions) - e.g. VBA7.dll, EXCEL.EXE
  3. Filesystem structure - What you would see if you loaded this project on github and tried to navigate it
  4. twinpack interface - From the perspective of a twinproj project referencing you as a twinpack. This interface to the outside world will become increasingly important as the tB ecosystem grows.
  5. Finally; underlying twinproj code structure - Structure (including modules, classes, interfaces, standard methods, namespaces) as you build up your codebase, what you would see in intellisense. This is what we define when we code, the others are just representations of this.

Ideally all 5 of these structures would be identical. However in practice we need to define mappings between the first 4 and the last one - between the "views" of the project and the project itself.

Because of this complexity, my proposal is that we match the filesystem structure to the code structure as much as possible. Practically, what that means is namespaces are defined by .twin files, not standalone blocks. I've looked at a few big C# projects, e.g. Rubberduck - it seems very rare to find a place where 2 namespaces are defined in the same .cs file. And pretty much every .cs file has a namespace. So there is essentially a 1 to 1 mapping between namespaces and files (I think the only exception is when you want to split a namespace over several files, but this isn't done too often). RD3 has just undergone an effort to match its namespace structure to its filesystem structure. This all leads me to feel it would be better to just use the paths of .twin files relative to the project root to set their namespaces.

One big reason is that tB is unlikely to get support in GitHub/VS Code any time soon. Therefore even if tB provides great tools to navigate the namespace hierarchy inside the IDE, navigating in GitHub will rely on the filesystem structure. RD's '@Folder annotations make me feel that people want to structure their projects in this way.

That is the filesystem view; 1:1 mapping with the code structure. What about the twinpack view, standard DLL view or COM DLL view?

I think in the Standard DLL view namespaces have no impact; if the same method name exists in 2 namespaces, and both have [DllExport], then there should be an error. It would be really cool if tB could actually show you the dll view in a dummy readonly .bas or .twin file:

Module [DLLExports]
    Sub Foo()
    Sub Bar()
End Module

You could use a naming schema Namespace1_Namespace2_Foo, Namespace1_Bar to circumvent name collisions or even for type overloads.
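To illustrate the flat DLL view, here is a small Python sketch of both ideas: detecting a collision between [DllExport] names from different namespaces, and the Namespace1_Namespace2_Foo naming schema as a way around it. The helper names `find_export_collisions` and `mangled_name` are invented for this example, and export names are compared case-insensitively as an assumption.

```python
from collections import defaultdict


def find_export_collisions(exports):
    """exports: (namespace, method) pairs marked [DllExport].

    Returns {method name: [namespaces]} for names exported more than once.
    """
    by_name = defaultdict(list)
    for namespace, method in exports:
        by_name[method.lower()].append(namespace)
    return {name: spaces for name, spaces in by_name.items() if len(spaces) > 1}


def mangled_name(namespace, method):
    """Flatten a dotted namespace into the proposed underscore schema."""
    return "_".join(namespace.split(".") + [method])


exports = [
    ("Namespace1.Namespace2", "Foo"),
    ("Namespace1", "Bar"),
    ("Namespace3", "Foo"),      # clashes with the first Foo in a flat DLL
]
collisions = find_export_collisions(exports)
# collisions == {"foo": ["Namespace1.Namespace2", "Namespace3"]}
flat_names = [mangled_name(ns, m) for ns, m in exports]
# flat_names == ["Namespace1_Namespace2_Foo", "Namespace1_Bar", "Namespace3_Foo"]
```

With mangling switched on, all three exports get distinct flat names; without it, the two Foo exports are the error case described above.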

In COM DLL view, I don't think you should auto-generate namespace buckets, since they are not a perfect representation. You could optionally allow but not assume it happens. Therefore if 2 namespaces define the same class name both marked with [COMExport] or [Exposed] then that should be an error. Again you can use a dummy readonly .twin file to show the complete COM DLL View


Enum MyEnum
    tbYellow
    tbRed
End Enum

Class Class1
    Implements IFoo
    Sub Whatever()
End Class

Interface IFoo
    '...
End Interface

where Ctrl+F2 would take you to the actual resolved definition of that Class or interface. Flatten the namespaces but people can obviously still make the class buckets by hand if they desire. But don't try and force the two to be the same thing.

The twinpack view is the most complicated. You need a way to expose namespaces to code that references your project. In general, just add projName. to all the namespaces in your project. However, sometimes you may define a twinpack that adds extra methods to the VBA namespace, or linq to the iterable types. So I think a special folder (not Source, maybe Overloads) needs to be created where the namespaces do not get the projName. prepended. So $TWINROOT/Source/VBA.twin gets exported as myProj.VBA from the perspective of someone referencing a twinpack, but $TWINROOT/Overload/VBA.twin gets exported and merged into the VBA namespace directly. Both would be visible as plain old VBA from the perspective of code inside the project (view 5).
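Roughly, in Python, the twinpack export rule could look like this. It is a sketch only: `exported_namespace` is a made-up helper, paths are taken relative to $TWINROOT, and the folder names Source and Overload are copied from the example paths rather than being settled.

```python
from pathlib import PurePosixPath


def exported_namespace(project, path_in_project):
    """Namespace a referencing project would see for one .twin file."""
    folder, *rest = PurePosixPath(path_in_project).with_suffix("").parts
    name = ".".join(rest)
    if folder == "Source":
        return f"{project}.{name}"   # normal case: prefixed with the project name
    if folder == "Overload":
        return name                  # merged into the existing namespace as-is
    raise ValueError(f"unexpected top-level folder: {folder}")


prefixed = exported_namespace("myProj", "Source/VBA.twin")    # "myProj.VBA"
merged = exported_namespace("myProj", "Overload/VBA.twin")    # "VBA"
```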

As mentioned in https://github.com/twinbasic/lang-design/issues/48#issuecomment-1220451798 I think the Public/Friend/Private access modifiers when applied to Modules, Classes & Interfaces should dictate whether they are visible outside that namespace. [COMExport] or some variant should dictate COM DLL visibility.

Greedquest commented 1 year ago

In my opinion, both philosophies (a namespace block like in C#, VB.Net or TypeScript, or using the filesystem as the namespace structure like Python, Go or Rust) will be able to support the same kind of functionality. The complexity for VBx is blending well with COM.

Actually Rust does both, and we could just do that: every file defines an implicit module namespace, but there is also an explicit namespace block you can use to define a namespace in code.

mansellan commented 1 year ago

Thanks for bumping this, and for the extensive analysis.

I have to admit, I hadn't understood all of the competing hierarchies involved until you listed them above. To be honest, I haven't been close enough to VBx or tB recently to keep up.

However, I know the friction when it comes to my daily language (C#). Microsoft made the choice to "encourage" people to use the filesystem hierarchy to determine namespace hierarchy. When you create a new class, the template adds any containing folders to its namespace. Resharper also likes to warn you if you deviate from this arrangement. But, IMO, it's based on an incorrect assumption.

Filesystem layout is a choice of the developer - how would they like to see their files on disk. Namespace layout is primarily about discoverability. These are separate concerns.

Users of a library want to know which functionality is related to their problem domain. If I (as a user) type System.Web. , I expect Intellisense to pop up suggestions for the high-level API for web use-cases. I really don't care if the developer happened to organise the source files differently. That's irrelevant to me.

So I think it's critical that any solution must at a minimum allow the namespace to be explicitly defined with a Begin Namespace ... End Namespace block. Perhaps there should be a default convention when namespaces are not explicitly set, as you say.

As for how this interacts with COM et al, I can't comment. I guess that worst-case scenario is that namespaces are only available from twinBASIC, and everything gets "flattened" into a global namespace for COM. But as you point out, there may be a middle road there.

mansellan commented 1 year ago

Another aspect is: how do you include namespaces?

There's no point to namespaces if everything is in scope all the time. You choose which namespaces are in scope per-file, and that determines which types can be resolved.

I can only speak to a couple of examples:

C# requires that you import each namespace explicitly:

using System;
using System.Text;
using System.Text.Json;

(recent versions also allow you to set global imports to all files, cutting down the noise).

VB.Net acts similarly:

Imports System
Imports System.Text
Imports System.Text.Json

(AFAICT, there's no Global Imports feature, I guess because MS have pretty much given up on VB.Net now)

Java allows wildcards, which is quite nice:

import java.util.*;

(but again, no global imports as far as I can tell)
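The common idea behind all three import styles can be modelled in a few lines. This is a Python sketch under assumed names (resolve, known_types), not any compiler's real algorithm: only namespaces imported in the current file are searched, and a simple name must match exactly one of them.

```python
def resolve(type_name, imports, known_types):
    """Resolve a simple type name against the namespaces imported in a file.

    imports: namespaces in scope for this file (hypothetical model).
    known_types: set of fully qualified type names available to the project.
    """
    matches = [f"{ns}.{type_name}" for ns in imports
               if f"{ns}.{type_name}" in known_types]
    if not matches:
        raise NameError(f"{type_name} is not in scope; missing import?")
    if len(matches) > 1:
        raise NameError(f"{type_name} is ambiguous between {matches}")
    return matches[0]


known = {"System.Text.StringBuilder", "System.Text.Json.JsonSerializer"}
print(resolve("StringBuilder", ["System", "System.Text"], known))
```

Importing System.Text here does not bring System.Text.Json into scope, which matches how the C#, VB.Net and Java examples above behave: a Java wildcard import also covers only the types directly in that package, not its subpackages.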

mansellan commented 1 year ago

If I had to vote, I'd vote for:

For COM, maybe the namespace has to prefix the type name?

Some_Name_Space_Document

Maybe with a means to override the COM typename? I would imagine that twinBASIC libraries that are intended for tB use will be largely separate from those that are intended for COM use, and that an attribute could help smooth the edge cases.

mansellan commented 1 year ago

I would say though, this doesn't need to be solved right now. It can wait till v2.

C# didn't get generics till v2, and that was fundamental to its type system. Many of the primitives we live with today wouldn't have ever existed if generics were in v1 (IEnumerable, I'm looking at you...)

Languages don't have to break on paradigm shifts. They just get a little crusty if they don't want to ever break back-compat.

bclothier commented 1 year ago

I should point out that the bucket idea mentioned earlier more closely approximates the modules in JavaScript. The namespace is still "global" in both JS and VBx languages but you hide them by stuffing them in a module in JavaScript or in a bucket. That avoids having to define what namespaces are to be imported.

I'm not 100% sold on globbing namespaces. The temptation is just too great to just glob everything because the alternative is just too painful. The recent versions of VS help make it easier to add the missing using, which was closer to the sweet spot and would not need any globbing. The new global imports feel like a step backward to me ("why not just glob everything?!?"). There's also the fact that explicit import makes it impossible to copy'n'paste a snippet of code; on SO you will see numerous C# code snippets where you then have to figure out which namespace/reference you need to import to get them to compile. We don't want that for tB.

As mansellan correctly points out, the organization of the source files is of no interest to the consumers. For that reason, I'm inclined toward any solution that will work in COM and that's the bucket idea. The biggest problem I have with it is the same one I have with JavaScript's modules - there's that "bolted-on" feeling which just increases the boilerplate and therefore the distance between "what is this doing" and "how is it doing that", which is a very bad thing™ for debugging and troubleshooting.

Just as a completely different alternative: We could have a directive to enforce the reference order:

one.twin:

Prefer DAO
Prefer ADODB

...

Dim rs As Recordset 'This is a DAO.Recordset

two.twin:

Prefer ADODB
Prefer DAO

...

Dim rs As Recordset 'This is an ADODB.Recordset

However, my problem with that idea is that it just enables the user to use implicit references which runs contrary to the advice of always disambiguating your references. I don't think I want to enable them to be sloppy. This still has the same problem of forcing the user to guess at what library it is when copying a snippet of the code.

I would rather have an inspection warning about ambiguous references and CodeLens feature to show the resolved reference when reading the code. I think that is the cleanest and simplest way to address this issue in a backward-compatible fashion from the consumer's POV.
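Both ideas, the per-file Prefer order and the ambiguity inspection, come down to the same lookup. Here is a minimal Python sketch; the LIBRARIES table is a tiny hand-written stand-in for real type libraries, and the function names are made up for illustration.

```python
# Tiny stand-in for the referenced type libraries (illustrative only).
LIBRARIES = {
    "DAO": {"Recordset", "Database"},
    "ADODB": {"Recordset", "Connection"},
}


def resolve_unqualified(name, prefer, libraries=LIBRARIES):
    """Return the first library.type match, walking libraries in Prefer order."""
    for lib in prefer:
        if name in libraries.get(lib, ()):
            return f"{lib}.{name}"
    raise NameError(f"{name} not found in {prefer}")


def ambiguous_in(name, libraries=LIBRARIES):
    """Libraries that define `name`; more than one would trigger the inspection."""
    return sorted(lib for lib, types in libraries.items() if name in types)


print(resolve_unqualified("Recordset", ["DAO", "ADODB"]))  # DAO.Recordset
print(resolve_unqualified("Recordset", ["ADODB", "DAO"]))  # ADODB.Recordset
print(ambiguous_in("Recordset"))                           # ['ADODB', 'DAO']
```

The same table drives both behaviours: the resolver silently picks a winner, while the inspection reports every candidate so that a reader (or a CodeLens-style hint) can see which one was chosen.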

That then leaves us with the intellisense list and exploring the library. If we look at the MSXML2 library as an archetype, we can see that there's a strong incentive to just dump all kinds of semi-related stuff into one giant library because it's a huge pain to split them across different libraries. Consequently, that library does way too much -- you'd think it'd deal with XML but it also deals with DOM, XMLHTTP requests, SAX reader/writer and possibly more stuff. As a consequence, it's hard to really explore the library and discover the relationships between the objects in that library. The twinpack package manager should help lower the barrier somehow by making it easier to import the dependent libraries as needed, so we do not need to be concerned about shipping specialized libraries, but that will not help the COM consumers who probably want to avoid having to deal with multiple libraries, especially if they will be required to register them.

The bucket idea will help consumers but we need to keep in mind that from the POV of those consumers' implementation, they are now forced to traverse a bunch of COM objects and that could potentially drain the performance. tB consumers probably won't have that problem because being source code, the compiler has the opportunity to optimize the calls but once compiled, that won't work for non-tB consumers, I think.

mansellan commented 1 year ago

If I understand correctly, in VBx and by extension tB, type resolution ambiguities are resolved through reference priority. This is complicated by the fact that in VBx, some references (and their order) are mandatory. We should reject that inflexibility.

I've pretty much accepted my role as "the .Net guy". I know that this community rightly has concerns about re-inventing Visual Fred, so let me assure everyone that I don't want anything like that. But .Net has been on a 20 year journey, it's made some mistakes, and it's made some discoveries. There's no point in us repeating their early mistakes.

.Net has been through a reinvention. Initially called .Net Core, and now just .Net, it's a different beast today. Almost everything got rebuilt, including the project system.

I've yet to see much downside to the new .Net project file format. It's still XML based (not great), but it's strictly based on inheritance. In the past, .proj files had to specify absolutely everything. They don't any more - you only have to specify those things that are unusual - everything else uses very sensible defaults. That's cut most .proj files down from hundreds of lines to a handful. It makes it so much easier to reason about.

Why does this matter to twinBASIC?

twinBASIC has a binary project file at the moment. It's opaque to developers, unless they fire up the project properties dialog. I know that this ties into the virtual filesystem from VSCode, but is that still relevant?

I suspect that within the binary stream, the project layout is JSON. Why not expose that directly on disk? There could be a templating system, such that the default for a DLL is x, the default for a standard exe is y, but all templates can be overridden per-property by the per-project needs, in human-readable JSON.

One aspect of that could be to redefine reference priority.
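A minimal Python sketch of that templating idea follows. Every key and template name here (outputType, referencePriority, StandardExe, ActiveXDll) is invented for illustration, since the actual twinBASIC project format is not described in this thread. The project file stores only what differs from its template, and the effective settings are the template defaults with those overrides applied.

```python
import json

# Hypothetical template defaults; none of these keys are real tB settings.
TEMPLATES = {
    "StandardExe": {"outputType": "exe", "referencePriority": ["VBA", "VB"]},
    "ActiveXDll": {"outputType": "dll", "referencePriority": ["VBA"]},
}


def effective_settings(project_json):
    """Template defaults, overridden per property by the project file."""
    project = json.loads(project_json)
    settings = dict(TEMPLATES[project["template"]])
    settings.update(project.get("overrides", {}))
    return settings


project_file = """
{
  "template": "ActiveXDll",
  "overrides": {"referencePriority": ["VBA", "ADODB", "DAO"]}
}
"""
print(effective_settings(project_file))
```

Because the override is just another property, reference priority becomes something a developer can read and diff in source control rather than a setting hidden in a dialog.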

mansellan commented 1 year ago

I'm reserving judgement on globbing. It has benefits and disadvantages. It's just too early to know for sure.

At work, it's still not clear on where the line is for what is, and is not, appropriate for a "global using" or a glob in the proj file.

bclothier commented 1 year ago

I think I like the direction you're going in WRT the project file and reference priority. We could make it so that reference priority is supported as a config within the project file and have the option of upgrading to a new config that allows for more flexibility in namespacing.

The only problem is that while this will benefit the developers working on organizing the project, we can't necessarily expose that for COM consumers.

Just for the POV - in a C++ project, you typically will have IDL(s) that define the API that you want to expose, and the MIDL compiler will generate stubs, leaving it up to you to implement the C++ classes in any shape you would like. In theory, you can have a single C++ class serve several COM interfaces, even those that don't have any relationship to one another. Now, this is where I'm not too sure, but I think that other C++ consumers will not get to see the internal details, at least not if they directly consume the internal C++ definitions, rather than what is exposed via the type library. Exposing the internal details would mean you could easily violate the COM conventions, and I would imagine it's not necessarily a good idea™ to expose the internal C++ architectural details if you are exposing them as COM objects. If what I've described is correct, we should be concerned the same way with how we consume tB objects.

That boils down to the following options:

Only the third option would avoid bifurcating the API between the COM consumers and tB consumers but given that COM would mandate that each bucket be a class rather than a true namespace entity, it will have ramifications on how it is addressed in an invocation.

Greedquest commented 1 year ago

So much to get into, thanks both for good discussion.

The issue is the class bucket idea is pretty restrictive. I would be happy to see it included as a way to emulate namespaces for the COM Dll build target. However for class buckets to be the way namespaces are defined natively (i.e. in a twinproj or referenced twinpack), as I say, pretty restrictive:

It would be great to see an automatic compilation step to convert namespaces to class buckets where it makes sense. Or a new syntax for creating class buckets more easily. However I don't think they are sufficient to play the role of namespaces in tB. Also given we can already declare them in VBx, it will be hard to do compiler tricks to make them work without breaking back-compatibility (e.g. enforcing explicit import statements).

@bclothier

Only the third option would avoid bifurcating the API between the COM consumers and tB consumers but given that COM would mandate that each bucket be a class rather than a true namespace entity, it will have ramifications on how it is addressed in an invocation.

The COM consumer and the tB consumer will probably have different requirements (e.g. COM consumer is using it as a final product, tB consumer may be inheriting and overloading, modifying behaviour etc.) and so in some ways I'm expecting the API to be richer for a tB consumer than a COM consumer, or indeed a standard DLL consumer. The 5 "views" of a tB project which I described earlier are all different stakeholders so there are some benefits to keeping flexibility in how the code is presented to each of these parties.


@mansellan

If I understand correctly, in VBx and by extension tB, type resolution ambiguities are resolved through reference priority. This is complicated by the fact that in VBx, some references (and their order) are mandatory. We should reject that inflexibility.

Re. COM References like VBA, DAO, ADODB libraries... For backwards consistency I'd like to see these remain in an implicit anonymous globalish namespace. I.e. tB code that consumes these looks no different than VBx. In my mental model a .twin file is at the same hierarchy level as a VBProject - i.e. both can contain Classes, Forms and Modules but not other projects. So the ability to define COM references in code in the header section of a .twin file would make sense to me. But you do not need to forcibly qualify references since that would make migrating VBx code to .twin a headache.

Importing from Namespaces I think is a different topic. It should be explicit. You should fully qualify namespace references.

I would say though, this doesn't need to be solved right now. It can wait till v2.

The reason I say it needs solving now (v1) is because I am in favour of:

Therefore this will be a breaking change for existing tB code but I think a healthy one.

Greedquest commented 1 year ago

Filesystem layout is a choice of the developer - how would they like to see their files on disk. Namespace layout is primarily about discoverability. These are separate concerns.

Users of a library want to know which functionality is related to their problem domain. If I (as a user) type System.Web. , I expect Intellisense to pop up suggestions for the high-level API for web use-cases. I really don't care if the developer happened to organise the source files differently.

Agreed on the idea that source file hierarchy and namespace hierarchy are designed to appeal to different stakeholders. Both approaches (namespace named after filesystem, fully explicit namespace blocks) have been used successfully in different languages so it's a matter of preference and familiarity. However I think the version of namespaces Rust has:

... That version of namespace functionality would be more appropriate for tB than what C# has, even though it is less flexible, because:

How would the Rust approach look? E.g. for this calling code:

'./Consumer.twin
from Foo.Bar import someModule

Module test
    Sub callBaz()
        Debug.Print baz() 'or someModule.baz()
    End Sub
End Module

... You can define the namespace hierarchy in several ways

'./Foo/Bar.twin,

Public Module someModule 'only a Public Module can be imported
    Public Function baz() 
        Return "Hi"
    End Function 
End Module

'./Foo.twin,

File Bar
    Public Module someModule
        Public Function baz()
            Return "Hi"
        End Function 
    End Module
End File

or even nest two File blocks... not advisable

bclothier commented 1 year ago

The 5 "views" of a tB project which I described earlier are all different stakeholders so there are some benefits to keeping flexibility in how the code is presented to each of these parties.

My concern is that this is already a lot of mental juggling. It's quite frustrating to run into situations where it works OK this way but not that way. If we bifurcate the surface, we are increasing the complexity. Thus, we should make sure whatever solution we come up with is also the simplest and most intuitive possible.

The COM consumer and the tB consumer will probably have different requirements (e.g. COM consumer is using it as a final product, tB consumer may be inheriting and overloading, modifying behaviour etc.) and so in some ways I'm expecting the API to be richer for a tB consumer than a COM consumer, or indeed a standard DLL consumer.

But even so, it is painful for COM consumers, as I described using the MSXML2 library earlier. A bunch of semi-related objects are dumped into one huge flat list without any information on how they relate to one another, if at all. Why suffer the same fate with a tB DLL built for COM consumers?

You are correct that tB consumers will have more features such as overloading or generics that simply can't be exposed via COM. As you mentioned earlier, we want a solution that is identical as much as possible for all different views. That's the part where I'm not clear how the import idea will help the COM consumers, because that forces us to define a COM contract (e.g. how the COM type library gets shaped during building the project) independently because import cannot traverse COM boundaries, only tB-to-tB boundaries, and therefore the bifurcation. It's worth keeping in mind that in the VBx world, we don't even think about COM at all. We just throw together some modules, some classes, and presto, it just works™. Forcing the COM contract to be defined separately appeals to me but that's only because I'm familiar with it and like being precise. It won't appeal to the rest of the users who just want to solve problems and not tinker with the underlying plumbing.

Maybe this is the best thing to do, but we should make sure that is actually the case.

Note: Edited to clarify "COM contract"

Greedquest commented 6 months ago

v1 blocker, depending on outcome.

I.e. if namespaces require either an explicit import statement or explicit qualification, and we decide to do a namespace per file or mandatory/implicit namespaces, then this must land before v1 to retain backwards compatibility between successive tB releases. If however we go for e.g. an optional namespace block, I think this could be added later since it would require explicit opt-in.

GaryMiller commented 6 months ago

Wouldn't optional be required for backward compatibility with VB?

fafalone commented 6 months ago

Isn't this more along the lines of forwards compatibility? Which isn't a concern?

VB code is expected to work in tB; but tB code isn't expected to work in VB unless the programmer avoids all the new language features that break it.

tB 1.0 code will be expected to work in tB 2.0, but tB 2.0 code won't be expected to work in tB 1.0, unless again, the programmer is taking special care to retain compatibility.

If you are talking about namespaces being implemented in such a way that code from VB or tB 1.0 wouldn't run, then yes, as GaryMiller notes, it's a nonstarter, can't happen.

Greedquest commented 6 months ago

Wouldn't optional be required for backward compatibility with VB?

No sorry I was worried the way I phrased it would cause confusion.

We care about 2 kinds of backwards compatibility. The first is obviously 100% (or 99.999%) compatibility with both VBA7.1 and VB6 (VB6 has some features not in VBA like the custom controls, and VBA has some features not in VB6 like LongPtr and PtrSafe; tB guarantees forever to be compatible with both).

The 2nd kind of backwards compatibility is internal consistency within successive releases of twinBASIC. Languages like Go and Rust have made commitments never to break existing code that compiled fine in earlier releases. tB also wants to make that commitment starting from v1. This has not been an issue to date as tB is in BETA and that means it can make breaking changes for tB code - as in the tB I wrote that worked yesterday won't necessarily work tomorrow. But from v1 tB will settle down and no longer permit backwards incompatible changes. So anything introduced to date during BETA that is not vanilla VBx must be supported in all future releases of tB and so we need to be careful about what gets "frozen" in v1. It might even be sensible to temporarily disable some BETA features to give ourselves the option to review their design, although most of Wayne's additions look pretty watertight. This is also why the language needs stress testing before the point of no return.

We really need names to disambiguate these 2 kinds of compatibility, because while breaking changes to VBA compatibility will never be considered, there are some new tB features which we are allowed to supersede and iterate on in a way that will break existing tB packages. But only while tB is in BETA, for the good of the future of the language.


In the context of namespaces, then: none of the suggestions break VBA compatibility; that would be a nonstarter, as we all hopefully agree.

1 suggestion is to make an implicit namespace per .twin file, like Python or Rust do per file. .twin files didn't exist before tB, therefore we do have the flexibility to decide if we want to do something special with them. If you combine this with making the stuff in namespaces require explicit qualification (MyNamespace.foo not just foo) or require explicit import in order to call stuff in that namespace (import MyNamespace), only under those 2 specific conditions would you break some existing twinBASIC packages. But while we're in BETA, breaking existing features is okay (inconvenient, but necessary for the health of the language).

The existing behaviour of twinBASIC is basically that it doesn't matter where you declared a class or module; it all gets lumped together and everything is globally accessible. This IMO isn't the best way, so there are proposals to add some healthy restrictions that would break existing tB code, but nobody is suggesting breaking working VBA code.

Hope that clarifies what I meant, that this could be a v1 blocker or it might not be, depending on precisely which approach for namespaces we choose for the language. But all approaches are VBA compatible, don't worry.

Greedquest commented 6 months ago

Isn't this more along the lines of forwards compatibility? Which isn't a concern?

This is about breaking the current behaviour of twinBASIC, but only the non-vanilla new additions. Like being able to declare Modules and Classes with Module and Class blocks in a .twin file, instead of separate .cls and .bas files. That ability is new and still in BETA. But after v1 we'll be stuck with the current behaviour, which may or may not be what we want, and would certainly rule out some of the options above.

WaynePhillipsEA commented 6 months ago

I really don't like the idea of introducing implicit file-based namespaces.

We here as experienced programmers might understand namespaces, but I would consider the general concept a significant extra "barrier to entry" for newcomers (beginner programmers that might be starting out in BASIC, or just newcomers coming over from classic VBx).

I think namespaces will no doubt be a language feature in tB, but it will be optional and very much an opt-in feature for those that want to segregate their codebases in that way.

fafalone commented 6 months ago

I don't think this could be a v1 blocker; how would it implicate tB-tB backwards compatibility but not tB-VB, when VB lacks namespaces?

tB should definitely stay backwards compatible with tB after 1.0, but since there's no namespaces now I'm not sure how they could be implemented in the future that break tB-tB but not tB-VB.

WaynePhillipsEA commented 6 months ago

The suggestion was that the filename (or path) of a Twin file (not Bas or Cls) would automatically place the contained components into a corresponding namespace, based on the file path, hence that would break earlier written code.

wqweto commented 6 months ago

Btw, do we have in TB an attribute for modules that requires public procedures/props/enums/UDTs/etc. from a standard module to be explicitly prefixed by module name, i.e. exported symbols are not added to the project's symbol table but remain accessible through "long" Module.Symbol syntax (as they are already btw)?

This would be like poor man's namespaces with no backcompat breakage and super easy to implement until we come up with "canonical" namespaces in TB.

WaynePhillipsEA commented 6 months ago

Btw, do we have in TB an attribute for modules that requires public procedures/props/enums/UDTs/etc. from a standard module to be explicitly prefixed by module name, i.e. exported symbols are not added to the project's symbol table but remain accessible through "long" Module.Symbol syntax (as they are already btw)?

There is currently a [ MustBeQualified ] attribute that you can apply to procedures in a standard module, but I don't think this works with Types/Enums yet.
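The effect being asked for can be pictured as a rule for populating the project symbol table. This Python sketch is an assumption-laden model, not a description of how the tB compiler handles [ MustBeQualified ]: every symbol is always reachable as Module.Symbol, and only symbols from modules not marked as must-be-qualified also get a short global entry.

```python
def build_symbol_table(modules):
    """Build a name -> (module, symbol) lookup table.

    modules: {module_name: {"must_be_qualified": bool, "symbols": [names]}}
    (hypothetical shape, for illustration only).
    """
    table = {}
    for mod, info in modules.items():
        for sym in info["symbols"]:
            # The long Module.Symbol form is always available.
            table[f"{mod}.{sym}"] = (mod, sym)
            # The short form is registered only when qualification is optional.
            if not info["must_be_qualified"]:
                table.setdefault(sym, (mod, sym))
    return table


modules = {
    "Utils": {"must_be_qualified": True, "symbols": ["Parse"]},
    "Main": {"must_be_qualified": False, "symbols": ["Run"]},
}
table = build_symbol_table(modules)
print(sorted(table))  # ['Main.Run', 'Run', 'Utils.Parse']
```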

Greedquest commented 6 months ago

I really don't like the idea of introducing implicit file-based namespaces.

We here as experienced programmers might understand namespaces, but I would consider the general concept a significant extra "barrier to entry" for newcomers (beginner programmers that might be starting out in BASIC, or just newcomers coming over from classic VBx).

I think namespaces will no doubt be a language feature in tB, but it will be optional and very much an opt-in feature for those that want to segregate their codebases in that way.

I have been leaning more and more towards your line of thinking despite initially championing the file based namespaces.

The one way I could see it working would be if they were optional like we have for library names currently. So just as I can call Abs or VBA.Abs or VBA.Math.Abs, if I declared a .twin file called VBA and a module in it called Math and a function in it called Abs, then I would get the same experience. All optional. Combined then with a Rubberduck Inspection for "Shadowed function call" and a quickfix "you can disambiguate this call by adding the following qualifications Abs -> VBA.Abs".
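A rough Python model of that optional qualification and the "Shadowed function call" inspection: a call such as Abs, VBA.Abs or VBA.Math.Abs matches any fully qualified symbol that ends in the same member name and contains the given qualifiers in order. The matching rule and function names are assumptions for this sketch. It is also case-sensitive for brevity, unlike VBx.

```python
def candidates(call, symbols):
    """Fully qualified symbols that a (partially) qualified call could mean."""
    *quals, member = call.split(".")
    found = []
    for sym in symbols:
        *path, name = sym.split(".")
        remaining = iter(path)
        # Qualifiers may skip levels (VBA.Abs) but must appear in order.
        if name == member and all(q in remaining for q in quals):
            found.append(sym)
    return found


def shadow_warning(call, symbols):
    """Inspection text when a call is ambiguous, otherwise None."""
    found = candidates(call, symbols)
    if len(found) > 1:
        return f"Shadowed function call: {call} could be any of {found}"
    return None


symbols = ["VBA.Math.Abs", "MyProj.VBA.Math.Abs"]
print(shadow_warning("Abs", symbols))
print(candidates("MyProj.Abs", symbols))  # ['MyProj.VBA.Math.Abs']
```

The quickfix would then offer the shortest qualification for which candidates returns exactly one entry.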

It would be a quick win for allowing you to at least have the possibility of splitting up your project over multiple files in a meaningful way that's more than just visual, and choosing to fully qualify calls, but with a very familiar optional experience to VBA.

Combined with Private Module and Private Class (see #48) to say "you can only call me from inside the same .twin file", I think you've got a very powerful system without ever needing to mention the word namespace.

EduardoVB commented 6 months ago

I think that v1 blockers are issues related to "type 1" of backward compatibility, that is backward compatibility with VBx. Maybe IDK, but AFAIK there is no commitment to never break backward compatibility from tB version to a new tB version. Or even backward compatibility to VBx after v1.

To my understanding, that is what a "v1 blocker" issue is, and not the alleged "type 2" backward compatibility issues. Anyway, how can there be "backward compatibility" for things that are not present in the previous version?

Because, if we are going to see it that way, tB is not backward compatible with VB6 then, because it needs to import the files, but can't use the old VB6 files directly.

Perhaps we should discuss somewhere what exactly "v1 blocker" means before putting that label to more issues.

wqweto commented 6 months ago

There is currently a [ MustBeQualified ] attribute that you can apply to procedures in a standard module, but I don't think this works with Types/Enums yet.

Now the question is if there is a point for this to be "promoted" to a module/project level attribute. I guess for an Ax-DLL project this would mean using Set o = New MyProject.MyClass to instantiate classes and requiring Dim var1 As MyProject.MyEnum for declarations too, both of which currently work anyway, i.e. the attribute must only prevent global symbol table population.

WaynePhillipsEA commented 6 months ago

I will check later, but I think you can already apply the attribute to a module/class to enforce the project qualifier.