vision-dbms / vision

The master repository for the Vision database system.
https://vision-dbms.com
BSD 3-Clause "New" or "Revised" License

A Vxa-Based Multi-Platform Open Source XML Parser Extension to Vision #34

Closed MichaelJCaruso closed 6 years ago

MichaelJCaruso commented 6 years ago

VXA, the Vision eXternal Adapter framework, supports the creation of C++-based (and eventually other) extensions to Vision. While Vxa is used internally in production, it has no presence or working examples in the Vision open source world. The goal of this pull request is to start changing that by creating a simple, useful, Vxa-based Vision extension that works on all of our platforms and that lays some groundwork for a robust extension development API and ecosystem.

While documenting Vxa is well beyond the scope of this note, what follows summarizes what extensions are, how they can be created and used now, and the changes associated with this pull request.

The pugixml XML Parser Extension

The simple example used here to illustrate (and debug) Vxa is a wrapper of pugixml - a redistributable (MIT license), open source XML parser written in C++. Full-featured [1], lightweight, and well documented, its implementation consists of just three amalgamated source files, making it particularly well suited for embedded compilation by our build system.

Using the parser

In a nutshell, Vxa extensions are micro-services that Vision uses to reference foreign objects and invoke their methods. In practice, extension micro-services can be deployed as free-standing servers or as client add-ons accessed via callbacks. The code found here provides a free-standing server - va_pugixml. As is the case for all servers built using the Vxa framework, that server supports a number of command-line options controlling how and where it offers its services. For testing purposes, one of the simplest is to have the server listen for incoming connections at a TCP address of your choosing. For example, the command

path-to-the-pugixml-server/va-pugixml -address=9876 &

instructs the server to listen for incoming connections at port 9876. If that command is used to run va_pugixml, the following Vision expression will access it [2]:

V> !xmlParser <- "localhost:9876" __asExtensionObject
Object

where, if necessary, the message __asExtensionObject is defined at String as:

V> String define: "__asExtensionObject" toBePrimitive: 8

At this point, xmlParser refers to the external object that is the root object of the running va_pugixml server.

Like other Vision objects, xmlParser responds to messages that perform actions and return results. For example, its help message shows the names of the messages it implements:

V> xmlParser help
The class VA::PugiXML::Root supports the following methods:
help
load:
rttiName

help and rttiName are convenience messages automatically added by Vxa. The load: message is a user-defined message that parses XML contained in a file. Files containing XML are everywhere, including in this repository. The Visual Studio project file used to build this extension is, for example, a file containing XML:

V> !doc <- xmlParser load: "va-pugixml/va-pugixml.vcxproj"
Object

Assuming a successful parse, doc references an instance of an XML DOM external object that responds to a collection of useful messages for examining and traversing its structure:

V> doc help
The class VA::PugiXML::Document supports the following methods:
help
isEmpty
attributeCount
childCount
getName
getValue
childValue
childValueOf:
parent
getChild:
firstChild
lastChild
nextSibling
previousSibling
getAttribute:
firstAttribute
lastAttribute
parsedOK
parseStatus
parseDescription
rttiName
V> doc childCount
        1
V> doc firstChild help
The class VA::PugiXML::Node supports the following methods:
help
isEmpty
attributeCount
childCount
getName
getValue
childValue
childValueOf:
parent
getChild:
firstChild
lastChild
nextSibling
previousSibling
getAttribute:
firstAttribute
lastAttribute
rttiName

Here, for example, is a bit of Vision code that displays the tag names of the top level children of that XML structure:

V> doc firstChild childCount sequence0 do: [
V>   ^self print: -4;
V>   !child <- ^my doc firstChild getChild: ^self;
V>   child getName print: 20;
V>   child childCount printNL: 5
V> ] 
0   ItemGroup               4
1   PropertyGroup           4
2   Import                  0
3   PropertyGroup           3
4   PropertyGroup           3
5   PropertyGroup           3
6   PropertyGroup           3
7   Import                  0
8   ImportGroup             0
9   ImportGroup             3
10  ImportGroup             3
11  ImportGroup             3
12  ImportGroup             3
13  PropertyGroup           0
14  PropertyGroup          13
15  ItemDefinitionGroup     4
16  ItemDefinitionGroup     4
17  ItemDefinitionGroup     4
18  ItemDefinitionGroup     4
19  ItemGroup               4
20  ItemGroup               7
21  ItemGroup               7
22  Import                  0
23  ImportGroup             0
List of 24

A peek under the hood

Creating a Vxa-based Vision extension is an exercise in creating a Vxa-conformant implementation of a collection of C++ object classes and their member functions. Vxa conformance is primarily concerned with two areas:

  1. absolute control of object lifetimes
  2. metadata driven object introspection and use.

What follows illustrates the use of some of the implementation tools and conventions Vxa provides to address these concerns. [3]

By 'convention', every Vxa extension exposes a root object that serves as the first object accessed by Vision. In the case of this particular xmlParser, that object implements one operation - load: - that loads the XML contained in a file. Here's how that root object class is implemented, beginning with the class definition found in its header file:

namespace VA {
    namespace PugiXML {
        class Root : public Vxa::VCollectableObject {
            DECLARE_CONCRETE_RTTLITE (Root, Vxa::VCollectableObject);

        //  Class Builder
        public:
            class ClassBuilder : public Object::ClassBuilder {
            protected:
                ClassBuilder (Vxa::VClass *pClass);
            };

        //  Construction
        public:
            Root ();

        //  Destruction
        private:
            ~Root () {
            }

        //  Document Creation
        public:

        //  Methods
        public:
            void loadDocument (Vxa::VResultBuilder &rRB, VString const &rFilename);

        //  State
        private:
        };
    }
}

Understanding this example begins with its place in the inheritance hierarchy. Vxa exposed classes must be derived, directly or indirectly [4], from Vxa::VCollectableObject. Vxa::VCollectableObject provides object lifetime management services and support for mapping C++'s traditional object model into Vision's collection oriented functional data model. While most of the details of object model integration are hidden by Vxa, object lifetime management requires two more bits of class annotation.

Expansion of the DECLARE_CONCRETE_RTTLITE C++ macro injects definitions of key virtual functions into the class. Those virtual functions ultimately call this class' destructor. To prevent some nasty bugs, that destructor must only be called by these injected functions. To further ensure that is the case, all Vision managed classes (not just Vxa) mark their destructors as either private or protected depending on the need for additional subclassing. DECLARE_CONCRETE_RTTLITE also injects a key typedef - Reference - into the class definition. Reference, an alias for VReference<Root> in this case, is a smart pointer type that should be used wherever a strong reference to an instance of this type is required.

The following excerpt highlights the relevant parts of this class' definition and annotation:

namespace VA {
    namespace PugiXML {
        class Root : public Vxa::VCollectableObject {
            DECLARE_CONCRETE_RTTLITE (Root, Vxa::VCollectableObject);

        //  Destruction
        private:
            ~Root () {
            }
        };
    }
}
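
As a rough illustration of why the destructor is private, here is a minimal, self-contained sketch of intrusive reference counting. It does not use Vxa: RefCounted, Ref, and the destroy member are hypothetical stand-ins for what Vxa::VCollectableObject, the injected Reference typedef, and the macro-injected virtual functions are described as providing, and the real implementations will differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the lifetime services of a base class such as
// Vxa::VCollectableObject. This is an assumption, not the real Vxa code.
class RefCounted {
public:
    void retain () { ++m_refs; }
    void release () {
        if (--m_refs == 0)
            destroy ();            // the only path that ends an object's life
    }
    std::size_t referenceCount () const { return m_refs; }
protected:
    RefCounted () : m_refs (0) {}
    virtual ~RefCounted () {}
private:
    virtual void destroy () = 0;   // a macro could inject the override
    std::size_t m_refs;
};

// Hypothetical smart pointer playing the role of the 'Reference' typedef.
template <class T> class Ref {
public:
    explicit Ref (T *p = 0) : m_p (p) { if (m_p) m_p->retain (); }
    Ref (Ref const &o) : m_p (o.m_p) { if (m_p) m_p->retain (); }
    ~Ref () { if (m_p) m_p->release (); }
    Ref &operator= (Ref const &o) {
        if (o.m_p) o.m_p->retain ();   // retain first: safe for self-assignment
        if (m_p) m_p->release ();
        m_p = o.m_p;
        return *this;
    }
    T *operator-> () const { return m_p; }
private:
    T *m_p;
};

class Root : public RefCounted {
public:
    typedef Ref<Root> Reference;
    Root () { ++s_live; }
    static int liveCount () { return s_live; }
private:
    ~Root () { --s_live; }             // private: 'delete pRoot' elsewhere will not compile
    void destroy () { delete this; }   // members may still call the private destructor
    static int s_live;
};
int Root::s_live = 0;

// Creates one Root, shares it between two references, and reports how many
// Root instances are still alive once both references have gone out of scope.
int liveRootsAfterScope () {
    {
        Root::Reference a (new Root ());
        Root::Reference b (a);         // two strong references, one object
        assert (a->referenceCount () == 2u);
    }
    return Root::liveCount ();
}
```

Because the destructor is private, code outside the class can neither delete a Root directly nor place one on the stack, so the reference count stays the single authority on when the object dies.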

Class framework in place, the next step is declaring and implementing operations to be exported. In the case of the root class, one operation - load: - needs to be implemented. Here's how the member function that will implement it is declared:

namespace VA {
    namespace PugiXML {
        class Root ... {
        //  Methods
        public:
            void loadDocument (Vxa::VResultBuilder &rRB, VString const &rFilename);
        };
    }
}

As ultimately called by Vision, this function takes a single string parameter, declared as a VString const& here. In addition to declarations for Vision supplied parameters, this declaration adds a parameter of type Vxa::VResultBuilder& [5] that will be used to return a result to Vision. Every member function that will be used to implement an exported object method must include exactly one occurrence of a parameter of type Vxa::VResultBuilder& somewhere in its parameter list.

Here's how it's used in the implementation of the loadDocument member function:

void VA::PugiXML::Root::loadDocument (Vxa::VResultBuilder &rRB, VString const &rFilename) {
    rRB = new Document (rFilename);
}
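
The idea of returning results through a builder parameter, rather than through the function's return type, can be sketched without Vxa. The ResultBuilder class below is a hypothetical stand-in (its Kind enumeration, setError member, and accessors are assumptions, not the Vxa::VResultBuilder interface); it only shows how one member function can hand back a value on one code path and an error on another.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a result builder; the real Vxa::VResultBuilder
// interface is not shown in this note and will differ.
class ResultBuilder {
public:
    enum Kind { Empty, Number, Text, Error };
    ResultBuilder () : m_kind (Empty), m_number (0) {}

    // Assignment selects the kind of result handed back to the caller.
    ResultBuilder &operator= (double v) {
        m_kind = Number; m_number = v; return *this;
    }
    ResultBuilder &operator= (std::string const &s) {
        m_kind = Text; m_text = s; return *this;
    }
    void setError (std::string const &rWhy) { m_kind = Error; m_text = rWhy; }

    Kind kind () const { return m_kind; }
    double number () const { return m_number; }
    std::string const &text () const { return m_text; }
private:
    Kind m_kind;
    double m_number;
    std::string m_text;
};

// Hypothetical exported method: a value on one path, an error on the other.
void nameLength (ResultBuilder &rRB, std::string const &rName) {
    if (rName.empty ())
        rRB.setError ("empty name");
    else
        rRB = static_cast<double> (rName.size ());
}

// Calls the method the way a framework might and reports what came back.
ResultBuilder::Kind kindReturnedFor (std::string const &rName) {
    ResultBuilder iRB;
    nameLength (iRB, rName);
    assert (iRB.kind () != ResultBuilder::Empty);   // every path set a result
    return iRB.kind ();
}
```

Since the function itself returns void, adding a new kind of result later would not change the signatures of the methods already written against the builder.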

With class Root defined, the last step is creating and initializing the metadata that Vxa uses to export this class to Vision. As currently implemented [3], a small boilerplate definition must be added to the definition of class Root. That boilerplate, excerpted here, adds the nested class ClassBuilder to class Root:

namespace VA {
    namespace PugiXML {
        class Root ... {
        //  Class Builder
        public:
            class ClassBuilder : public Object::ClassBuilder {
            protected:
                ClassBuilder (Vxa::VClass *pClass);
            };
        };
    }
}

ClassBuilder defines one member - a constructor. Here's its definition:

VA::PugiXML::Root::ClassBuilder::ClassBuilder (Vxa::VClass *pClass) : Object::ClassBuilder (pClass) {
    defineMethod ("load:", &Root::loadDocument);
}

The only purpose of this constructor is to bind member functions of class Root to their Vision message names.
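
To make the binding step concrete, here is a small sketch of one way a message name could be mapped to a member function and then dispatched. It is an assumption about the general technique, not a description of how Vxa implements defineMethod: MethodTable, its send member, and the simplified loadDocument signature (a std::string result instead of a Vxa::VResultBuilder&) are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

class Root {
public:
    // Simplified, hypothetical method: the real one takes a
    // Vxa::VResultBuilder& and a VString const&.
    void loadDocument (std::string &rResult, std::string const &rFilename) {
        rResult = "loaded " + rFilename;
    }
};

// Hypothetical method table: message name -> pointer to member function.
class MethodTable {
public:
    typedef void (Root::*Method) (std::string &, std::string const &);

    void defineMethod (std::string const &rName, Method pMethod) {
        assert (pMethod != 0);
        m_methods[rName] = pMethod;
    }

    // Returns false when the receiver does not implement the message.
    bool send (Root &rReceiver, std::string const &rName,
               std::string const &rArg, std::string &rResult) const {
        std::map<std::string, Method>::const_iterator it = m_methods.find (rName);
        if (it == m_methods.end ())
            return false;
        (rReceiver.*(it->second)) (rResult, rArg);
        return true;
    }
private:
    std::map<std::string, Method> m_methods;
};

// Registers "load:" and sends the given message to a fresh Root.
std::string sendToRoot (std::string const &rMessage) {
    MethodTable iTable;
    iTable.defineMethod ("load:", &Root::loadDocument);

    Root iRoot;
    std::string iResult;
    if (!iTable.send (iRoot, rMessage, "a.xml", iResult))
        return std::string ("unknown message");
    return iResult;
}
```

A real framework would need to handle member functions with differing parameter lists, which this single-signature table deliberately ignores.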

The final metadata creation step is to define the private (note the anonymous namespace) singleton and Vxa required template instantiations that connect this class to the Vxa framework:

namespace {
    Vxa::VCollectable g_iRootMeta;
}
DEFINE_VXA_COLLECTABLE(VA::PugiXML::Root);

Finally, regarding the definition of ClassBuilder and its constructor, when exported classes inherit from one another, their class builders must too. In the implementation of va_pugixml, class VA::PugiXML::Document inherits from class VA::PugiXML::Node. Here's the constructor for VA::PugiXML::Document::ClassBuilder:

VA::PugiXML::Document::ClassBuilder::ClassBuilder (Vxa::VClass *pClass) : Node::ClassBuilder (pClass) {
    defineMethod ("parsedOK", &Document::getParsedOK);
    defineMethod ("parseStatus", &Document::getParseStatus);
    defineMethod ("parseDescription", &Document::getParseDescription);
}

The effect can be seen in the help message output earlier in this document.
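
The way inherited class builders accumulate method names can be sketched with plain C++ constructors. The NodeBuilder and DocumentBuilder classes below are hypothetical and record only names, not member function pointers; they merely show that chaining to the base builder's constructor registers the base class's messages before the derived class adds its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical builder for a Node-like class: registers the base messages.
class NodeBuilder {
public:
    explicit NodeBuilder (std::vector<std::string> &rNames) : m_rNames (rNames) {
        defineMethod ("childCount");
        defineMethod ("getName");
    }
protected:
    void defineMethod (std::string const &rName) {
        assert (!rName.empty ());
        m_rNames.push_back (rName);
    }
private:
    std::vector<std::string> &m_rNames;
};

// Hypothetical builder for a Document-like class: the base constructor runs
// first, so the Node messages are registered before the Document ones.
class DocumentBuilder : public NodeBuilder {
public:
    explicit DocumentBuilder (std::vector<std::string> &rNames) : NodeBuilder (rNames) {
        defineMethod ("parsedOK");
    }
};

// Returns the message names a Document-like class would end up exporting.
std::vector<std::string> documentMethodNames () {
    std::vector<std::string> iNames;
    DocumentBuilder iBuilder (iNames);
    (void) iBuilder;   // constructed only for its registration side effect
    return iNames;
}
```

In this sketch the derived class ends up with the base names followed by its own, which mirrors how the Document help listing contains every Node message plus the three parse-related ones.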

Technical Notes

This pull request builds directly on release-8.1 as of commit 2c5011a. With the exception of fixing a minor bug in the Mac OS implementation of C++ type name demangling (commit 0a0557d), the only changes to existing code are to the parts of Vxa used to support extension development. No changes have been made to any code path used by production systems built from this code base.

As explained in greater detail in the notes to commit a298060, most of the required changes to Vxa are related to differences in default shared library symbol visibility between Windows (hidden unless explicitly exported) and Linux/Unix (exported unless explicitly hidden). While mostly applicable to Windows, these changes are also of significance to Linux shared libraries built to hide symbols by default. As such, they should help to address the original need for a collection of old changes reverted by commit 85ea25f and re-implemented by that commit and commit 0e94599.
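
The visibility difference described above is commonly bridged with an export macro. The sketch below shows that general pattern only; DEMO_API and DEMO_BUILDING_LIBRARY are hypothetical names, not the macros used in the Vision sources, and the details of commit a298060 are not reproduced here.

```cpp
#include <cassert>

// Normally supplied by the build system when compiling the library itself;
// defined here only so the sketch compiles as a single file.
#define DEMO_BUILDING_LIBRARY 1

// Hypothetical export macro (the names are assumptions for illustration).
#if defined(_WIN32)
#  if defined(DEMO_BUILDING_LIBRARY)
#    define DEMO_API __declspec(dllexport)   // building the DLL: export the symbol
#  else
#    define DEMO_API __declspec(dllimport)   // using the DLL: import the symbol
#  endif
#elif defined(__GNUC__)
   // Matters when the library is compiled with -fvisibility=hidden;
   // otherwise symbols are already exported by default.
#  define DEMO_API __attribute__((visibility("default")))
#else
#  define DEMO_API
#endif

// Explicitly exported: visible to clients on every platform.
DEMO_API int exportedAnswer () { return 42; }

namespace {
    // Internal linkage: never part of the library's interface.
    int hiddenHelper () { return exportedAnswer () + 1; }
}

// Exercises both functions from inside the same translation unit.
int callThroughApi () {
    int iValue = exportedAnswer ();
    assert (hiddenHelper () == iValue + 1);
    return iValue;
}
```

Annotating every symbol that crosses a library boundary is what lets the same sources build correctly whether the platform hides or exports symbols by default.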


[1] pugixml supports a much richer set of operations than the subset exposed by the prototype Vision extension implemented here. Given pugixml's usefulness, that subset will undoubtedly be expanded.

[2] While not strictly necessary, performance will be much better if you set the VxaICE environment variable (e.g., export VxaICE=true) prior to starting Vision. The improvement will be particularly noticeable when Vision's collection operations are used to retrieve large numbers of objects in parallel from the parser.

[3] The API illustrated in the examples that follow is very much a work-in-progress. Class names, for example, chosen to reflect one set of concerns, may be changed to better describe their relationship to the API's purpose. Similarly, a number of patterns currently stated explicitly may, and probably will, be changed to encapsulate their implementation details.

[4] The use of C++ multiple inheritance has not been fully explored in the context of Vxa extensions. In principle it can be made to work; however, it almost certainly will need to be virtual and is not likely to come without change to the Vxa extension writer's API.

[5] Among other things, Vxa::VResultBuilder allows different code paths in the member to return results of different types - for example, a valid result value on some paths, error objects on others, and object futures [coming soon] on still others.