MeTTa libraries documentation

This is a subissue of #319.

The goal is to introduce MeTTa code documentation machinery for MeTTa code in general, libraries in particular, and stdlib especially.

The proposal is to document MeTTa code in MeTTa itself. There are a few choices to be made:

In what format to place such pieces of documentation: doc-strings, doc-expressions with certain type structure, or something else?
Where exactly to place them: the library space itself, a global &doc space, a separate doc-space for each library?
Where is the MeTTa code for documentation formed? Should register_token accept it as an additional argument? Should the space being imported just contain doc-expressions as stand-alone expressions similar to type definitions, or should these expressions be somehow placed inside equalities like Python docstrings?
How should documentation (+type declarations) be automatically gathered and can missing documentation be automatically detected?

There are a few issues that may prevent us from answering all these questions atm:

We don’t have a notion of “known tokens”, either for pure symbols or for tokens of grounded atoms. Pure symbols can be used without being defined. While typing expressions can be used for “detecting” known symbols, even typing expressions can be tricky, e.g. (: ($t A) (B $t)) is a valid expression. All mentioned symbols can be extracted from an entire space, though. Tokens for grounded atoms are regular expressions. Apparently, we cannot document each concrete instance, e.g., each number. The situation may partly change if we introduce bindings as a part of spaces.
We have libraries, imports and py-extensions. We still don’t have a strict way to define new libraries. As for stdlib, we can either put doc-expressions in its space or do anything else we want. But will “anything else” work for custom libraries? For the case of importing pure MeTTa scripts, we have no choice but to put doc-expressions into scripts themselves, although it doesn’t necessarily mean that these expressions will remain in the same space. E.g., documenting can be done via import-time execution (that is, with something like ! (doc …)). Also, documentation for libraries and for pure MeTTa code may not necessarily be done identically.

Let’s consider some examples of how it could look.

1) Doc-expressions are mere expressions over strings put arbitrarily into spaces, e.g.

(: doc (-> Atom String Atom))
(doc doc "Document any atom or expression")
(doc String "The type for grounded strings")
(doc + "Grounded operation for summing two numbers")

For stdlib, all docs are just put in stdlib.rs/metta_code(). Thus, all docs can be retrieved by matching against &self, since the &stdlib space is placed into it. An option is to have doc as a function. Pros: no need for a special implementation, uniformity over different use cases. Cons: definitions are detached from the implementation of grounded functions; the main space is littered with doc-expressions; no formal structure amenable to automatic processing is provided.

2) Structured doc-expressions in a dedicated space.

(: doc (-> Atom DocEntry Atom Atom))
(: FnParam (-> Number DocEntry))
(: FnOut DocEntry)
(: DocModule DocEntry)
!(doc + (FnParam 0) "The first number to add")
!(doc + FnOut "The sum of two numbers")
!(doc + DocModule stdlib)

The structure can be different. Some doc-expressions (e.g. module) can be added automatically.

doc is a grounded function defined in stdlib that can be called from any script and puts its input into the &doc space. Then (match &doc (doc …)) can be used to retrieve doc-expressions. We can move in the direction of an even more complete formal specification of the semantics and pragmatics of symbols and grounded atoms for self-programming. But maybe we just need to keep the possibility of future extensions, with doc-expression types explicitly indicating whether the description is provided as a string or as a richer structure.

Pros: possible further automation and additional formatting of the documentation with possible cross-references, etc. Cons: following the structure of doc might be a little annoying, but this can be mitigated by providing syntactic sugar for loose doc-strings.
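
To make the "amenable to automatic processing" point concrete, here is a minimal sketch in plain Python. It is only a toy model of option 2, not the hyperon API: the list standing in for the &doc space and the helpers doc, match_doc and render are hypothetical names introduced for illustration.

```python
# Toy model of option 2: structured doc entries kept in a dedicated store.
# Plain Python, not the hyperon API; every name below is hypothetical.

doc_space = []  # stands in for the &doc space


def doc(atom, entry, text):
    """Record one structured doc-expression, like !(doc + FnOut "...")."""
    doc_space.append((atom, entry, text))


def match_doc(atom):
    """Return all (entry, text) pairs recorded for the given atom."""
    return [(entry, text) for (a, entry, text) in doc_space if a == atom]


def render(atom):
    """Format the entries recorded for one atom into readable lines."""
    lines = [f"{atom}"]
    entries = match_doc(atom)
    # Parameter entries are modelled as ("FnParam", index); order them by index.
    params = sorted(
        (e[1], t) for e, t in entries if isinstance(e, tuple) and e[0] == "FnParam"
    )
    for index, text in params:
        lines.append(f"  param {index}: {text}")
    for entry, text in entries:
        if entry == "FnOut":
            lines.append(f"  returns: {text}")
        elif entry == "DocModule":
            lines.append(f"  module: {text}")
    return "\n".join(lines)


doc("+", ("FnParam", 0), "The first number to add")
doc("+", "FnOut", "The sum of two numbers")
doc("+", "DocModule", "stdlib")
print(render("+"))
```

Because each entry carries its kind (FnParam, FnOut, DocModule), a formatter can order and label the pieces without parsing free text, which is what option 1 cannot offer.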

3) In-place documentation, e.g. doxygen-like

;;; \brief My first function
;;; \param[out] a random atom
;;; \param[in] src yet another random atom
(= (my-f $a) (…))

Such comments can be processed by the parser in a special way and turned into expressions. Pros: may look convenient and human-readable. Cons: we don’t have a single definition per function; special processing may make this harder to extend in the future, etc.

These examples are not mutually exclusive and can be partly combined. There are more details to discuss and flesh out, of course.

We can proceed step-by-step and start with agreeing on embedding MeTTa-documentation in MeTTa itself with further automatic extraction to HTML or whatever.

Apparently, we cannot document each concrete instance, e.g., each number.

But we can add the same documentation for each token (you probably meant this, I just wanted to mention it explicitly). On the other hand, for the grounded value tokens we probably need to document types instead of the tokens themselves.

Doc-expressions are mere expressions over strings put arbitrarily into spaces, e.g.

Not sure I understand the "cons" properly. We can define doc as a grounded function and get the documentation string from a grounded atom or from an atomspace, depending on the atom that is passed as an argument to the doc function.

Structured doc-expressions in a dedicated space.

Another downside of having a separate &doc space is that we need one more global space. I wanted to eliminate global spaces inside grounded functions in "minimal MeTTa", as using them causes problems. In particular, each imported space should have its own copy of the grounded symbol with the &self space embedded.

I like the structured documentation approach from option (2). But I think we could keep documentation in the same atomspace where the code of the module lives. I am very keen on the idea of replacing the current MeTTa runner with a space that is able to keep and run grounded functions. In this context docs and types of the grounded functions can be represented uniformly.

Until a uniform representation is implemented, we could document grounded atoms and pure functions in different ways while providing a uniform interface through the MeTTa runner.

In what format to place such pieces of documentation: doc-strings, doc-expressions with certain type structure, or something else?

To me the most convenient way is to keep documentation as expressions with a specified structure. Then a separate doc formatting function can read the data and present it to the user in the most convenient way.

Where exactly to place them: the library space itself, a global &doc space, a separate doc-space for each library?

I would suggest keeping documentation in the same space where corresponding atoms are kept. In this case the documentation hierarchy is the same as the hierarchy of modules.

Where is the MeTTa code for documentation formed? Should register_token accept it as an additional argument? Should the space being imported just contain doc-expressions as stand-alone expressions similar to type definitions, or should these expressions be somehow placed inside equalities like Python docstrings?

As a first step we could provide documentation for the pure symbols explicitly in an expression form. This allows using the current parser without modifications. For the tokens we don't have a simple solution. We can have multiple token definitions which effectively represent the same token. Thus adding documentation to the register_token call will lead to code duplication.

As tokens can be duplicated, there is a question of whether we need to document each registered token. Maybe it is enough to document the type of the value for tokens that produce values. And when a token produces a grounded function, we could document the function itself. I would suggest keeping documentation of the grounded tokens inside the accompanying space. For stdlib it is the metta_code; for 3rd party modules it should be a part of the module.

How should documentation (+type declarations) be automatically gathered and can missing documentation be automatically detected?

I would say that in general we should have each symbol and each grounded atom documented. Then, when a user asks for help on some symbol or grounded atom name, the help function can provide the documentation for it.
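
As a rough illustration of how missing documentation could be detected, the sketch below collects every symbol mentioned in a space and subtracts the documented ones. It is a plain Python toy model under stated assumptions: expressions are nested tuples, symbols are strings, variables start with "$", and the helper names and the ignore set are hypothetical. It does not address tokens of grounded atoms, which are regular expressions.

```python
# Toy model: expressions are nested tuples, symbols are strings,
# variables start with "$". Not the hyperon API; names are hypothetical.

def symbols(atom):
    """Yield every non-variable symbol mentioned in a nested-tuple atom."""
    if isinstance(atom, tuple):
        for child in atom:
            yield from symbols(child)
    elif isinstance(atom, str) and not atom.startswith("$"):
        yield atom


def undocumented(space, documented, ignore=frozenset({"=", ":", "->"})):
    """Return the sorted symbols that are mentioned in the space but not documented."""
    mentioned = {s for atom in space for s in symbols(atom)}
    return sorted(mentioned - set(documented) - ignore)


space = [
    (":", "my-f", ("->", "Atom", "Atom")),
    ("=", ("my-f", "$a"), ("wrap", "$a")),
]
print(undocumented(space, documented={"my-f", "Atom"}))  # ['wrap']
```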

Possible algorithm of the `help` function.

The help function checks the meta-type of the atom. If the atom's metatype is Grounded, then help gets the type of the atom. There are two options: if the type is a function type, then help searches for documentation by the atom itself; otherwise help searches for documentation of the type atom. In the case of the Symbol metatype, help could search for the documentation using the symbol itself. For the Expression metatype, help can try to search for documentation of the whole expression or of the first atom of the expression, also depending on its type.
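
The dispatch described above can be sketched as follows. This is a plain Python toy model, not the hyperon API: the Atom record, doc_key and help_ are hypothetical names, documentation is a simple dictionary, and for expressions only the "first atom" branch is modelled (looking up the whole expression is left out).

```python
# Toy model of the help lookup; not the hyperon API, all names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    metatype: str          # "Symbol", "Grounded" or "Expression"
    name: str = ""         # symbol or token name
    type_: object = None   # e.g. "Number" or ("->", "Number", "Number", "Number")
    children: tuple = ()   # sub-atoms of an Expression


def is_function_type(t):
    return isinstance(t, tuple) and len(t) > 0 and t[0] == "->"


def doc_key(atom):
    """Pick the key under which documentation for the atom is searched."""
    if atom.metatype == "Grounded":
        # Grounded functions are looked up by themselves, values by their type.
        return atom.name if is_function_type(atom.type_) else atom.type_
    if atom.metatype == "Symbol":
        return atom.name
    if atom.metatype == "Expression" and atom.children:
        # Simplification: only the head of the expression is considered.
        return doc_key(atom.children[0])
    return None


def help_(atom, docs):
    return docs.get(doc_key(atom), "No documentation found")


docs = {"+": "Sums two numbers", "Number": "The type of grounded numbers"}
plus = Atom("Grounded", "+", ("->", "Number", "Number", "Number"))
one = Atom("Grounded", "1", "Number")
expr = Atom("Expression", children=(plus, one, one))
print(help_(plus, docs))
print(help_(one, docs))
print(help_(expr, docs))
```

Here help on the token 1 resolves to the documentation of Number, which matches the idea of documenting types rather than each value token.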

Proposed documentation format

Documentation of the function:

(function
  (description "Description of the function")
  (parameters
    ; name of the parameter is extracted from the function definition
    ; type of the parameter is extracted from the function type
    ; in/out: effectively, each parameter in MeTTa can input and output a value
    ;        at the same time, but sometimes it makes sense to restrict this
    ;        to only input or only output; it can be done using `sealed`
    (parameter "First parameter's description")
    [... (parameter "Second parameter's description")]
    ; type of the returned value is extracted from the function type
    (return "Description of the return result")
  ))

Documentation of the other atoms:

(atom (description "Description of the atom"))

There are different possibilities to link an atom to its documentation:

define a function (= (doc <atom>) ...) which returns the documentation of the <atom>
add (name ...) to the documentation atom and use it as an anchor to search for documentation from the help function

I would try to implement help and documentation examples and finalize the decision after experimenting.

To me the most convenient way is to keep documentation as expressions with a specified structure.

Yes

I would suggest keeping documentation in the same space where corresponding atoms are kept.

As a first step we could provide documentation for the pure symbols explicitly in an expression form.

Proposed documentation format

Looks good. The only concern I have is that tuples are not conveniently deconstructable. OTOH, providing descriptions of each parameter as a separate expression is also not convenient in terms of the parameter order specification.

There is an idea to mark functions as deterministic/non-deterministic. It is similar to the in/out value passing direction. On the one hand, we could mark it as such in the documentation. On the other hand, it is a part of the function contract and should be available for analysis by the interpreter, and thus should be a part of the function definition.

Providing this info in docs in such a way that it looks formal but doesn't influence the interpreter and can be in contradiction with the function behavior (and the function contract if it includes it) looks like a possible source of confusion. OTOH, consistency of documentation and function contracts can be automatically checked (which is also a possible case for formal parameters). Maybe, we should just call it spec instead of doc and add any metadata there :)
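
One simple form such an automatic check could take is comparing the number of parameter descriptions with the arity of the function type. The sketch below is a plain Python toy model under assumptions: function types are tuples like ("->", A, B, Ret), a function's documentation is a dictionary, and arity and check_doc are hypothetical helpers, not part of any existing API.

```python
# Toy consistency check between a function type and its documentation.
# Not the hyperon API; the data layout and names are hypothetical.

def arity(fn_type):
    """Number of parameters in a type like ("->", A, B, Ret)."""
    if not (isinstance(fn_type, tuple) and len(fn_type) >= 2 and fn_type[0] == "->"):
        raise ValueError("not a function type")
    return len(fn_type) - 2


def check_doc(fn_type, fn_doc):
    """Return a list of problems; an empty list means doc and type agree."""
    problems = []
    described = len(fn_doc.get("parameters", []))
    if described != arity(fn_type):
        problems.append(
            f"type has {arity(fn_type)} parameters, doc describes {described}"
        )
    if "return" not in fn_doc:
        problems.append("missing description of the return value")
    return problems


plus_type = ("->", "Number", "Number", "Number")
plus_doc = {
    "description": "Sums two numbers",
    "parameters": ["First summand"],
    "return": "The sum",
}
print(check_doc(plus_type, plus_doc))  # ['type has 2 parameters, doc describes 1']
```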

https://github.com/trueagi-io/hyperon-experimental/pull/694 adds documentation for the standard library.
