Atom types and theory documentation website (Wiki -> Static site generator)

ferrouswheel commented 5 years ago

Wikis were a great tool at one point in time, but MediaWiki in particular is now somewhat clunky, especially when it comes to keeping documentation in sync with code changes, as you can't ensure wiki pages are updated as part of a PR.

I think it'd be great to have Atom type definitions as part of the AtomSpace or OpenCog repos, or as a separate repo. These could then be automatically published by GitHub Pages.

The content and structure on the wiki could be used to seed this documentation site.

Pros:

Using a static site generator (such as Jekyll), we could write a plugin to automatically highlight Atomese examples and link to type definitions.
Trivial to check out and edit documentation offline, with a standard dev workflow (git, favourite editor)
PRs would make changes visible to the team, and allow people to either debate theoretical changes, or if a code change violates the type definition, a request could be made for an appropriate documentation edit.
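
The "link to type definitions" idea from the first point can be sketched roughly as follows. This is written in Python purely for illustration (a real Jekyll plugin would be Ruby), and the `TYPE_PAGES` table, the URL layout and the `link_types` function are all hypothetical names, not part of any existing OpenCog tooling.

```python
import html
import re

# Hypothetical map from atom type name to its documentation page.
TYPE_PAGES = {
    "GetLink": "/types/GetLink.html",
    "VariableNode": "/types/VariableNode.html",
    "PlusLink": "/types/PlusLink.html",
}

# Assumption: the type names of interest are CamelCase and end in Link or Node.
TYPE_NAME = re.compile(r"\b[A-Z][A-Za-z]*(?:Link|Node)\b")


def link_types(snippet: str) -> str:
    """Return HTML for an Atomese snippet with known type names hyperlinked."""
    out = []
    pos = 0
    for match in TYPE_NAME.finditer(snippet):
        # Escape the text between type names so quotes and brackets are safe.
        out.append(html.escape(snippet[pos:match.start()]))
        name = match.group(0)
        page = TYPE_PAGES.get(name)
        if page is None:
            out.append(name)  # unknown type: leave it as plain text
        else:
            out.append(f'<a href="{page}">{name}</a>')
        pos = match.end()
    out.append(html.escape(snippet[pos:]))
    return "<pre>" + "".join(out) + "</pre>"


print(link_types('(PlusLink (VariableNode "$X") (NumberNode "11"))'))
```

Types missing from the table are left unlinked, which would also give a cheap way to spot atom types that have no documentation page yet.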

Cons:

someone has to migrate the documentation
people are used to the wiki
others?

While I'd be happy to try to find time to do this (I know Jekyll reasonably well), it's also something a newbie could do as they are learning OpenCog.

linas commented 5 years ago

Especially to keep documentation in sync with code changes,

Ouuuch. This remark in particular gives me the creeps. At this point, the AtomSpace should be considered to be mostly stable. For the most part, code changes should be bug fixes that do not require documentation changes. Yes, sometimes new features might get added, but if things are getting out of sync, then there is a deeper and more serious issue than merely how you publish documentation.

I guess these concerns might not hold for other git repos... but I value stability of the core components very highly, and this seems .. rather destabilizing.

Are there examples of Jekyll that can be viewed online? We have javadoc markup in the code, but the auto-generated javadoc websites are utterly hideous, unreadable and unusable, so I would be unhappy if the wiki morphed into some kind of Mr. Hyde.

ferrouswheel commented 5 years ago

There are example sites here: https://jekyllrb.com/showcase/ (the Jekyll site is also built with Jekyll), but Jekyll probably isn't the best fit for a documentation site. Sphinx might be better. The core idea, though, is that the site is statically generated from Markdown, rst, or other markup stored in a git repo.

However, if the types have mostly settled down in behaviour then I guess there is little point.

I thought otherwise because Nil mentioned new PredictiveImplication links, and a Do predicate in opencog/opencog#3548 (just a single issue which I thought would have an obvious answer), and OpenPsi is mixing up contexts and actions. Thus I mistakenly thought things were more in flux than they are.

linas commented 5 years ago

Well, having a good-looking website is not just important, it's very important. So I would encourage and support that. For example, I had been talking up https://grakn.ai which is a kind-of easy-to-use atomspace-lite, and it's got a great landing page, and the first few tutorials are blindingly obvious. I'd love to have that. I have no clue how to get that. I'm stretched too thin to even try to figure it out, so I'm waiting for a UX person to show up at the doorstep.

Re: flux: Nil knows how to add PredictiveImplication; it's easy, like a one-line-of-code change in the atomspace. He could do it tomorrow if he really needed it. (The hard part is wiring it into PLN.)

The Do predicate is pie-in-the-sky, and unlikely to happen for years. Unless maybe some lightning bolt hits. Never know. Many/most atom types do not need to be backed up by C++ classes, and those are "easy" to add.

ngeiswei commented 5 years ago

Agreed about the importance of having a good-looking, up-to-date website. I'm also kinda waiting for a UX guy to show up, as I'm not skilled at that, yet I still think it's important. That said, I do agree with @linas that changes happen relatively slowly.

@linas, why do you think DoLink is pie-in-the-sky? It's just a one-liner and doesn't need to be backed by a C++ class (not initially anyway). When I said it is similar in spirit to Judea Pearl's do-calculus, I didn't mean we need to implement that. It can probably be implemented on top of PLN if we want to (it's not that complicated), but either way nothing prevents a DoLink from being introduced and used in the rule format.

ngeiswei commented 5 years ago

@ferrouswheel you've closed the issue, but maybe it can be left open, unless you guys think it is too broad or pie-in-the-sky or something? Who knows, maybe a UX guy will come and do it.

linas commented 5 years ago

do-link: it sounded technically difficult .. implementation, experiments, examples... don't know. consider it a random and likely incorrect remark.

ferrouswheel commented 5 years ago

@ngeiswei I'll reopen it so the discussion can continue (I'm still getting up to speed again and I didn't want to overly distract people with my issues, when there is plenty of work to be done already).

I feel the documentation in the wiki would still benefit from being in a repo. It's far easier to grep and sed, and all the other tools we take for granted while coding, vs editing pages individually via web forms.

Turns out that PredictiveImplicationLink is a good example of trying to keep doc and code consistent: in this issue we are saying it's easy to add, but the current wiki page has nothing to inform the reader that the link isn't implemented yet.

I think there is still value in having descriptions of unimplemented theory/plans, but if they are in a repo where doc changes can be kept in sync with code changes, it is easier to make this distinction.

It would even be possible to link runnable examples with the documentation and ensure these are correct. While they remain in MediaWiki it'd be pretty difficult to validate that code segments do what they say they will.
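
As a rough illustration of validating examples embedded in documentation, the sketch below pulls fenced code blocks out of a Markdown string and runs each one. It uses Python blocks as a stand-in; the Atomese examples discussed here would instead have to be run through Guile, which is not shown. The names `check_examples`, `BLOCK` and `FENCE` are invented for this sketch.

```python
import re
import traceback
from typing import List

# The fence marker is built indirectly so this example can itself live in a fenced block.
FENCE = "`" * 3

# Matches a fenced block tagged "python" and captures its body.
BLOCK = re.compile(
    r"^" + FENCE + r"python\n(.*?)^" + FENCE + r"\s*$",
    re.DOTALL | re.MULTILINE,
)


def check_examples(markdown: str) -> List[str]:
    """Run every fenced Python block; return one report per failing block."""
    failures = []
    for index, match in enumerate(BLOCK.finditer(markdown), start=1):
        try:
            # Each example runs in its own empty namespace.
            exec(compile(match.group(1), f"<example {index}>", "exec"), {})
        except Exception:
            failures.append(f"example {index} failed:\n{traceback.format_exc()}")
    return failures
```

A CI job could call something like this on every documentation page and fail the build whenever the returned list is non-empty.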

As a sidenote, the documentation in the example dir, for how the pattern matcher uses various links/atoms, is awesome. So I just want to say thanks for that!

linas commented 5 years ago

Any further thoughts or plans on this?

This is related to another issue that really really irks me deeply. Compare the AtomSpace to grakn.ai -- when I look at grakn.ai I see (technically) an AtomSpace minus-minus -- an AtomSpace with all of the cool, advanced features missing. But marketing-wise -- it's like the opposite -- they've got a really clean website, great-looking stuff. The demos are real easy to understand, so any joe-blow middle-of-the-road average programmer can understand those demos, and have a light-bulb go off in their head, and say "I know exactly how to use this! Maybe I'll use it in my next project!" Compare this to the standard reaction to the AtomSpace - "wtf is this space-alien shit?"

I really really would like to give the AtomSpace some kind of face-lift, apply some kind of makeup, so that people can compare it to any of the other graph databases, and go "ah, I see!" and be able to use it within minutes after installing it.

Joel, I think you have this kind of user-design sensibility and judgment; would you care to tackle this? Figure out what we'd need to do to make this stuff accessible? I mean, why can't we be as usable and understandable and approachable as other graph databases?

ngeiswei commented 5 years ago

And easily defining custom types (https://github.com/opencog/atomspace/issues/2212 and https://github.com/opencog/atomspace/issues/2211), once supported, could be a step in such a make-up tutorial, to promote freedom for the user.

vsbogd commented 5 years ago

@ferrouswheel, @linas, if I got your points correctly there are three main items in this issue:

documentation should be kept in the repo, as close to the code as possible (@ferrouswheel's point); we already have Doxygen documentation https://github.com/opencog/atomspace/tree/master/doc/doxydoc, though it is probably not up to date;
documentation should have a nice look and feel (@linas's point); I am sure there are good-looking themes for Doxygen; a quick search turns up this one, along with a real project as an example of the result
documentation should be clear and easily understandable by newbies (@linas's point); this obviously can be solved only manually, so someone should sit down and draw up a nice structure for the documentation (like the usual "Getting started", "Tutorials", "Reference" articles).

So to me it looks like we could make an effort to move the up-to-date wiki content into the repo and use a nice Doxygen theme to make it look good. What do you think about this?

linas commented 5 years ago

@vsbogd yes, but no.

1. It's not really about the "documentation", it's more about the website design. For starters, there needs to be an atomspace website, distinct from the opencog website. There needs to be a clear boundary between them, as otherwise, it just confuses everyone.
2. Next, the website should look "modern". Right now, it just looks olde-fashioned and klunky.
3. The first thing users should see are some easy, obvious demos that are obviously applicable to things that they want to do. I tried to create some "easy obvious" demos in the examples dir, but they need to be ported to the web.
4. If we want to really be like grakn.ai, but better, we need some kind of wrapper that converts some kind of json-like syntax to atomese. Almost everyone hates parentheses (i.e. scheme), and I think python is a terrible API choice for the atomspace. That means we really need something json-like so that newcomers can feel comfortable. This is a new project, though.
5. The wiki documentation for the different atom types is MUCH more important than C++ API documentation. Users need to see that first, to be directed towards that first.
6. Some APIs are first-class, some are second-class, some are third-class. The second- and third-class APIs should be invisible to new users. First-class are cog-execute! and cog-evaluate! and that is pretty much it (whatever can be found in the examples dir). Second-class is the zoo of scheme helpers and utilities. Third-class is the doxygen C++ API. (No one except systems programmers should be working in C++, and 95% of the contents of the C++ header files is "not for public use". There is no forward/backward compatibility for C++. Only the scheme API guarantees backwards-compat.)
7. Doxygen is horrible. That's not just cantankerous-me being unreasonable and irrational. Find some website using doxygen, and just look at it. Oh, can't find any? Well, that's because everyone has realized how awful doxygen looks, and has stopped using it. At any rate, the doxygen is for the C++ classes only, which are third-class, and are private APIs that most users should never use.
8. The current wiki contents is pretty much completely unrelated to any code checked into the atomspace git repo. If we're going to store the website in git, it needs to go into a brand-new git repo, instead of being jammed into an existing one. Separation of concerns.

vsbogd commented 5 years ago

The current wiki contents is pretty much completely unrelated to any code checked into the atomspace git repo

Any PR which changes the API should also contain a documentation update. That is the reason why it is better to have the documentation in the same repo. Examples and tutorials could be located in a separate folder.

If we want to really be like grakn.ai, but better, we need some kind of wrapper that converts some kind of json-like syntax to atomese.

Not sure JSON will be more readable than Scheme. Maybe using indentation to designate structure (the way control blocks are delimited in Python) would be more convenient. At least, that is what we always use at the whiteboard to write atomese snippets. For example:

GetLink:
  VariableList:
    TypedVariableLink:
      VariableNode "$X1"
      TypeNode "NumberNode"
    TypedVariableLink:
      VariableNode "$X2"
      TypeNode "NumberNode"
  EqualLink:
    PlusLink:
      VariableNode "$X1"
      VariableNode "$X2"
    NumberNode "11"
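
To make the relationship between this indentation-based notation and the usual Scheme form concrete, here is a minimal Python sketch of a converter. It assumes two spaces per nesting level, a trailing colon on every link line, and a single top-level atom; `indented_to_scheme` and `indent_of` are hypothetical helpers, and the output is not checked against a running AtomSpace.

```python
def indent_of(line: str) -> int:
    """Nesting depth of a line, assuming two spaces per level."""
    return (len(line) - len(line.lstrip(" "))) // 2


def indented_to_scheme(text: str) -> str:
    """Convert the indentation-based notation into a Scheme s-expression."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    pos = 0

    def parse(depth: int) -> str:
        nonlocal pos
        body = lines[pos].strip()
        pos += 1
        if not body.endswith(":"):
            # A leaf such as: VariableNode "$X1"
            return f"({body})"
        children = []
        # Children are the following lines indented exactly one level deeper.
        while pos < len(lines) and indent_of(lines[pos]) == depth + 1:
            children.append(parse(depth + 1))
        return "(" + " ".join([body[:-1]] + children) + ")"

    return parse(0)


snippet = 'PlusLink:\n  VariableNode "$X1"\n  NumberNode "11"\n'
print(indented_to_scheme(snippet))
```

The translation is purely mechanical, so a notation like this would only change the surface syntax; it does not by itself make the underlying atom structure any shorter.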

The wiki documentation for the different atom types is MUCH more important than C++ API documentation.

Some APIs are first-class, some are second-class, some are third-class.

Using Doxygen doesn't mean describing only the C++ API. Right now we don't have a good description of Atomese types. For example, a Link can contain any sort of atom in its outgoing set, but some links (for example PlusLink) can be constructed only with certain types of atoms. Ordered links can interpret atoms at different positions in the outgoing set differently, and so on. We could write formal descriptions of Atomese types and document them.
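
One way such formal descriptions might look is sketched below in Python. Everything here is an assumption made for illustration: the `LinkSpec` structure, the `SPECS` table and the particular constraints listed for PlusLink and TypedVariableLink are not taken from the real AtomSpace type definitions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Atom:
    type: str
    name: Optional[str] = None          # used by nodes
    outgoing: Tuple["Atom", ...] = ()   # used by links


@dataclass(frozen=True)
class LinkSpec:
    """Hypothetical formal description of what one link type accepts."""
    anywhere: FrozenSet[str] = frozenset()       # types allowed at any position
    positions: Tuple[FrozenSet[str], ...] = ()   # types allowed at each position


# Illustrative constraints only.
SPECS = {
    "PlusLink": LinkSpec(anywhere=frozenset({"NumberNode", "VariableNode", "PlusLink"})),
    "TypedVariableLink": LinkSpec(
        positions=(frozenset({"VariableNode"}), frozenset({"TypeNode"}))
    ),
}


def violations(atom: Atom) -> List[str]:
    """List every place where an atom tree disagrees with SPECS."""
    problems: List[str] = []
    spec = SPECS.get(atom.type)
    if spec is not None and spec.positions:
        # Position-sensitive link: check arity, then each slot separately.
        if len(atom.outgoing) != len(spec.positions):
            problems.append(
                f"{atom.type} expects {len(spec.positions)} atoms, got {len(atom.outgoing)}"
            )
        for index, (child, allowed) in enumerate(zip(atom.outgoing, spec.positions)):
            if child.type not in allowed:
                problems.append(f"{atom.type}[{index}] may not be a {child.type}")
    elif spec is not None:
        for index, child in enumerate(atom.outgoing):
            if child.type not in spec.anywhere:
                problems.append(f"{atom.type}[{index}] may not be a {child.type}")
    for child in atom.outgoing:
        problems.extend(violations(child))
    return problems
```

A table like `SPECS` could serve both as input to a checker and as the source from which the per-type documentation pages are generated, which is one way to keep the two from drifting apart.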

Doxygen is horrible. Find some website using doxygen, and just look at it.

This one looks pretty: https://doc.magnum.graphics/magnum

Points 1 to 3:

ok

linas commented 5 years ago

Any PR which changes the API

I kind-of want to limit the PRs that change the API. Anyway, look at the wiki, you will see that it is almost completely unrelated to the code. I really really do not want the wiki contents as a part of this git repo; that would be a management disaster.

Not sure JSON will be more readable than Scheme.

The JSON variant would have to be radically different, radically reimagined. You're right, in that simply replacing parens with curly braces and commas would be stupid. And python whitespace indentation would be an anti-improvement.

So --- I keep saying this -- Atomese is not meant for human programmers; it's meant to be a kind-of assembly code for other algorithms to manipulate. Atomese is too verbose, too awkward to be easy-to-use by humans. Think of Atomese as a kind-of "bytecode for graph databases". For a human-level API, we would need to invent something simple, easy, convenient to use, and compile it down into atomese. This would be a big project.

This one looks pretty: https://doc.magnum.graphics/magnum

You and I have different conceptions of "pretty". For me, that's an excellent example of "ugly" -- it highlights everything wrong and unpleasant about doxygen. It's depressing. If I had to read that for work, I would quit. You couldn't pay me to deal with that. Right up there with UML. https://www.google.com/search?tbm=isch&q=UML Run away!

vsbogd commented 5 years ago

Anyway, look at the wiki, you will see that it is almost completely unrelated to the code.

The wiki describes Atom behavior which is implemented in code, doesn't it?

I really really do not want the wiki contents as a part of this git repo; that would be a management disaster.

I don't understand the reason.

The JSON variant would have to be radically different, radically reimagined. For a human-level API, we would need to invent something simple, easy, convenient to use, and compile it down into atomese.

If I understand your point correctly, you are suggesting creating a new KB description language which will be more convenient for humans than atomese. That should probably be a separate issue. My understanding was that we are discussing atomese documentation.

You and I have different conceptions of "pretty". For me, that's an excellent example of "ugly" -- it highlights everything wrong and unpleasant about doxygen.

Not sure if you are talking about style, color scheme or something else. My point is that these things can be changed by applying another theme. But if it is about writing documentation manually, then Doxygen for sure is not the most relevant tool.

noskill commented 5 years ago

For a human-level API, we would need to invent something simple, easy, convenient to use, and compile it down into atomese.

Maybe we can reuse the Grakn query language?

linas commented 5 years ago

Maybe we can reuse the Grakn query language?

Yes, but we would have to be careful: we have many features that they don't, and we must not close the door on those. Also, performance measurements could get exciting.

linas commented 5 years ago

I really really do not want the wiki contents as a part of this git repo; that would be a management disaster.

I don't understand the reason.

Because there will be non-developers wishing to update the website, and I don't want to mix together those changes with actual code. The procedures are different:

Do not need to re-run the unit tests, if the website changes
Do not need to rebuild the website, if someone changes the code
One website maintainer might want to give another commit and merge privileges; it's OK to have that, for the website only, but not OK for the part of the repo containing source code.
The user-interface people might be fiddling with web CSS stylesheets or writing nifty javascript layout widgets. This is code, but it's unrelated code that the atomspace runtime itself does not need. It's not modular.
I want to avoid the situation of having to npm install or yarn install 200 or 300 different packages, just to build the atomspace; the atomspace doesn't need npm or node.js right now, and it should stay that way...
The website people might be unhappy about having to install C++ and cmake, just to update the background color of the website ...
This runs counter to opencog/opencog#3391 where we want smaller, easier-to-manage git repos, instead of bigger, more complicated ones.

It's different groups of people, different mindsets, different workflows, different attitudes, and no compile-time or run-time dependencies between the two. If there are no dependencies, they don't belong in the same repo.

ferrouswheel commented 5 years ago

I haven't used Doxygen for ages, but unless it's substantially changed I feel it's long in the tooth and makes writing quality documentation more annoying than necessary. I'd also prefer a more common markup system like Markdown or reStructuredText (rst) so that it's easy for new contributors to help.

Re: documentation in the same repo as code, I'm on the fence.

On the side of having docs in the same repo, though these are mostly just counters to some of Linas's arguments:

We don't have to tie the documentation generation in with cmake. If Linas or someone else wants to avoid installing whatever dependencies are required for the documentation generation that's fine, but having the documentation in a different repo does not guard against installing doc-generation dependencies! If someone wants to work on it (and validate the changes look right), you'll still need whatever is required to build the docs. However my preference is not to introduce node/npm regardless of where the docs live.
It's easier to ensure atomic updates - when the code changes, we can ask that people update the relevant documentation and it's clearer if there is a discrepancy between what the docs say and what the code does. Realistically tho, people are not that disciplined so it's probably not worth worrying about this!
Whether the atomspace tests rerun for doc changes or vice versa is really irrelevant. It's an optimisation and if there is a functional and reliable CI pipeline we don't need to think about it. If the website updates, and it's redundant but automatic, it shouldn't be something I spend any time thinking about even if I'm only modifying code.

On the other side - reasons I think the documentation should be in a separate repo:

Many book-level guides and documentation projects are separate. I'm currently in love with rustlang, and while they have great documentation tools and doc tests, their major documentation projects e.g. https://doc.rust-lang.org/book/ are separate repos. Linas is right that it's a different mindset, and having the documentation in its own repo may help people switch to the documentation mindset when working on it.
As a separate repo, we could use mdbook or some other tool that allows documentation examples to be run automatically when building docs. This would allow us to set up the documentation repo as a downstream task in the CircleCI workflow, and make us aware when atomspace changes break the documentation examples.

I think I'm leaning towards a separate repo tho.

Regarding the earlier question: I am interested in doing this, but I don't think I'll have time in the short term (next 1-2 months). Maybe after that, if someone hasn't already done it.

ferrouswheel commented 5 years ago

Also, one issue with framing the project around "AtomSpace as database" is that persistence needs to be a first-class configuration that is used consistently through the project and docs. I know we have a persistence layer but it's not clear (at least to me) how transactions work, what resilience guarantees there are, etc.

Any documentation project should make sure setting up postgres or another backing store is just part of normal operating procedure and part of any "getting started" guide. All examples should be reviewed in terms of whether they work with the persistence layer in a sensible way.

(I'm not saying that this isn't already the case, I'm just not personally aware of the current state or how universally persistence is used... maybe it's integrated low-level enough that it Just Works(TM) )

Ideally we'd have a non-postgres option that works just as effectively for people getting started, e.g. sqlite or another file-based db. Maybe dumping to scm works, but it seems hard to see how this can be efficiently updated for any operation other than "dump the entire atomspace".

The alternative is to frame the AtomSpace as "in-memory graph knowledge base" or something, but when someone says "database" I expect a strong persistence story out-of-the-box.

vsbogd commented 5 years ago

OK, it looks like this issue is mostly about the form the documentation takes: should we keep it in a repository or in the wiki, should it be a website or something else. For me, the more important question is the one that @stellarspot raised in #2308: how could we represent atom behavior in the documentation and guarantee that it is up to date?

linas commented 5 years ago

Off-topic reply to @ferrouswheel re: persistence. Short answer: "it just works". Atoms are immutable (can only be created/destroyed) and globally unique (all users connected to a DB see the same atom). Values are mutable, non-unique, and the last updater wins.

Atoms are searchable (by name, by incoming/outgoing set, by pattern query). Values are not searchable: if you don't know the key of the key-value pair, you won't find the value. (Well, you can always ask for all keys... but that is not the same as being truly "searchable". You can't pattern search for values.)

Metaphor: atoms form a graph, the way pipes form a graph. Values are the things flowing in the pipes.

Multiple users can share atoms (and thus see value updates), but the sharing is explicit: you must explicitly write an atom out to the database (call store-atom) and then read it (fetch-atom). This is explicit because (1) automated sharing causes wayyy too much update traffic, (2) some DBs are so large they won't fit in RAM, and (3) users can focus on the part of the data that they need, instead of getting swamped by other junk.
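
The sharing model described above can be mimicked with a toy model in Python. This is only a sketch of the semantics as stated in this comment, not the real AtomSpace API: `ToyAtomSpace` and its methods are invented for illustration, the "database" is a plain dict, and only the names `store_atom` and `fetch_atom` echo the store-atom and fetch-atom calls mentioned above.

```python
from typing import Dict, Tuple

# An atom is modelled as an immutable (type, name) tuple: equal tuples are the same atom.
AtomKey = Tuple[str, str]


class ToyAtomSpace:
    def __init__(self, database: Dict[AtomKey, Dict[str, object]]):
        self.database = database                              # shared backing store
        self.values: Dict[AtomKey, Dict[str, object]] = {}    # local, in-RAM contents

    def add(self, type_: str, name: str) -> AtomKey:
        atom = (type_, name)
        self.values.setdefault(atom, {})   # adding the same atom twice is a no-op
        return atom

    def set_value(self, atom: AtomKey, key: str, value: object) -> None:
        self.values[atom][key] = value     # values are mutable, found only by key

    def get_value(self, atom: AtomKey, key: str) -> object:
        return self.values[atom].get(key)

    def store_atom(self, atom: AtomKey) -> None:
        # Explicit write: whoever stores last overwrites earlier values.
        self.database.setdefault(atom, {}).update(self.values[atom])

    def fetch_atom(self, atom: AtomKey) -> None:
        # Explicit read: nothing from other users arrives until this is called.
        self.values.setdefault(atom, {}).update(self.database.get(atom, {}))


db: Dict[AtomKey, Dict[str, object]] = {}
alice, bob = ToyAtomSpace(db), ToyAtomSpace(db)
cat_a = alice.add("ConceptNode", "cat")
cat_b = bob.add("ConceptNode", "cat")
alice.set_value(cat_a, "count", 1)
print(bob.get_value(cat_b, "count"))   # None: Alice has not stored the atom yet
alice.store_atom(cat_a)
bob.fetch_atom(cat_b)
print(bob.get_value(cat_b, "count"))   # 1
```

Because nothing is pushed automatically, each user only pays for the atoms they explicitly store or fetch, which matches reasons (1) to (3) given above.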

how transactions work,

non-issue.

what resilience guarantees

Whatever the underlying DB provides. Postgres scales to thousands of servers, so in principle, we could have tens of thousands of simultaneous users of an atomspace. In practice, there are other issues that are bottlenecks.

There's a gaggle of second-order issues, including the management of overlays on top of read-only DBs (the agi-bio people need this: load large datasets which are read-only and shared; individual users then create local mods layered on top). There's more, but it's second-order.

opencog / atomspace

Atom types and theory documentation website (Wiki -> Static site generator) #2232