kwrooijen / keydox

Keyword metadata through EDN files

Ideas #1

Open kwrooijen opened 3 years ago

kwrooijen commented 3 years ago

https://clojure.org/reference/metadata

{:duct/migrator 
 {:doc "Duct migration key. e.g. duct-ragtime. 
Migrations can be run by executing your project's main function. 
```sh
lein run :duct/migrator
```"}}

RickMoynihan commented 3 years ago

That's a good start @kwrooijen, I did some more thinking and refined what you have into this.

Firstly some ideas:

  1. It strikes me that the keydox vocabulary might as well be self documenting, so it documents its own format. That's what I've tried to do below.
  2. It should support end user extension with arbitrary data, hence all our keys should be namespace qualified. This convention also means when looking at keydox example configs it's clear what data is a keydox definition, and what data can be end user specified.
  3. I think when rendering the docstring the format of the string is important, e.g. plain-text vs a subset of markdown (I'd suggest only inline styles), extensible to potentially others too. This could be done through a tagged #keydox/markdown literal, but I'm currently undecided on that.

I think that gives us something that looks like this:

{
 :keydox/doc {:keydox.doc/format :keydox.format/plain-text
              :keydox/doc "An edn string providing a keydox documentation string for the specified key."
              }

 :keydox.format/plain-text {:keydox/doc "Indicates the :keydox/doc string is formatted as plain text."}

 :keydox.format/markdown {:keydox/doc "Indicates the :keydox/doc string is formatted in the keydox markdown subset."}

 :keydox.doc/format {:keydox/doc "The format/encoding of
 the :keydox/doc key. The value should be a namespace qualified
 keyword, that provides a hint on how to render the contained string.
 If this is unspecified it is assumed to
 be :keydox.format/plain-text."}

 ;; documenting unnamespaced keys is supported 
 :doc {:keydox/doc "Used by the clojure core language as a key that holds a docstring as metadata on a var"}
 }

I was imagining that we'd also define as part of the standard that a well known file is placed at the root of the resource path, like duct_hierarchy.edn or data_readers.clj. I'd suggest keydox.edn or perhaps keys.edn. The tooling to locate these definitions files should I think also allow for multiple keydox.edn files to be present, like is done for duct_hierarchy.edn. I don't think they should be meta merged though; instead I think all resources should be found and iterated over. This would allow potentially many definitions for the same key from different vendors (resource paths). That I think would be fine; though we may also want a way to disambiguate where the definitions are coming from. I'd suggest some kind of :keydox/defined-by key or perhaps tooling can just fall back to using the resource path to disambiguate definitions?
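
A minimal sketch of the "find every file and iterate, don't merge" idea. The file name `keydox.edn` is one of the candidates suggested above; the function names and the `:key`/`:source`/`:definition` result shape are hypothetical, chosen only for illustration.

```clojure
(require '[clojure.edn :as edn])

(defn keydox-resources
  "Every keydox.edn URL visible on the classpath (assumed file name)."
  []
  (-> (Thread/currentThread)
      .getContextClassLoader
      (.getResources "keydox.edn")
      enumeration-seq))

(defn definitions
  "Takes [source edn-text] pairs and returns a flat seq of definitions,
  each tagged with the source it came from. Nothing is merged, so the
  same key may appear once per source."
  [sources]
  (for [[source text] sources
        [k metadata]  (edn/read-string text)]
    {:key k :source source :definition metadata}))

(defn lookup
  "All definitions of key k, one per source that documents it."
  [defs k]
  (filter #(= k (:key %)) defs))

(defn classpath-definitions
  "Definitions from every keydox.edn on the classpath, using each
  resource URL as the disambiguating source."
  []
  (definitions (map (juxt str slurp) (keydox-resources))))
```

With two vendors documenting `:duct/migrator`, `(lookup defs :duct/migrator)` would return two entries, and the `:source` value stands in for the resource path used to tell them apart.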

I think it would be good to keep the vocabulary terms pretty minimal, though one thing I was thinking that would be beneficial is providing a bit more structure to the presentation layer, when it's eventually rendered into a component docs site for example.

It's just a bag of unstructured keys, with no way to distinguish minor keys that are just implementation details from major ones that people want to look for. This is what I was alluding to on Slack: some way to organise the presentation of these keys. I still think the keydox format should follow the simple :key.to.be/documented {:documentation :metadata ,,,} format, but that the extra presentational structure should be encoded in the references.

The obvious way is just sorting the qualified keys in the rendering of them, which will essentially group them adjacently by their namespace prefixes, but again unimportant keys will drown out the ones you want to be more salient. So I did wonder if we should provide a way to group or tag keys together e.g. :keydox/tags and provide a way to organise tag groups.
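
To make the two options concrete, here is a rough sketch of both groupings. `:keydox/tags` is the key floated above; the `:major` tag value, the `:untagged` fallback, and the function names are assumptions made up for the example.

```clojure
(def docs
  {:keydox/doc             {:keydox/doc "..." :keydox/tags #{:major}}
   :keydox.doc/format      {:keydox/doc "..." :keydox/tags #{:major}}
   :keydox.format/markdown {:keydox/doc "..."}
   :doc                    {:keydox/doc "..."}})

(defn by-namespace
  "Groups documented keys by namespace prefix; unqualified keys land
  under the empty string."
  [docs]
  (into (sorted-map)
        (group-by #(or (namespace %) "") (sort (keys docs)))))

(defn by-tag
  "Groups documented keys by each tag in :keydox/tags. Keys without
  tags are collected under :untagged (a made-up fallback)."
  [docs]
  (reduce-kv (fn [acc k metadata]
               (reduce (fn [a tag] (update a tag (fnil conj (sorted-set)) k))
                       acc
                       (:keydox/tags metadata #{:untagged})))
             {}
             docs))
```

Sorting by namespace treats every key alike, while the tag grouping lets a renderer show the `:major` group first and push the rest further down.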

I'll leave it here for now, rather than going too deep without your thoughts on this. What do you think?

RickMoynihan commented 3 years ago

I should add that I was thinking the core keydox project would then just define/document itself in a keydox.edn file, and that libraries and tooling for processing keydox data would live in separate projects. The keydox/vocabulary dependency would then be little more than a jar containing a single keydox.edn file.

kwrooijen commented 3 years ago

I like the idea of having the top level key being the key that's being described through metadata. The self documenting is also a great idea (I was thinking of using Clojure's :doc keyword, but this is better).

One thing I noticed is this:

It should support end user extension with arbitrary data, hence all our keys should be namespace qualified.

But you document the :doc key, aren't these conflicting ideas?

As for formatting, we could also have a keydox.core/format-markdown function or what have you. Since the Clojure convention is to use markdown in :doc strings, I think it would be best to do that as well, instead of explicitly specifying :keydox.format/markdown. An alternative could be :keydox.doc/format :plain-text; our formatter could then check whether this option is set.

As for multiple keys, maybe we could take the resource path and find its project root? I'm not sure how reliable that is though. e.g.

{:duct/core
 {:duct/migrator {:keydox/doc "..."}}

 :my/project
 {:duct/migrator {:keydox/doc "...", :keydox.doc/format :plain-text}}}

It's just a bag of unstructured keys, with no way to distinguish minor keys that are just implementation details from major ones that people want to look for.

If we could namespace them by project name (through resources path) that would be solved, right? We could also support #ig/ref in the config to be able to reuse simple keys in component keys, e.g.

{:my/component
 {:keydox/doc "..."
  :integrant.component/opts
  {:db #ig/ref :db}}

 :db
 {:keydox/doc "Database connection"}}

However we'd have to allow simple keywords, which is fine if by default we group them by project.

I think the keydox project should have the self documentation, but also some "framework" tooling to help people build new tools, e.g. a function to get all configurations with their namespace, or a simple doc function to check documentation in the REPL. Possibly other things, but nothing too complex in the end. The end goal is, as you said, enabling other people to create tooling around this (e.g. codox integration).

RickMoynihan commented 3 years ago

@kwrooijen Sounds like we're very close to agreement here.

There’s just a few things you’ve said that we should clarify. Before we get to those details though…

1.

I totally agree the project should provide some tooling, I was just thinking it would be done in a separate repo/project. i.e. the vocabulary would be separate from any tooling we’d provide. The vocabulary would be just the self description bundled in the keydox format written in terms of itself; and I guess a human readable specification / README etc. Obviously at some point that document specification could in principle be generated from the keydox.edn file itself.

2.

I had the same idea about using the resource path to disambiguate the definition, and I was thinking tooling we provide should definitely assoc in the URL to the file that defines each key. Indeed, it should perhaps also, if possible, provide access to the line numbers in that file for the key definitions (iirc a parser like rewrite-cljc could provide this). I feel like these things should be provided as extensions rather than core vocabulary terms though.

There are definitely trade offs about adding project keys and nesting definitions under them, however I feel like it would be simpler to just keep it one level deep, and to reference another key e.g.

{:my/key {:keydox/doc "...", :keydox/defined-by :my.project}

 :my.project {:keydox/doc "My project to demonstrate how to represent projects in keydox."
                     :project/website "https://github.com/kwrooijen/keydox"} ;; another vocab defines this key
}

I actually think this is quite a good idea, however it does perhaps then introduce a new concept of key types. At this point we are a hair's breadth away from reinventing RDF (we have types and properties) 😁 so could perhaps also have keydox equivalents for rdfs:domain and rdfs:range, e.g.

{:keydox/doc {:keydox/doc "...", :keydox/domain :keydox/DocumentDefinition :keydox/range clojure.core/string?}
 :keydox/Project {:keydox/type clojure.core/keyword? :keydox/doc "The project defining keydox metadata"}
 :keydox/DocumentDefinition {:keydox/type clojure.core/keyword?}}

I'd propose symbols are not resolved, and are just fully qualified symbols or a keyword pointing to something that could identify the type. i.e. types at this layer would perhaps be considered equal if they're the same lexical symbolic value.

At this point it makes a lot of sense to move metadata like :keydox.doc/format onto the definition of the :keydox/doc key rather than bodging it into the keydox resource map. i.e. whether the format was markdown or plaintext would be defined on the property/key like this:

{:keydox/doc {:keydox/type clojure.core/string? :keydox/string-format :keydox/markdown}
}

The complication this will expose is that you'll then want to do inference on types, and support sub property and type hierarchies, so you can essentially define a range of either plain-text or markdown. I think having a tagged reader for markdown might suit our needs better here than defining a predicate function for (or (markdown? %) (string? %)).

Again I think this sort of thing is genuinely super useful (see RDF for why) but it might be a bit much for some people to swallow!? Still it might be worth exploring further. IIRC the Arachne framework also came to similar conclusions about essentially using RDF to model stuff like this, though I believe that's largely defunct.

The main thing domains and ranges would add is an ability to infer types on the end of a keyword, so end users wouldn't need to explicitly specify them. A keydox normalisation phase would then infer them all. A few subtleties we'd want to handle might then emerge, like the fact that we might have multiple cardinalities of properties; as several keywords could define different domains and ranges.
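
A small sketch of what such a normalisation phase might do with domains alone. The domain given to `:project/website`, the `:keydox/inferred-types` key, and the function names are all hypothetical; only `:keydox/domain`, `:keydox/DocumentDefinition` and `:keydox/Project` come from the examples above.

```clojure
;; Vocabulary: each property declares the type of thing it describes.
;; The domain for :project/website is assumed for illustration.
(def vocabulary
  {:keydox/doc      {:keydox/domain :keydox/DocumentDefinition}
   :project/website {:keydox/domain :keydox/Project}})

(defn infer-types
  "Set of types implied by the properties used in a definition map.
  Properties without a declared domain contribute nothing."
  [vocabulary definition]
  (into #{}
        (keep #(get-in vocabulary [% :keydox/domain]))
        (keys definition)))

(defn normalise
  "Annotates every definition with its inferred types."
  [vocabulary docs]
  (into {}
        (map (fn [[k definition]]
               [k (assoc definition
                         :keydox/inferred-types
                         (infer-types vocabulary definition))]))
        docs))
```

A key documented with both `:keydox/doc` and `:project/website` ends up with two inferred types, which is the multiple-cardinality subtlety mentioned above.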

3.

Finally, regarding markdown: yes, it's common for markdown to be used in clojure doc comments, but which subset of markdown to support is a problem. Similarly clojure.core doesn't do it; and there are other things, like it would be super nice to provide affordances for users mentioning another keyword in a string and hot linking it. Knowing when to opt in/out of these behaviours is I think potentially important. I'd propose keydox does not literally support markdown, but a well defined subset with one or two clojure specific tweaks e.g. extra support for processing inline :keywords etc. Some folk may want a different subset for some uses, so giving them a way to hook in via tagged readers or a dispatch on keyword seems like it might be handy.

kwrooijen commented 3 years ago

  1. I'm still not sure what your reasoning is to have the "whitepaper" and the tooling separate? If the tooling implements the whitepaper, wouldn't it just introduce more complexity (2 dependencies) and be more error prone? (e.g. end-user using new whitepaper, old tooling).

2.1 I'm really having a hard time thinking of a good solution for the nesting issue. The main reason being that if we could simply (smart) merge all the configurations, it would eliminate a lot of complexity. Maybe having the top level key be a project key could be worth it? We could even create an Integrant component which handles it for us. I personally am not a fan of these reverse references by :keydox/defined-by.

{[:keydox/project :my/project]
 {:keydox.project/doc "My project to demonstrate how to represent projects in keydox."
  :keydox.project/website "https://github.com/kwrooijen/keydox"
  :my/key {:keydox/doc "..."}}}

Using composite keys we could also reference [project key] keys, after merging all the configurations e.g. #ig/ref [:my/project :my/key]. Since they're all nested in project keys, there shouldn't be any conflicts. Personally I think we could gain a lot of flexibility and other benefits from Integrant, but maybe that would make it more complex for the end-user?

This is what I imagine the "referencing" map to look like. The first map being all keydox.edn files merged, the second one being a collection of [project key]s. This would make referencing very easy, and retains uniqueness.

;; Merged keydox.edn files
{[:keydox/project :my/project]
 {:keydox.project/doc "My project to demonstrate how to represent projects in keydox."
  :keydox.project/website "https://github.com/kwrooijen/keydox"
  :my/key {:keydox/doc "Doc about my key"}}

 [:keydox/project :duct/core]
 {:keydox.project/doc "Duct project"
  :duct/migrator {:keydox/doc "doc about duct migrator"}}}

;; Merged [project key] keydox.edn. With this map the end-user would be able to reference cross project
;; e.g. `#ig/ref [:duct/core :duct/migrator]`
{[:my/project :my/key] {:keydox/doc "doc about my key"}
 [:duct/core :duct/migrator] {:keydox/doc "doc about duct migrator"}}
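
Getting from the first map to the second could look roughly like this. It is only a sketch: `project-key-index` is a made-up name, and it assumes project-level metadata can be recognised by the `keydox.project` namespace used in the examples above.

```clojure
(def merged
  {[:keydox/project :my/project]
   {:keydox.project/doc "My project to demonstrate how to represent projects in keydox."
    :my/key {:keydox/doc "doc about my key"}}

   [:keydox/project :duct/core]
   {:keydox.project/doc "Duct project"
    :duct/migrator {:keydox/doc "doc about duct migrator"}}})

(defn project-key-index
  "Flattens the project-nested map into {[project key] metadata}.
  Entries in the keydox.project namespace describe the project itself,
  so they are skipped (an assumption of this sketch)."
  [merged]
  (into {}
        (for [[[_ project] body] merged
              [k metadata]       body
              :when (not= "keydox.project" (namespace k))]
          [[project k] metadata])))
```

Because every entry is keyed by the [project key] pair, two projects documenting the same key stay separate instead of overwriting each other.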

2.2

As for typed keys and typed arguments, maybe using Malli would be a good solution? I think it would be better to re-use existing libraries rather than reinvent the wheel.

3.

I guess the tooling should decide on the type of markdown, and people can implement their own. I like the key linking idea, you'd probably still need a composite key system for that using [project key] as well. If you don't, then you'd reference multiple keys, which can't be desirable.

RickMoynihan commented 3 years ago

I know I’ve gone off on one here with the RDF inspired model, but I think following RDF represents one extreme on the spectrum of options, so figured it was worth highlighting what the logical extension of that might look like. RDF is very well designed in this area though, and shows what I think a fully decomplected option might look like. It would however have implications, for example everything essentially becomes highly denormalised graph data; you'd essentially want to query it easily... so you'd probably want to use something like datascript, or matcha (a small project of mine).

I think it would avoid some complexities introduced in your model, for example the need to introduce a :keydox.project/doc key in addition to a :keydox/doc key; here you're complecting types with properties, rather than treating them as orthogonal. Don't get me wrong, it's a trade-off that will bring some advantages, but it means the vocabularies are no longer as minimal as they could be.

I don't think we should add an integrant dependency, for a number of reasons; but the main ones are that the documentation is declarative, the definitions are data, not config to be instantiated. Also pragmatically #ig/ref's can't represent cycles; yet cycles, dictionaries, thesauruses and documentation are common.

So if we wanted to support referencing I think it should be done through its own #keydox/ref tagged literal. Incidentally we should probably read keydox.edn files with :default tagged-literal set, so other people can use tagged literals without us having to know about them.
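
For illustration, this is roughly how that reads with `clojure.edn`. The `#acme/md` tag and the `keydox-ref?` helper are hypothetical; `tagged-literal` and `tagged-literal?` are the `clojure.core` functions.

```clojure
(require '[clojure.edn :as edn])

;; An example file body using a #keydox/ref and a third-party tag
;; (#acme/md is invented here) that the reader knows nothing about.
(def text
  "{:my/component {:keydox/doc #acme/md \"*hi*\"
                   :uses #keydox/ref :db}
    :db {:keydox/doc \"Database connection\"}}")

;; With :default tagged-literal, unknown tags are preserved as
;; TaggedLiteral values instead of causing a read error.
(def parsed (edn/read-string {:default tagged-literal} text))

(defn keydox-ref?
  "True when x is a preserved #keydox/ref literal."
  [x]
  (and (tagged-literal? x) (= 'keydox/ref (:tag x))))

(def reference (get-in parsed [:my/component :uses]))

(keydox-ref? reference) ; true
(:form reference)       ; :db
```

Tooling that understands `#keydox/ref` can resolve it after reading, while literals it does not recognise pass through untouched with their `:tag` and `:form` intact.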

I'll have a think about how stealing the composite keywords idea might work to support specifying projects.

If you don't, then you'd reference multiple keys, which can't be desirable.

I had tried to solve/avoid this issue when I said this:

I don't think they should be meta merged though; instead I think all resources should be found and iterated over. This would allow potentially many definitions for the same key from different vendors (resource paths).

i.e. the idea is that it's not a problem worth worrying about, if multiple people document something, you'd just show people all the documented definitions and let them pick; with metadata to disambiguate who said it (notably the resource path and project)

Regarding the RDF inspired model, I could probably prototype/spike something pretty easily to demonstrate what it would look and feel like. I suspect it'd be pretty easy for me as I'm very familiar with RDF, and even wrote a library API I could use for querying this stuff (not saying we need to shoe horn that in btw, just saying I could prototype something easily with it) to demonstrate what it might look like. I feel any other model would at this stage be harder to design/prototype, or risk making trade offs I'm uncomfortable with.

The problem is I don't have much free time at the moment; but it's piqued my curiosity enough that I might well give it a try if I can find some time.

Happy to try and work up some competing alternative options though, as I'm not entirely convinced the RDF-like model is right for the clojure ecosystem, and something easier or more familiar might well be more appropriate.

kwrooijen commented 3 years ago

It's a bit difficult for me to judge / picture the RDF model, since I have no experience with RDF. The reason for the composite keys is because if you reference a key in your generated doc, you would get a list of keys, instead of the key that you actually want. If you want to write up something in your RDF style maybe it would be clearer for me. But that's up to you.

RickMoynihan commented 3 years ago

If you want to write up something in your RDF style maybe it would be clearer for me. But that's up to you.

Yeah that's what I'd like to do, as I appreciate on the surface it seems a bit abstract. It would I think certainly be the most flexible model, however the flexibility does certainly introduce some extra cost. As I said I'm not sure the cost is worth it, but it makes a lot of sense to me. We definitely want something that's got a chance of wider adoption. I think something based on a graph data model could do that; but it equally might be too much for some to stomach; which is why I'm very open to also working up some competing proposals.

The reason for the composite keys is because if you reference a key in your generated doc, you would get a list of keys

Yeah I know where you're coming from, however I think there's a distinction between the key and the documentation about the key which are getting slightly confused in your presentation of this model.

I was thinking that a query for a key doesn't return you the key, but descriptions of that key, of which there might be many. By way of an example, [:github/kwrooijen :duct/migrator] is your description of the key :duct/migrator, and [:github/rickmoynihan :duct/migrator] would be my description of it. Eventually when duct officially supports keydox weavejester might add [:github/duct :duct/migrator].

Both would be attempting to describe the same thing, and yes they might conflict but if we've already acknowledged this is an open system then we can't stop people doing this. Obviously your description is distinctly identified from mine though, so it can be directly linked to when needed.

Obviously versioning may complicate things a little, so having some notion of the version may be useful; but if we assume the community are playing by the ideals of Rich Hickey's Spec-ulation talk, changes are accretive, so it's not such a big deal.

I forgot to answer your point:

But you document the :doc key, aren't these conflicting ideas?

As you may have gathered now, no, I don't think these are conflicting ideas. Un-namespaced keys are of course ambiguous, but by providing all documented definitions for them users can discern through context which definition is helpful. It's not ideal, but it seems it would still be useful to document these. e.g. we could add some definitions for conventions used by clojure.core: :doc, :added, :deprecated, :author. Picking :author as an example, the :author docstring might read "Used by clojure.core to specify the namespace's principal author"... The fact that a blogging app would add another domain definition doesn't seem to me to be the end of the world.

RickMoynihan commented 3 years ago

I'm still not sure what your reasoning is to have the "whitepaper" and the tooling separate? If the tooling implements the whitepaper, wouldn't it just introduce more complexity (2 dependencies) and be more error prone? (e.g. end-user using new whitepaper, old tooling).

Essentially it's because data lives forever but software tools don't. The complexity cost is I think minimal, and it would mean IDEs etc could include our documentation vocabularies and definitions without having to pollute their dependencies with our tooling dependencies. Making it optional whether they get/use our tooling will I think help motivate adoption in various environments. The core vocabulary should I think be pretty minimal, which also means there's less for people to object to.

Regarding a newer whitepaper with old tooling, that won't be an issue if we follow a model of accreting changes only. It does of course mean that getting the core right needs some thought, patience and wider scrutiny; however, if the core is small enough that doesn't feel too ambitious.