lpw25 / opam-doc-base

Store documentation for OPAM packages

Inlining control (module includes and polymorphic variants) #70

Closed: dbuenzli closed this issue 4 years ago

dbuenzli commented 10 years ago

It would be nice if there was a directive to be able to control inlining. For example:

include M
(** @inline *)

Sometimes you want it and sometimes you don't.

Example where I want it: when I gather internal modules into another one to define the public API of a library. My current strategy with ocamldoc is to not document the internal modules and write everything in the gathering module.

Example where I don't want it: when I have a module that I expose in different ways. For example, there's a public API for users and a private public API for defining backends, which gives access to more information about the data structures. Here's an example: public, private public. In that case I find it more informative not to inline, as it gives a clear view of what the private API grants you access to.

Maybe a directive could also be used to control the inlining of polymorphic variants:

type t = [ bla (** @inline *) | `C ]

The default could also be to always inline and have a @noinline directive. I don't have strong opinions on that (I think the right default would be no inlining for polyvars and inlining for module includes, but that would require introducing two directives).
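To make the polymorphic-variant case concrete, here is a minimal sketch. The type names bla and t follow the example above; the constructors `A and `B inside bla, and the names t_inlined, expand and describe, are made up for illustration. Inlining bla would mean documenting t by its expanded constructors rather than by a reference to bla; the coercion checks that both spellings denote the same type.

```ocaml
(* Hypothetical polymorphic variant type that t extends. *)
type bla = [ `A | `B ]

(* Not inlined: documentation would show [ bla | `C ] and link to bla. *)
type t = [ bla | `C ]

(* Inlined: documentation would show the expanded constructors instead. *)
type t_inlined = [ `A | `B | `C ]

(* Both spellings denote the same type, so this coercion type-checks. *)
let expand (x : t) : t_inlined = (x :> t_inlined)

(* Code can still match on the named part via the #bla pattern. *)
let describe : t -> string = function
  | #bla -> "from bla"
  | `C -> "C"
```

The choice only affects how t is presented; code can keep using the #bla pattern either way.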

lpw25 commented 10 years ago

All reasonable suggestions.

I think the plan for includes is to have them expandable/contractable so that users can show or hide the contents as they see fit (perhaps a similar thing would be nice for polymorphic variant types), but providing more subtle control than that to documentation authors seems like a good idea.

dbuenzli commented 10 years ago

On Wednesday, 22 October 2014 at 17:48, Leo White wrote:

> I think the plan for includes is to have them expandable/contractable so that users can show or hide the contents as they see fit (perhaps a similar thing would be nice for polymorphic variant types), but providing more subtle control than that to documentation authors seems like a good idea.

I guess these are backend issues. But:

1) When I say inline, I'd like to actually hide the fact that there's an include M here.

2) In that particular case, expand/contract doesn't work well from an HCI point of view. The reason is that you want to be able to easily address (href) a specific expanded/contracted state, rather than having to re-expand/re-contract each time you land on the page to get to the state you want. Sure, you can do tricks with # and JavaScript, but for documentation of that kind nothing beats a collection of well-linked raw HTML files.

Best,

Daniel

dbuenzli commented 10 years ago

Basically, what I want to say is that the way ocamldoc does it now is perfect, except that sometimes, rather than having the linked include M, you want it replaced by the contents of the linked page.

Best,

Daniel
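What "replacing the linked include M by the contents of the linked page" means can be checked in the language itself. In this sketch (the module type M, its items and the toy Impl are hypothetical), a signature that says include M and one that spells out M's items in place are interchangeable, so hiding the include changes only the presentation, not the interface.

```ocaml
(* Hypothetical module type standing in for the included module. *)
module type M = sig
  type t
  val create : unit -> t
end

(* What ocamldoc renders today: "include M", linked to M's page. *)
module type With_include = sig
  include M
  val size : t -> int
end

(* What the inlined rendering would show: M's items written in place. *)
module type Inlined = sig
  type t
  val create : unit -> t
  val size : t -> int
end

(* The two signatures are interchangeable: both coercions type-check. *)
module To_inlined (X : With_include) : Inlined = X
module Of_inlined (X : Inlined) : With_include = X

(* A toy implementation used to exercise both signatures. *)
module Impl = struct
  type t = int list
  let create () = []
  let size = List.length
end

module Checked = To_inlined (Of_inlined (Impl))
```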

dsheets commented 10 years ago

I'm thinking about inlining slightly more (nested, inline modules/signatures 1 deep) and removing the raw interface files. What do you think? Are the raw signatures valuable?

dbuenzli commented 10 years ago

About inlining more, I'm not convinced. For most of my packages this would mean that you get all the package documentation on a single page and that would be completely annoying navigation wise. See e.g. gg.

I really like the addressability/clickability at the module level since it's a natural organisation principle. It also makes the urls hackable and allows OS search tools/launchers that work at the file level to quickly find the documentation of a module (that's the way I currently access ocamldoc generated documentation).

As I said, I think what ocamldoc does is perfect, except that I sometimes want to hide the fact that there's an include M, that's all. I'd hate an interface where you have to collapse/uncollapse things; it would be a huge waste of my time. Also, these kinds of things don't work well with browsers' find-in-page functionality.

I don't really care about raw signatures. I did sometimes look at them, but mostly when something was broken in ocamldoc's output (but if you want to keep them as a gateway to a full source code browser, why not...).

lpw25 commented 10 years ago

> As I said, I think what ocamldoc does is perfect, except that I sometimes want to hide the fact that there's an include M, that's all. I'd hate an interface where you have to collapse/uncollapse things; it would be a huge waste of my time.

There are plenty of libraries -- especially Core -- where ocamldoc's current behaviour is clearly wrong. It makes the documentation very difficult to navigate and does not give a clear indication of what elements a module actually contains. This is why I think that includes should be expandable/collapsible and expanded by default.

In the case of nested modules, I think that it makes more sense to have them be collapsed by default.

However, as I said above, I am very happy with giving developers more control over their documentation through tags like @inline since what makes sense for one module/library does not make sense for all of them.

lpw25 commented 10 years ago

> 2) In that particular case, expand/contract doesn't work well from an HCI point of view. The reason is that you want to be able to easily address (href) a specific expanded/contracted state, rather than having to re-expand/re-contract each time you land on the page to get to the state you want. Sure, you can do tricks with # and JavaScript, but for documentation of that kind nothing beats a collection of well-linked raw HTML files.

To be clear, I still think we should provide the separate addressable pages for the contents of includes and submodules in addition to the support for expanding them inline with their parent.

dbuenzli commented 10 years ago

On Wednesday, 22 October 2014 at 19:21, Leo White wrote:

> There are plenty of libraries -- especially Core -- where ocamldoc's current behaviour is clearly wrong. It makes the documentation very difficult to navigate and does not give a clear indication of what elements a module actually contains.

Yes absolutely. I'm not sure you understood me. There are two things to distinguish on a page:

1) A module inclusion, include M.
2) A nested module, module A, on the page.

I think that for 1) we simply want the ability to decide whether that include M is going to:

a) Be a link to another page (current ocamldoc behaviour)
b) The textual inclusion of the interface at that point and recursively (the fact that there is an include M is completely hidden).

I think that for 2) we always want that to be a link to a new page (David was suggesting that he wanted to maybe inline them one level deep).

So what I'm saying is that, except for the fact that b) is currently not possible with ocamldoc, the structure it generates is a good navigation structure.

> This is why I think that includes should be expandable/collapsible and expanded by default.

If we agree with the above there's absolutely no need for an expansion/collapse widget thingy at all. We just have a good and well linked hypertext.

Best,

Daniel

P.S. Another disadvantage of expand/collapse is that it doesn't make it into your browser history, which makes navigation harder: spatially, the structure of the page changes according to your clicks, and you can't use your browser's back button to get back to a previous known state.

dsheets commented 10 years ago

Here is my current design proposal:

  1. Polymorphic variants are not inlined by default
  2. Module inclusion is not inlined by default and is linked to the source module stand-alone page
  3. Nested modules are inlined according to some heuristic (size?) but still linked to a stand-alone page
  4. Docs are easily browsed in systems without CSS or JavaScript

With 1 and 2, I think we should allow authors to opt into inlining through a doc tag. In any case, the inlined component will always be available in a document by itself (for the default HTML generator).

For 3, I think we can devise a reasonable (and simple) heuristic that produces useful documentation in the vast majority of cases. I really hate having to drill into tiny nested modules to see the whole signature of a module with children. I believe we should allow authors to opt out of this, though, if the heuristic hurts them.

I propose

(** @inline *)

to force a substructure to be inlined and

(** @inline never *)

to convey the desire to have the substructure inlined under no circumstances.

As for the heuristic, I propose we inline any 1-deep substructure that adds no more than 10% more elements compared to the root level. We could argue over the exact threshold (e.g. 5%) and, if this is adopted, we'll probably have to tweak the number empirically. I believe, though, that this functionality is very useful because many nested modules are tiny and contain only a type signature and a single function, or include another module (with constraints or substitution). Looking at how Core behaves with this heuristic should help us design a system with good usability.

With systems that use nested modules to effect namespaces, we could refuse to inline anything if the total increase in page elements is more than 2x (another parameter to tweak). This would allow namespace modules to continue to act as de facto indexes while still inlining helper modules used to structure a specific piece of functionality. The heuristic is conservative: whenever it does not apply, behavior falls back to what ocamldoc does.
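The two thresholds above can be sketched as follows. This is a minimal sketch, assuming a simplified, hypothetical signature model in which every root-level item (including each nested module declaration) counts as one element and a nested module's size is its number of direct items; item, modules_to_inline and the parameter names are illustrative, and the 10% and 2x values are simply the ones suggested here.

```ocaml
(* Hypothetical, simplified signature model: a plain element (type, value,
   exception, ...) or a nested module with its own items. *)
type item =
  | Elt of string
  | Sub of string * item list

(* Tunable parameters, set to the values suggested above. *)
let per_module_limit = 0.10 (* one submodule may add <= 10% of the root count *)
let total_limit = 2.0       (* the page may grow to at most 2x its root size *)

(* Names of the 1-deep nested modules the heuristic would inline. *)
let modules_to_inline (root : item list) : string list =
  let root_count = List.length root in
  let subs =
    List.filter_map
      (function
        | Sub (name, items) -> Some (name, List.length items)
        | Elt _ -> None)
      root
  in
  (* Namespace-style modules: if inlining all nested modules would more than
     double the page, inline nothing (fall back to ocamldoc behaviour). *)
  let nested_total = List.fold_left (fun acc (_, n) -> acc + n) 0 subs in
  if float_of_int (root_count + nested_total)
     > total_limit *. float_of_int root_count
  then []
  else
    List.filter_map
      (fun (name, n) ->
        if float_of_int n <= per_module_limit *. float_of_int root_count
        then Some name
        else None)
      subs
```

Under these assumptions, a module whose nested modules together outweigh the parent gets nothing inlined, while a tiny helper module under a larger parent does.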

With this design, we don't have to decide on the exact semantics of the lack of an inline directive, as this is the prerogative of each individual generation system. We will instead be promulgating a system whereby authors have a way to make their wishes regarding the inlining of substructures explicit.

dbuenzli commented 10 years ago

What you propose seems mostly fine to me however I have two comments.

On 2: let me explain how I would really like to define my APIs, something I have always been prevented from doing, either because the documentation wouldn't look good and/or because it wouldn't hide enough names.

The basic idea is that I want to define APIs using a gathering module (see e.g. gg, react, uucp), and I want the documentation of these libraries to look as it does now, potentially using module aliases to define them without the aliased names showing up in the documentation. To achieve that in the current system I have to define things as follows. Let's take the example of a library with a module A that I gather in a module Lib.

--- Lib_A.mli ---
type t
val create : unit -> t
--------------------

--- Lib.mli ---
(** Library Lib *) 

module A : sig 
  type t 
  (** The docs *) 

  val create : unit -> t 
  (** The docs *)
end
------------------

--- Lib.ml ---
module A = Lib_A
----------------

Now the only thing I want to export to the user of the API is lib.cmti. But the annoying thing with this approach is that a) I have to write the module signature of Lib_A twice, and b) it doesn't work with module aliases, as it leaks names that I would like to keep internal (see this discussion).

So if there were a way to generate the same documentation (i.e. only the names of Lib are visible) from the following definitions, and if it also worked with module aliases,

--- Lib_A.mli ---
type t
(** The docs *) 

val create : unit -> t
(** The docs *) 
--------------------

--- Lib.mli ---
(** Library Lib *) 

module A : sig
  include module type of Lib_A
end
------------------

--- Lib.ml ---
module A = Lib_A
----------------

That would make me very, very happy. Btw, the only reason I haven't released uucp with module alias support so far is that I couldn't get a good documentation set.
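For reference, the second layout can be collapsed into a single file that compiles, with each .mli/.ml pair turned into a module with an explicit signature. This only checks that the types line up (the unit representation of t is a placeholder); it says nothing about how a documentation tool would render Lib.A.

```ocaml
(* Stand-in for the internal unit Lib_A (lib_A.mli + lib_A.ml). *)
module Lib_A : sig
  type t
  (** The docs *)

  val create : unit -> t
  (** The docs *)
end = struct
  type t = unit (* placeholder representation *)
  let create () = ()
end

(* Stand-in for the gathering unit Lib: the interface reuses Lib_A's
   signature instead of repeating it, and the implementation is an alias. *)
module Lib : sig
  module A : sig
    include module type of Lib_A
  end
end = struct
  module A = Lib_A
end
```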

On 3: I'm fine with inlining nested modules if it can be prevented. But OTOH, are you sure that what you describe is that frequent (i.e. do we want nested module inlining at all)? For example, if you take the modules at the end of this document section, most of them contain only a few definitions, but I wouldn't like them to be inlined. I think the way it currently renders gives a better outline of what is available while retaining fast access to the information.

dbuenzli commented 10 years ago

In essence, what I'd like to say is that the module system gives us a way to define an API in a DRY way, but I'm somehow prevented from using it because it generates useless bureaucracy at the documentation level (names that should be hidden but aren't, links to things that shouldn't be linked to because they should be hidden, etc.).

lpw25 commented 10 years ago

> If we agree with the above there's absolutely no need for an expansion/collapse widget thingy at all. We just have a good and well linked hypertext.

The problem with this is that the fact that something has an include and the contents of that include are both important pieces of information which should be clearly visible in documentation. For example:

module type Comparable = sig
  type t
  val compare : t -> t -> int
end

module Foo : sig
  type t
  (* ... *)
  include Comparable with type t := t
  (* ... *)
end

Here we want to expose both that Foo : Comparable and that Foo.compare exists. This requires us to show all the elements of Foo and to wrap the ones from Comparable in something that indicates they came from Comparable. At this point users will naturally assume/desire that they can collapse the Comparable part.
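A runnable version of this example, with a hypothetical of_int constructor and a toy Max functor added: the include contributes Foo.compare, with Comparable's t replaced by Foo's own t (destructive substitution), and Foo can still be passed wherever a Comparable is expected.

```ocaml
module type Comparable = sig
  type t
  val compare : t -> t -> int
end

module Foo : sig
  type t
  val of_int : int -> t (* hypothetical constructor, for the example *)
  include Comparable with type t := t
end = struct
  type t = int
  let of_int x = x
  let compare = Int.compare
end

(* A toy functor expecting a Comparable; Foo matches it. *)
module Max (C : Comparable) = struct
  let max a b = if C.compare a b >= 0 then a else b
end

module Foo_max = Max (Foo)
```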

> P.S. Another disadvantage of expand/collapse is that it doesn't make it into your browser history, which makes navigation harder: spatially, the structure of the page changes according to your clicks, and you can't use your browser's back button to get back to a previous known state.

Not my area of expertise, but I believe these things to be solved problems in modern JavaScript land.

dsheets commented 10 years ago

I believe you could achieve almost what you describe under the proposal I described upthread (based on your original suggestion).

Something like:

module A : sig
  include module type of Lib_A
  (** @inline *)
end
(** @inline never *)

As you wouldn't install lib_a.cmti, Lib_A would not be linked in the generated documentation. In my proposal, the fact that the signature of module A contained an included module type of Lib_A would still be visible, though. Does this satisfy you? I foresee a desire to generate project development documentation where the actual interface of Lib_A would be documented and linked from this reference. If you really want to have control over the visibility of the inlined constructs, we could also define something like @inline hidden or @inline total.
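The rendering choices discussed so far (a linked include M, a wrapped rendering that still names the included signature, and a fully hidden inclusion along the lines of @inline hidden) can be compared with a toy plain-text renderer. The mode and item types below are hypothetical and assume each item's text has already been rendered.

```ocaml
(* Hypothetical rendering modes for an include. *)
type mode =
  | Link           (* "include M"; in HTML, a hyperlink to M's own page *)
  | Inline_visible (* M's items shown in place, under a line naming M *)
  | Inline_hidden  (* M's items shown as if written here ("@inline hidden") *)

type item =
  | Spec of string                       (* an already-rendered item *)
  | Include of string * mode * item list (* included name, mode, its items *)

(* Render a list of items to lines of text, indenting wrapped includes. *)
let rec render (indent : string) (items : item list) : string list =
  List.concat_map
    (function
      | Spec s -> [ indent ^ s ]
      | Include (name, Link, _) -> [ indent ^ "include " ^ name ]
      | Include (name, Inline_visible, body) ->
          (indent ^ "include " ^ name ^ " :") :: render (indent ^ "  ") body
      | Include (_, Inline_hidden, body) -> render indent body)
    items
```

Only the hidden mode removes the name of the included signature from the page; the visible mode still tells the reader where the items came from.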

As for 3, it's not clear to me that we need any sort of inlining heuristic in the first instance. It is clear, however, that such a thing could be constructed and that it is ultimately the decision of the generation and rendering systems whether to inline or not. This is why we should define the capability for authors to be explicit regarding the behavior they desire.

With that said, I am interested in experimenting with these inlining heuristics because there are some design patterns that would benefit from this sort of system. Of course, one can argue that authors should always be explicit about their decision (or systems should only default to a single, obvious behavior) but I believe this doesn't take into account the disinterest many authors and organizations have in creating perfectly crafted documentation. As to your react example, I believe that those nested modules would be rejected for automatic inlining by the upper bound threshold I mentioned above. Specifically, Int and Float together contain more definitions than the parent module and so no nested module would be inlined.

A design pattern where the heuristic makes sense in my opinion: https://ocaml.janestreet.com/ocaml-core/111.28.00/doc/core_kernel/#Bag. In this example, I would opt to inline the Elt module and link to Container.S1 by default. If the maintainer later decides that they want to have Container.S1 included in the outer module, a simple (** @inline *) would suffice to both include and reference the interface. This would change the calculation for renderers that inline "small" nested modules by increasing the number of root definitions and thus potentially causing un-annotated nested modules to also be inlined and referenced (if their totality is still small enough).

lpw25 commented 10 years ago

> Nested modules are inlined according to some heuristic (size?) but still linked to a stand-alone page

I'm a little nervous about using a heuristic, since it means that a small change to a module may change its documentation in a way that surprises developers. I would rather pick one behavior and always use it unless instructed otherwise.

Of course this matters less if the page is a bit dynamic (e.g. with expand/collapse) since then it is not as important to make the "right" decision.

lpw25 commented 10 years ago

> As you wouldn't install lib_a.cmti, Lib_A would not be linked in the generated documentation.

If lib_a.cmti is not installed, then we will not be able to display (all) the comments from lib_a.mli.

lpw25 commented 10 years ago

It seems to me that you should be able to write something like:

(* Lib_A.mli *)
(** @hide *)

type t
(** The docs *)

val create : unit -> t
(** The docs *)

(* Lib.mli *)
(** Library Lib *)

(** @inline *)
module A = Lib_A

And have it produce the following for Lib:

# Library Lib

module A : sig

  type t
    The docs

  val create : unit -> t
    The docs

end

Possibly Lib_A should still be available in the documentation but not listed in any index. Lib.A would also be available as a stand-alone page that would look like:

module A

  type t
    The docs

  val create : unit -> t
    The docs

dsheets commented 10 years ago

This tool generates HTML and never inlines. This tool generates TeX and always inlines. This tool generates snippets of documentation to be put into blog posts and only inlines "small" things. A particular tool can choose some behavior. We have to give authors the ability to explicitly indicate a behavior. If they don't indicate a behavior, it's up to the policy of the tool which could be heuristic.
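The precedence described here (an explicit directive always wins; otherwise the tool's own policy decides) fits in a few lines. The directive constructors mirror the proposed @inline and @inline never tags; the policy type, should_inline, and measuring size as an item count are assumptions made for this sketch.

```ocaml
(* The two proposed author directives; None means "no directive". *)
type directive = Inline | Inline_never

(* Hypothetical tool-wide defaults for un-annotated substructures. *)
type policy =
  | Never_inline               (* e.g. an HTML generator *)
  | Always_inline              (* e.g. a TeX generator *)
  | Heuristic of (int -> bool) (* e.g. inline only "small" things *)

(* An explicit directive always wins; otherwise the tool's policy decides. *)
let should_inline ~(policy : policy) (directive : directive option) (size : int) : bool =
  match directive, policy with
  | Some Inline, _ -> true
  | Some Inline_never, _ -> false
  | None, Never_inline -> false
  | None, Always_inline -> true
  | None, Heuristic small -> small size
```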

The annoyance I am trying to solve is the problem of tiny nested modules/signatures that have to be clicked into to understand what a module is offering but are almost worthless out of the context of the surrounding module. I believe that a large class of these is easily detectable with a couple of conservative parameters and their inclusion would greatly improve the user's experience. If authors demand control, they should be explicit.

Also, I agree with Leo's proposal for annotating private module interfaces rather than their inclusion sites.

lpw25 commented 10 years ago

> This tool generates HTML and never inlines. This tool generates TeX and always inlines. This tool generates snippets of documentation to be put into blog posts and only inlines "small" things. A particular tool can choose some behavior. We have to give authors the ability to explicitly indicate a behavior. If they don't indicate a behavior, it's up to the policy of the tool which could be heuristic.

Completely agree, but heuristics other than "we default to X" tend to make me nervous. However, in this case it is probably fine to have something more complicated.

dsheets commented 10 years ago

I agree with your nervousness regarding over-smartness of tools. I think we should plan to expose the proposal I outlined (without the heuristic) and your @hide proposal. Then, once we get bulk doc generation up and running, we should experiment with these kinds of documentation analysis algorithms and watch which docs are affected as we play with these parameters. My strong suspicion is that there are design styles used by large, minimally documented codebases that would really benefit from this kind of analysis. When we do this, we can also get some idea of the number of "false positive" doc improvement attempts. The false negatives don't really matter as they are the status quo.

lpw25 commented 10 years ago

> I agree with your nervousness regarding over-smartness of tools. I think we should plan to expose the proposal I outlined (without the heuristic) and your @hide proposal. Then, once we get bulk doc generation up and running, we should experiment with these kinds of documentation analysis algorithms and watch which docs are affected as we play with these parameters.

Sounds like a plan.

> My strong suspicion is that there are design styles used by large, minimally documented codebases that would really benefit from this kind of analysis.

I wonder which codebase you could be referring to 😄

dbuenzli commented 10 years ago

On Wednesday, 22 October 2014 at 22:56, Leo White wrote:

> Here we want to expose both that Foo : Comparable and that Foo.compare exists. This requires us to show all the elements of Foo and to wrap the ones from Comparable in something that indicates they came from Comparable. At this point users will naturally assume/desire that they can collapse the Comparable part.

Do we really want to know that Foo : Comparable? We are not in a nominal type system! Basically, the day I want to use Make (C : Comparable), I'll just try it and bitch if it doesn't work…

In my opinion it's rather noise, but I can understand that you may want it. OTOH there are certain cases (e.g. mine) where the name is an internal one that you don't want to expose. Should we add another knob?

Daniel
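The structural point can be illustrated with a hypothetical Make functor: a module that never mentions Comparable is accepted as long as its shape matches, and extra items are simply ignored. Make, sort and Bar are made up for this sketch.

```ocaml
module type Comparable = sig
  type t
  val compare : t -> t -> int
end

(* Hypothetical functor taking any Comparable. *)
module Make (C : Comparable) = struct
  let sort (l : C.t list) : C.t list = List.sort C.compare l
end

(* Bar never mentions Comparable; it only has a matching shape. *)
module Bar = struct
  type t = string
  let compare = String.compare
  let length = String.length (* extra items do not prevent the match *)
end

module Bar_sort = Make (Bar)
```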

samoht commented 10 years ago

> In my opinion it's rather noise, but I can understand that you may want it. OTOH there are certain cases (e.g. mine) where the name is an internal one that you don't want to expose. Should we add another knob?

Such as the include Foo_intf trick in most of Core's Foo.mli modules?

dbuenzli commented 10 years ago

On Wednesday, 22 October 2014 at 22:56, David Sheets wrote:

> As you wouldn't install lib_a.cmti, Lib_A would not be linked in the generated documentation. In my proposal, the fact that the signature of module A contained an included module type of Lib_A would still be visible, though. Does this satisfy you? I foresee a desire to generate project development documentation where the actual interface of Lib_A would be documented and linked from this reference. If you really want to have control over the visibility of the inlined constructs, we could also define something like @inline hidden or @inline total.

I think this would be desirable. Names are important and everywhere and we should not pollute the minds with names that shouldn't/can't be used.

Best,

Daniel

dsheets commented 10 years ago

Module types are structural, though, and developers often use these sorts of signature tricks to enforce compatibility.

To my mind, each behavior (try to keep them few) that you want to enforce should have a means of expressing that demand. If you specify nothing, you leave the decision to the renderer. If you use one of our tag constructions (try to keep them few), then renderers should obey their semantics. We need to define their semantics (try to keep them few).

lpw25 commented 9 years ago

The hiding aspect of this proposal now has its own issue: lpw25/doc-ock-lib#4. I think the inlining aspect should be handled by the actual HTML generation in ocamlary.