simpler doc links: `getFilePermissions <#getFilePermissions,string>`_ => $`getFilePermissions`

timotheecour commented 5 years ago

this PR https://github.com/nim-lang/Nim/pull/10492 adds a lot of great documentation; however, it also adds a lot of things like:

`getFilePermissions <#getFilePermissions,string>`_

which are required today by docgen to produce valid links. This is not DRY for 2 reasons:

the name is consistently repeated twice
the args are mentioned, which is a different kind of DRY violation (the args appear in source code as well as in that doc link)

other drawbacks:

this makes for less readable docs when browsing source code
it increases risk of links becoming out of sync when proc signature changes (eg an extra (optional) param is later added, which can often happen)
discourages using proper doc links as it adds a barrier to "get it right" (which explains why so few procs have these links)

EDIT: doc links are order-dependent:

proc fn*[T: SomeFloat](a: T)=discard
proc fn*[T: SomeInteger](a: T)=discard

generates:

fn,T
fn,T_2
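
To make the order dependence concrete, here is a minimal sketch of how such ids could arise. It assumes (inferred only from the output above, not from docgen's source) that an id is the name joined with the parameter types and that a numeric suffix is appended on collision; `anchorIds` is a hypothetical helper, not a docgen API.

```nim
import std/[strutils, tables]

proc anchorIds(signatures: seq[(string, seq[string])]): seq[string] =
  ## Hypothetical helper: builds "name,type1,type2" for each declaration and
  ## appends "_2", "_3", ... when the same id was already produced earlier.
  var seen = initCountTable[string]()
  for sig in signatures:
    let (name, paramTypes) = sig
    let base = (@[name] & paramTypes).join(",")
    seen.inc(base)
    let n = seen[base]
    result.add(if n == 1: base else: base & "_" & $n)

# Both overloads of `fn` take a single `T`, so only declaration order
# decides which one ends up with the `_2` suffix.
doAssert anchorIds(@[("fn", @["T"]), ("fn", @["T"])]) == @["fn,T", "fn,T_2"]
```

Swapping the two declarations leaves the list of ids unchanged but silently changes which overload each id points to, which is why hand-written links of this form are brittle.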

proposal

add a DRY and easy to remember syntax to refer to a symbol (see below for concrete syntax)
let docgen resolve it to an actual doc link exactly as it would if user typed `getFilePermissions <#getFilePermissions,string>`_ in full
the only problem is when symbol is overloaded; there are 2 approaches
- make docgen issue an ambiguity error and require full syntax `getFilePermissions <#getFilePermissions,string>`_ or something a bit DRY-er `getFilePermissions <string>`_
- make docgen resolve to the 1st symbol it finds; shouldn't be a big deal; overloads are generally placed next to each other and oftentimes only 1 is documented anyway;

in any case this is a practical tradeoff: we improve the situation for the 95% case and leave the 5% case, well, as bad as it already is today

concrete syntax for the proposal

open to suggestions; maybe:

see also $`getFilePermissions` which is related # docgen will resolve to same as `getFilePermissions <#getFilePermissions,string>`_
see also $`strutils.foo` which is related # docgen will resolve to same as `strutils.foo <#strutils.foo,string,int>`_ # ie, can use modules in other symbols

benefits:

would fix https://github.com/nim-lang/Nim/issues/16337
broken links would give CT errors when running nim doc
links would be easier to read/write/edit
each link would give rise to an entry in the index
links wouldn't break when you change a proc to a func or add optional params
there are easy ways to handle overloaded procs in the same module, I can describe what I have in mind

ringabout commented 3 years ago

Yeah, I agree. Original doc links way makes proc + nosideEffect => func a bit annoying.

a-mr commented 3 years ago

RST has the mechanism of custom roles for "interpreted text" (between ``).

So proper RST-style syntax would be something like:

:nim:`proc getFilePermissions`
:nim:proc:`getFilePermissions`

We can parse the text inside `` to generate proper link so we can get close to actual definition of the proc:

:nim:`proc getFilePermissions(string)`
:nim:proc:`getFilePermissions(string)`

Probably we can also study how it's done in sphinx

cc @konsumlamm

timotheecour commented 3 years ago

@a-mr

Probably we can also study how it's done in sphinx

we can do better than requiring users to write :py:const: or :py:class:, because nim is strongly typed and nim has access to resolved symbols when running nim doc, in particular nim doc --project can be expected to resolve all references including ones outside a module (or even nim doc without --project thanks to index files; theindex.html is proof this can all work)

:nim:proc getFilePermissions :nim:proc getFilePermissions(string)

that's not DRY though, and still suffers from being sensitive to things like proc<=>func edits which would break links.

For nim files, all that's needed IMO is a single char to trigger link resolution. eg:

## See also: :`delete` # a reference to `delete` in same module
## See also: :`strutils.delete` # a reference to `delete` in a potentially different module

overloads

When a symbol is overloaded, IMO following is good enough:

## See also: :`httpclient.delete` # references 1st overload of `delete` in `httpclient`

docgen can, for overloads within a module, make it easy to "click to go to next overload". In the rare cases where showing 1st overload isn't good enough, we can extend the syntax, there are many ways, but this isn't needed in 1st version of this feature.

benefits

easy to read/write/edit such links in code
automatically generate warning when such symbols are not found
links are robust to future API additions such as adding params, adding overloads, etc

bikeshedding

I've tried to see which special char would work, see

criterion:

the links should use same syntax in rst and nim files
it should be as simple as possible, just 1 char to trigger resolving a link

conclusion: 2 options seem to render well in both nim and rst in github (note that nim rst2html and nim doc can always be customized as needed):

option1: `strutils.delete`_

option2: :`strutils.delete`

option 1 renders in rst github as a link that points to itself. This is fine since the main use case is nim doc and nim rst2html. Note that the link can point to something if there's another definition somewhere, for eg:

see also: `strutils.delete`_.

.. _strutils.delete: https://github.com/nim-lang

but that's not the intended use (although it opens door for including an auto-generated rst file which would define all those links if we want to make those links work from github viewer).

option 2 renders in rst github as a code block preceded by : (so long .. default-role:: code appears on top, which is what we're going for anyways, see https://github.com/nim-lang/Nim/pull/17028); which also seems acceptable but a tad less visually pleasing.

conclusion

I prefer option 1. Any downside I haven't thought about?

links

timotheecour commented 3 years ago

after https://github.com/nim-lang/Nim/pull/17372, there's now another, possibly better option:

strutils.delete_

instead of:

`strutils.delete`_

example:

##[
foo1_

foo.bar2_

works with a comma separating links too:

foo.bar3_, foo.bar4_

`foo.bar5`_

`foo.bar6`_, `foo.bar7`_
##]

example of how it renders in github rst: https://github.com/timotheecour/vitanim/blob/3ae7daf173b4e074774af7062476beb28418de5b/testcases/tests/t10468b.rst#id11

timotheecour commented 3 years ago

@a-mr if you want to tackle this I'm happy to help, I think your previous contributions to docgen/rst (2 pass approach, underscore link support, sorting, etc) should make this more within reach; I'd be ok with the syntax strutils.delete_ which is as simple as it gets (simply adds an underscore).

For semantics we could do this:

if symbol is not overloaded, the hyperlink points to the symbol definition directly
if symbol is overloaded, the hyperlink points to the 1st overload; thanks to https://github.com/nim-lang/Nim/pull/18560, the other overloads will all appear in order right below. In the rare cases where we absolutely must point to a specific overload, we can devise a syntax for that later, but I don't see it as critical

a-mr commented 3 years ago

@timotheecour , I think I need to clarify 2 things first (note I know very little about the compiler). Let us assume that we are talking about `strutils.delete`_

is this new syntax going to "resolve" doc links to modules that are not imported? E.g. if strutils was not imported in the module being processed then docgen would know nothing about delete, right?
- if no then we should just mechanically fall back to some default HTML link that will be generated e.g. for every proc?
what is its interaction with modules system/paths? E.g. where is strutils searched?
- considering that there can be module with same name with different paths: should not we add possibility to create the same directory hierarchy as main code? I'm a bit bothered by the fact that currently koch docs dumps *.html to web/upload/1.5.1 (except the separate compiler directory). Also should not our link look something like `std/strutils.delete`_ then?

timotheecour commented 3 years ago

Let us assume that we are talking about `strutils.delete`_

shouldn't that be: strutils.delete_ ? (as enabled by your merged PR https://github.com/nim-lang/Nim/pull/17372)

eg:

# good:
## See also strutils.delete_
# bad:
## See also `strutils.delete`_

is this new syntax going to "resolve" doc links to modules that are not imported? E.g. if strutils was not imported in the module being processed then docgen would know nothing about delete, right?

yes, this shall resolve doc links that are not imported in current module (and in fact, this point should be irrelevant), so that in system.nim you can write:
```
proc compileOption(option, arg: string): bool {.magic: "CompileOptionArg".} =
  ## See also compilesettings.querySetting_
```
then docgen will transform this into a link (that may not exist yet, depending on how you run nim doc vs nim doc --project):
```
## See also <a class="reference external" href="compilesettings.html#_querySetting">std/compilesettings.querySetting</a>
```

if no then we should just mechanically fall back to some default HTML link that will be generated e.g. for every proc?

This indeed requires adding an additional "short-name anchor" for each symbol foo, eg:
```
<a id="foo,string,string"></a> # classical anchor
<a id="_foo"></a> # short-name anchor
```
(it's easy to generate 2 anchors per symbol) I'm using a prefix _ to disambiguate short-name anchors from other ones; it's an implementation detail as doc comments in user code shall use an intermediate syntax which docgen translates to those.
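
A minimal sketch of the "2 anchors per symbol" idea, under stated assumptions: `anchorsFor` and `seenShort` are hypothetical names, the classical id is taken to be the name joined with the parameter types, and the short-name anchor is emitted only for the first overload so that HTML ids stay unique.

```nim
import std/[sets, strutils]

proc anchorsFor(name: string, paramTypes: seq[string],
                seenShort: var HashSet[string]): seq[string] =
  ## Hypothetical helper: returns the anchor ids to emit for one symbol.
  result.add((@[name] & paramTypes).join(","))   # classical anchor
  let short = "_" & name                         # short-name anchor
  if not seenShort.containsOrIncl(short):        # only for the 1st overload
    result.add short

var seen = initHashSet[string]()
doAssert anchorsFor("foo", @["string", "string"], seen) ==
  @["foo,string,string", "_foo"]
doAssert anchorsFor("foo", @["int"], seen) == @["foo,int"]
```

Existing links keep working because the classical anchor is still generated; the short-name anchor is purely additive.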

what is its interaction with modules system/paths? E.g. where is strutils searched?

this is yet another benefit of this RFC: it lets docgen handle resolving of external module links, so that you can write compilesettings.querySetting_ and then docgen transforms it to a correct href regardless of how you run nim doc (so long it's consistent obviously for all modules that are docgen'd).

for nim doc --project foo.nim (the newer way to build docs, which is used for building compiler docs, fusion etc), the layout follows the filesystem (and .. is transformed into _._ to avoid going out of outdir, refs https://github.com/nim-lang/Nim/pull/14454)
for nim doc foo.nim (which is used by stdlib, currently), layout is flattened

in either way though, docgen should know where to find a module relatively to another module (by inspecting whether user passed in --project, ie optWholeProject flag to see if it's not flattened, and also checking --docRoot)

What's needed is this API:

type DocMode = enum
  dmFlat # for `nim doc`, eg used by stdlib
  dmFilesystem # for `nim doc --project`, eg used by compiler docs, fusion etc

proc genRelativeLink(docMode: DocMode, toModule: string, fromModule: string): string = ...
  runnableExamples:
    assert genRelativeLink(dmFlat, "std/system", "std/compilesettings") == "system.html"
    assert genRelativeLink(dmFilesystem, "compiler/plugins/locals", "compiler/pathutils") == "plugins/locals.html"
    assert genRelativeLink(dmFilesystem, "compiler/pathutils", "compiler/plugins/locals") == "_._/pathutils.html"
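
One possible body for this API, as a sketch only: it works purely on canonical module paths split on `/`, ignores `--docRoot`, and reuses the `_._` spelling for parent directories mentioned above. None of this is taken from docgen's actual implementation.

```nim
import std/strutils

type DocMode = enum
  dmFlat        # `nim doc`: all html files end up in one directory
  dmFilesystem  # `nim doc --project`: layout follows the filesystem

proc genRelativeLink(docMode: DocMode, toModule, fromModule: string): string =
  ## Sketch: relative href from the page of `fromModule` to that of `toModule`.
  let toParts = toModule.split('/')
  case docMode
  of dmFlat:
    result = toParts[^1] & ".html"
  of dmFilesystem:
    let toDir = toParts[0 .. ^2]
    let fromDir = fromModule.split('/')[0 .. ^2]
    # length of the directory prefix shared by both modules
    var common = 0
    while common < min(toDir.len, fromDir.len) and
        toDir[common] == fromDir[common]:
      inc common
    # one `_._/` per directory to climb, then descend to the target
    result = repeat("_._/", fromDir.len - common) &
             toParts[common .. ^1].join("/") & ".html"

doAssert genRelativeLink(dmFlat, "std/system", "std/compilesettings") == "system.html"
doAssert genRelativeLink(dmFilesystem, "compiler/pathutils", "compiler/plugins/locals") == "_._/pathutils.html"
```

In this sketch a target that lives one directory deeper than the referring module, such as `compiler/plugins/locals` seen from `compiler/pathutils`, resolves to `plugins/locals.html`.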

note 1

for external modules, this should also be possible, but would need some way to register location of external docs in some place; this can be discussed in another RFC:

# we can use a clean syntax for external links:
## See also pkg/regex.findAllBounds_

note 2

it's simplest to do everything in terms of canonical paths (introduced in https://github.com/nim-lang/Nim/pull/16999), ie:

system/assertions, fusion/matching, std/tables

happy to discuss more

Araq commented 3 years ago

Please ensure backwards compatibility, links are shared and kept around for good.

a-mr commented 3 years ago

@timotheecour , thank you for very useful comment. I did not get all the details but hope to get into it in the work.

Regarding syntax: backticks are optional indeed, I wrote with back ticks only for disambiguation of cases pkg/regex.findAllBounds_. The little problem here is that in RST spec slash / is not included into allowed chars for short references. But it's not the first time when we ~~violate~~ creatively interpret RST spec, and the rst.nim parser can be fixed very easily to include / also.

Very first decomposition of the task looks like:

resolve links into current module
resolve them in imported modules
resolve across other modules

So I start from point 1.

timotheecour commented 3 years ago

Please ensure backwards compatibility, links are shared and kept around for good.

multiple anchors can be generated for a section/div/etc so that's fine

pkg/regex.findAllBounds_

this would be an option:

## see also: pkg.regex.findAllBounds_
## see also: std.strutils.split_
## see also: strutils.split_
## see also: split_

# instead of:

## see also: pkg/regex.findAllBounds_
## see also: std/strutils.split_
## see also: strutils.split_
## see also: split_

(i don't care too much either way) note that we should also allow links from rst (eg manual) to a module and vice versa; this should be discussed; from manual to a module should be able to reuse syntax (canonical path), eg: ## see also: std.strutils.split_

the other direction (from module to an anchor in rst) could use something similar, eg:

## see also manual.myanchor_
# or maybe use a top-level rst prefix:
## see also rst.manual.myanchor_

Very first decomposition of the task looks like:

sounds good

resolve them in imported modules

why is that needed and how does it differ from resolve across other modules ? eg, ## see also strutils.split_ should work from system.nim in same way as ## see also system.delete_ from strutils (even though strutils imports system but not vice versa)

Varriount commented 3 years ago

Whatever conclusion is reached here, I would just like to point out that future changes need to be accounted for. For example, it may be that in the future, all procedures which share a name but have different arguments will be grouped under a section with an anchor link.

a-mr commented 3 years ago

a funny fact is that in some cases the concise links already work:

for non-overloadable things like consts, types...
for procs without input arguments

An example:

## Ref. f_

proc f*(): int = 2

Nim prints warning that it cannot find the link, but the fall-back link is href="#f", which does what we want.

timotheecour commented 3 years ago

interesting.. that means that simply generating extra anchor with the desired form (pointing to 1st overload) should "just work" at least for symbols within same module

for symbols within other module, this still requires a bit more work; I think the 1st part (links within same module, ie non-fully-qualified) could be its own separate PR, it's a common case

a-mr commented 3 years ago

I think we need to be able to reference a concrete instance and the whole group of same-name symbols.

I propose the following syntax (for using inside module strutils):

procs
- for whole group of procs: split_ — most often case and hence short. It will be shown as split proc in the text — note suffix "proc" in the end having been added automatically.
- for a concrete proc: `split(string, char, int)`_. Will be shown as split(string, char, int) proc. One can use explicitly "proc" marker in the link, so `proc split`_ will turn into proc split and `proc split(string, char, int)`_ into proc split(string, char, int). The "proc" marker can be in the end of string also.
iterators, templates. Basically the same but explicit "iterator" or "template" markers are required. So short syntax will not be allowed:
- for whole group use `iterator split`_ syntax and it will turn into iterator split link to the group. The "iterator" can be put after again: `split iterator`_ -> split iterator.
- for a concrete iterator use `iterator split(string, char, int)`_ syntax
consts, types — short syntax should be allowed as compiler forbids name clashes with procs, iterators, templates

Why require specification of symbol for iterators and templates? Because otherwise we would have ambiguity; and there could happen a situation that initially link pointed e.g. to iterators and then a proc with the same name was introduced and so link is pointing now to the proc without author even knowing!

timotheecour commented 3 years ago

for whole group of procs: split_ — most often case and hence short.

yes, this is what should almost always be used; whether a proc is split into multiple overloads or refactored into a single generic proc is often an implementation detail subject to change as code evolves

for whole group of procs: split_ — most often case and hence short. It will be shown as split proc in the text — note suffix "proc" in the end having been added automatically.

why not split routine or, even better, simply split? It's a click away, and IMO it should usually be clear from context whether it's an iterator, a const/var/let, or a proc/func/... Keep in mind that we can have proc vs func vs template vs macro (etc) overloads for the same name (and a proc can be changed to a template; this shouldn't break links), eg:

proc fn*(a: int): int = ...
=>
proc fnImpl(a: int): int = ...
template fn*(a: int): int = fnImpl(a)

iterators, templates. Basically the same but explicit "iterator" or "template" markers are required. So short syntax will not be allowed:

I don't think that's a good idea at least for the case of templates; simpler = better and symbol kind can change (eg proc fn => template fn that forwards to fnImpl or directly re-implements as a template, etc); these changes are common and should not break links.

iterators vs other routines

Note that allowing iterators to overload procs is considered a design mistake and is never needed anymore now that we have toSeq, which means you just need a foo iterator and just call toSeq(foo()) when you need a seq variant; stdlib still has a few such cases (split etc) but they're not that common and future code shouldn't add more of those.

so in the common case where iterator and procs aren't overloading each other, writing see myiter_ should be all that's needed because it'd point to the 1st overload in the iterators named myiter; in the rare case where iterator split overloads proc split, we can either allow specifying split iterator_ or simply just pick 1 overload (eg, prefer the iterator overload over the proc one)

grouping of symbols

I actually propose the following change; can be discussed separately from above points:

list all routines regardless of kind (iterator, templates, procs etc) in alphabetical order, so that iterator split will list near proc split; ditto for template foo vs proc foo overloads
but, still keep a reference to all iterators, templates, macros in the TOC; so that in the TOC an iterator foo will be listed twice:
```
## TOC
routines:
- split:
  - iterator split(..)
  - proc split(..)
  - proc split(..)
  - template split(..)
iterators:
- iterator split(..)
templates:
- template split(..)
procs/funcs:
- proc split(..)

## main section with all symbols
...
iterator split(..)
  ## description...
proc split(..)
proc split(..)
template split(..)
...
```



> Why require specification of symbol for iterators and templates? Because otherwise we would have ambiguity; and there could happen a situation that initially link pointed e.g. to iterators and then a proc with the same name was introduced and so link is pointing now to the proc without author even knowing!

with my suggestion, this won't matter; you'd have a link to `split` which would point to the 1st overload (regardless of kind), and then all overloads would appear nearby; good enough for almost all cases and avoids breaking links, and simplifies writing links.

a-mr commented 3 years ago

@timotheecour ,

OK, you win, I agree with the 1st part that we should resolve split_ automatically to any kind of symbols. However:

we will emit a warning that there is an ambiguity
in that case we will also select the "winning" overload according to fixed priority "procs > iterators > templates" because people normally imply procs

Eventually we had better be able to point to a concrete overload so we need long syntax with allowed spaces like `split(string, char, int)`_ anyway. Of course, points 1-2 will apply to this syntax too.
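
The two points above can be sketched as follows. This is only an illustration: `SymKind`, `Candidate` and `pickOverload` are hypothetical names, the anchor strings are placeholders, and "ambiguous" is taken to mean that more than one symbol kind shares the name.

```nim
type
  SymKind = enum
    skProc, skIterator, skTemplate   # declared in priority order
  Candidate = object
    kind: SymKind
    anchor: string                   # placeholder for the generated anchor id

proc pickOverload(cands: seq[Candidate]): tuple[anchor: string, ambiguous: bool] =
  ## Picks the candidate whose kind has the highest priority
  ## (procs > iterators > templates); the first declared wins within a kind.
  doAssert cands.len > 0
  var best = cands[0]
  for c in cands:
    if c.kind < best.kind:
      best = c
    if c.kind != cands[0].kind:
      result.ambiguous = true        # caller would emit the warning
  result.anchor = best.anchor

let r = pickOverload(@[
  Candidate(kind: skIterator, anchor: "split-iterator"),
  Candidate(kind: skProc, anchor: "split-proc")])
doAssert r.anchor == "split-proc"
doAssert r.ambiguous
```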

timotheecour commented 3 years ago

we will emit a warning that there is an ambiguity

at what time do you issue a warning? it's about whether the context is known to nim doc or not.

eg, if you call nim doc system.nim which has a comment see also strutils.split_ ; how can nim doc know that split_ could resolve to both iterator or proc, given that it's in another module it hasn't seen yet?

if you issue a warning when compiling nim doc strutils.nim, that means you'd systematically get a warning whenever an iterator overloads a proc, regardless whether you use the link split_

the warning can make sense if all the context is known at the time of nimdoc, eg if see also split_ is used within strutils (pointing to same module), or, maybe in other cases with nim doc --project

in that case we will also select the "winning" overload according to fixed priority "procs > iterators > templates" because people normally imply procs

that's fine, and a rare case anyways (iterator overloading a proc), and if symbols are listed alphabetically regardless of kind, it's really not a big problem

split(string, char, int)_ anyway

for the explicit overload syntax, we could require the symbol kind, eg:

iterator split(string, char, int)_

a-mr commented 3 years ago

at what time do you issue a warning?

I've thought that through only for local resolution — I will issue a warning in rst 2nd pass (finishGenerateDoc). Regarding different-module symbol I'm not yet sure but I guess it does not matter, I will need to retrieve all the symbols of that module anyway so it can be done at any stage (including finishGenerateDoc)

timotheecour commented 3 years ago

oh actually there's a really simple solution to this problem: we can rely on index files (or reuse/augment the one that's generated via nim doc which IIRC implies --index; preferably json format) to list all the references there as well as the anchors generated; then nim buildIndex (called after all the nim doc foo commands) can issue warnings for each problem found:

# foo.nim:
## see bar.baz_
## see bar.baz2_
## see bar.baz2(int, float)_

# bar.nim:
iterator baz(): int = discard
proc baz(): int = discard

nim doc foo # generates htmldocs/foo.idx
nim doc bar # generates htmldocs/bar.idx
nim buildIndex # generates htmldocs/theindex.html from all the idx files

nim buildIndex will now report the following warnings:

bar.baz_ should be disambiguated as either iterator bar.baz_ or proc bar.baz_
bar.baz2_ link reference invalid, did you mean xxx ?
bar.baz2(int, float)_ link reference invalid, did you mean xxx ?
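
A rough sketch of the check this implies, assuming the idx data has already been loaded into memory. `LinkRef`, `checkLinks` and the message format are hypothetical, and the iterator-vs-proc ambiguity warning is left out for brevity.

```nim
import std/[sets, tables]

type LinkRef = tuple[fromModule, toModule, target: string]

proc checkLinks(anchors: Table[string, HashSet[string]],
                refs: seq[LinkRef]): seq[string] =
  ## `anchors` maps a module name to the set of symbols it exports;
  ## returns one warning per reference that cannot be resolved.
  for r in refs:
    let known = r.toModule in anchors and r.target in anchors[r.toModule]
    if not known:
      result.add r.fromModule & ": " & r.toModule & "." & r.target &
                 "_ link reference invalid"

let anchors = {"bar": toHashSet(["baz"])}.toTable
let refs: seq[LinkRef] = @[
  (fromModule: "foo", toModule: "bar", target: "baz"),
  (fromModule: "foo", toModule: "bar", target: "baz2")]
doAssert checkLinks(anchors, refs) == @["foo: bar.baz2_ link reference invalid"]
```

Because the check only needs the collected references and anchors, it does not matter whether the modules were documented one by one or in a single project-wide run.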

note 1

nim buildIndex is an existing command that's already called to generate theindex.html; this simply enhances what it does to also report link inconsistencies

note 2

it will work the same regardless of separate compilation (nim doc foo followed by nim doc bar) or joint (via nim doc --project all), and regardless of same module references vs cross-modules references; no need to distinguish those cases (simpler!)

note 3

this will allow fixing https://github.com/nim-lang/Nim/issues/16337 by reporting all broken links automatically

a-mr commented 3 years ago

sounds like a great idea. But I cannot figure out how to use this command:

$ bin/nim buildIndex lib/pure 
Hint: used config file '/home/amakarov/activity-shared/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/amakarov/activity-shared/Nim/config/config.nims' [Conf]
Hint: used config file '/home/amakarov/activity-shared/Nim/config/nim.cfg' [Conf]
Hint: used config file '/home/amakarov/activity-shared/Nim/config/config.nims' [Conf]
Hint: used config file '/home/amakarov/activity-shared/Nim/config/nimdoc.cfg' [Conf]
fatal.nim(53)            sysFatal
Error: unhandled exception: options.nim(651, 3) `not conf.outFile.isEmpty`  [AssertionDefect]

docgen.rst says it accepts a directory.

I know alternative: for the compiler we can run bin/nim doc --project --index:on compiler/main.nim but what about stdlib...

timotheecour commented 3 years ago

for separate nim doc invocations, see how it's done in kochdocs; this works (but needs better documentation, PR welcome):

nim doc --outdir:/tmp/d19 --index lib/pure/strutils
nim doc --outdir:/tmp/d19 --index lib/pure/os
nim buildIndex -o:/tmp/d19/theindex.html /tmp/d19

a-mr commented 3 years ago

During implementation/writing a test I found a problem with interpretation of spaces and case sensitivity. I'm implementing it by adding additional anchors from docgen.nim.

According to RST spec, links are case-insensitive and any number of whitespace is equivalent to one space character. Current rst.nim takes that into account.

This opens a question how to deal with links like SortOrder_ and `fill(openArray[T], T)`_. Changing openArray -> openarray is harmless but it's not so for SortOrder -> sortorder and T -> t. Also what to do with a space after comma (openArray, T)?

My current plan is to normalize both references from the rst.nim side and generated anchors from docgen.nim side in this way:

any spaces before and after punctuation — ,, (, ), [, ] — are deleted.
tokens inside these references/anchors are normalized the Nim way (first char is preserved) in these cases:
- if they are at the very 1st char of a reference/anchor
- if they are after keywords like proc and const
- if they are after the mentioned punctuation signs

So the examples above become Sortorder and fill(openarray[T],T). Both points are not exactly RST spec-compliant but what else can be done?
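
The plan above can be approximated with the sketch below. It is a simplification, not the rst.nim implementation: every token is normalized the Nim way regardless of position, and runs of whitespace between two tokens collapse to one space. `nimNormalize` and `normalizeRef` are hypothetical names.

```nim
import std/strutils

const Punct = {',', '(', ')', '[', ']'}

proc nimNormalize(token: string): string =
  ## Nim-style identifier normalization: keep the first char,
  ## lowercase the rest and drop underscores.
  if token.len == 0: return ""
  result.add token[0]
  for c in token[1 .. ^1]:
    if c != '_': result.add c.toLowerAscii

proc normalizeRef(s: string): string =
  ## Removes spaces around punctuation and normalizes every token.
  var i = 0
  var pendingSpace = false
  while i < s.len:
    let c = s[i]
    if c in Whitespace:
      pendingSpace = true
      inc i
    elif c in Punct:
      pendingSpace = false            # drop spaces before punctuation
      result.add c
      inc i
    else:
      var j = i
      while j < s.len and s[j] notin Whitespace and s[j] notin Punct:
        inc j
      # keep a single space only between two tokens
      if pendingSpace and result.len > 0 and result[^1] notin Punct:
        result.add ' '
      pendingSpace = false
      result.add nimNormalize(s[i ..< j])
      i = j

doAssert normalizeRef("SortOrder") == "Sortorder"
doAssert normalizeRef("fill(openArray[T], T)") == "fill(openarray[T],T)"
```

Applying the same function to both the reference in the doc comment and the generated anchor is what makes the two sides match.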

a-mr commented 3 years ago

It seems that having a separate set for docgen-generated anchors in rst.nim is the way to go. Then we can match references with anchors by dedicated rules independently.

timotheecour commented 3 years ago

can you clarify?

will the following work?

type SortOrder* = enum k0, k1
proc sortOrder*() = discard
proc foo*[T](a: int, sortOrder: SortOrder) = discard
proc foo*[T](a: int, sortOrder: SortOrder, c: int) = discard
proc foo*() =
  ## See: SortOrder_ (will link to type)
  ## See: sortOrder_ (will link to proc)
  ## See: foo_ (will link to 1st foo overload)
  ## See: `foo[T](int,SortOrder,int)`_ (will link to 2nd overload)

can we preserve case as described above, even at expense of RST compliance (makes it easier to copy paste from code)? what would break?

if links generated cannot distinguish between SortOrder_ and sortOrder_, we can think about an internal encoding for caps, which would be hidden from users (ie references in docs would not need the encoding, it'd be generated automatically)

sortOrder_ => sortorder_
SortOrder_ => ^sortorder_

but i don't think it's needed, is it?

if you have a WIP PR i can test against, this would clarify things

a-mr commented 3 years ago

yes, it will work more or less. sortOrder and SortOrder will resolve OK. The other 2 cases will work with the following nuances:

foo_ is actually referring not to first overload but to whole group of procs foo (it will add a wrapper group <div id="foo-procs-all">, likewise for all same-name overloads)
last case should be spelled as `foo(int,SortOrder,int)`_

If you meant using T inside input parameters like

proc foo*[T](a: int, sortOrder: SortOrder, c: T) = discard

then the link should be spelled as `foo(int,SortOrder,T)`_ without any [T]. It's because that's how anchors are preserved, you can check that the generated anchor id = foo,int,SortOrder,T. So T in parameter is just inserted literally. I guess this cannot be changed without breaking backward compatibility or introducing extra level of wrappers.

Regarding copy-pasting: I don't expect supporting parameters names since by the same reason they are redundant, wdyt?

I'm not sure what to do with [T], I did not expect to add any semantic checking in incorrect cases like `proc foo*[U](T)`_. So everything I can do for this support is to delete [T] after foo* in my internal anchor.

I'll try to send the PR tomorrow.

timotheecour commented 3 years ago

can foo,int,SortOrder,T introduce ambiguities? eg:

proc foo[T](a: int, b: SortOrder, c: T)
proc foo[T](a: int, b: SortOrder[T])

future work:

osproc.nim#L1230 has:

## * `posix_utils.sendSignal(pid: Pid, signal: int) <posix_utils.html#sendSignal,Pid,int>`_

which is not DRY; I wonder if we can support writing:

## * posix_utils.sendSignal_

and then show in the docs a correct link along with this link name:

proc posix_utils.sendSignal(pid: Pid, signal: int)

this probably require a 2-pass algorithm, hence "future work"

a-mr commented 3 years ago

can foo,int,SortOrder,T introduce ambiguities? e.g.

No, it cannot. Only the first example has this signature, for second one it will be foo,int,SortOrder[T].

which is not DRY; I wonder if we can support writing:

Yes the current plan (for the second part of work) is to support both variants:

posix_utils.sendSignal_
`posix_utils.sendSignal(Pid,int)`_

and then show in the docs a correct link along with this link name:

The good news is that current idx files already preserve them so it shouldn't be hard to implement.

I think you are right, it's the most logical behaviour.

a-mr commented 3 years ago

@timotheecour Current plan is to set a full link name like you wrote above:

proc sendSignal(pid: Pid, signal: int)

What if we want to provide our own name for the link like:

this Posix function

? Is there any syntax expected for that?

timotheecour commented 3 years ago

Is there any syntax expected for that?

ya this should be discussed (but the preferred/encouraged way should be the simplest See posix_utils.sendSignal_)

some options:

  # this should work but not recommended (brittle links etc)
  ## See `this <posix_utils.html#sendSignal,Pid,int>`_

  # maybe this? (general group + individual overload)
  ## See `this <posix_utils.sendSignal_>`_ posix function.
  ## See `this <posix_utils.sendSignal(Pid,int)_>`_ posix function.

  # or this? (general group + individual overload)
  ## See [this](posix_utils.sendSignal_) posix function.
  ## See [this](posix_utils.sendSignal(Pid,int)_) posix function.

a-mr commented 3 years ago

@timotheecour 2 more suggestions:

What about this DRY but limited form (without arbitrary names) with using " to delimit a part for inclusion into link text:

`"proc posix_utils.sendSignal" (Pid, int)`_ -> [proc posix_utils.sendSignal]()
`proc posix_utils."sendSignal" (Pid, int)`_ -> [sendSignal]()
`"proc " posix_utils."sendSignal" (Pid, int)`_ -> [proc sendSignal]() ?

What if we specify proc after the name as a request to use this name literally:

`posix_utils.sendSignal proc`_ -> [posix_utils.sendSignal proc]()

Format "short name + proc" is widespread in os.nim and other files in Nim repository, so it makes sense to add special syntax for it. And we can still show a tip with full name when hovering over the link.

timotheecour commented 3 years ago

What about this DRY but limited form (without arbitrary names) with using " to delimit a part for inclusion into link text:

i don't really like these; if i decide to use a custom link name, I shouldn't be limited to using the symbol name (otherwise might as well stick to the default See posix_utils.sendSignal_)

What if we specify proc after the name as a request to use this name literally:

is the problem you're trying to solve to allow a syntax to show the short-form (without the full declaration), or to allow showing a custom link name?

I think we should allow writing arbitrary custom link name, which neither 1 nor 2 do (but by default encourage the simplest See posix_utils.sendSignal_ which auto-generates link name from declaration, showing nb of overloads if there are overloads, else showing the resolved declaration if there is a single symbol)

Format "short name + proc" is widespread in os.nim and other files in Nim repository

But See posix_utils.sendSignal_ could be used in each such instance, advantageously.

So all that's needed is to agree on a syntax to allow arbitrary custom link names; i've suggested one in https://github.com/nim-lang/RFCs/issues/125#issuecomment-897081726 but feel free to suggest alternatives

a-mr commented 3 years ago

is the problem you're trying to solve to allow a syntax to show the short-form

yes, in most cases the full form is overwhelming IMHO. Consider this modified example from os.nim (splitPath):

  ## See also: BEFORE
  ## * `joinPath(head, tail) proc <#joinPath,string,string>`_
  ## * `joinPath(varargs) proc <#joinPath,varargs[string]>`_
  ## * `/ proc <#/,string,string>`_
  ## * `/../ proc <#/../,string,string>`_
  ## * `relativePath proc <#relativePath,string,string>`_
  ##
  ## See also: AFTER
  ## * `joinPath(string, string)`_
  ## * `joinPath(varargs[string])`_
  ## * `/`_
  ## * `/../`_
  ## * relativePath_

To my taste the AFTER case is unnecessarily detailed.
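For reference, the long AFTER form can be derived mechanically from the legacy anchors in the BEFORE list. The sketch below is in Python and assumes anchors always have the shape `#name,type,type,...` as in the example above; the function names are hypothetical. It always emits the full signature, so it does not reproduce the shortened `/`_ and relativePath_ entries:

```python
def split_top_level(text, sep=","):
    """Split on sep, ignoring separators nested inside [...] or (...)."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def anchor_to_long_link(anchor):
    """Turn '#joinPath,string,string' into '`joinPath(string, string)`_'."""
    name, *arg_types = split_top_level(anchor.lstrip("#"))
    return f"`{name}({', '.join(arg_types)})`_"
```

The bracket-depth tracking matters for generic parameter types that contain a comma themselves, which a plain `split(",")` would cut in the wrong place.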

timotheecour commented 3 years ago

3 possibilities:

how about controlling in the UI whether the long or short form is shown (just like we have a switch in UI for dark mode vs light mode)
or, use a similar technique as we use for expanding the pragmas by clicking on it
or just simply always show the short-form for links

the main point is to allow keeping the nice, short form for all links and not have to introduce a decision making each and every time we write a link to a proc in docs

a-mr commented 3 years ago

In the updated version of the PR, the link text is what is actually displayed.

Referencing by just function names also works now:

* `joinPath(head, tail)`_

Full signature is displayed as a tooltip on hovering over the link.
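Roughly, the rendered link keeps the text as written and carries the full signature in a tooltip. A Python sketch of that output shape follows; the exact HTML emitted by docgen is not shown in this thread, so the `title` attribute, the function name `render_link`, and the sample declaration are assumptions:

```python
from html import escape


def render_link(link_text, href, full_signature):
    """Render a link whose visible text is exactly what the author wrote,
    with the full signature available on hover via the title attribute."""
    return '<a href="{}" title="{}">{}</a>'.format(
        escape(href, quote=True),
        escape(full_signature, quote=True),
        escape(link_text),
    )


# Illustrative values only.
html_link = render_link(
    "joinPath(head, tail)",
    "#joinPath,string,string",
    "proc joinPath(head, tail: string): string",
)
```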

a-mr commented 3 years ago

There is one more design decision regarding cross-module doc links: whether .idx should be read implicitly or not.

If we input `mod1.proc1`_ link then the mod1.idx module should be loaded.

I'd prefer to introduce explicit syntax for that:

.. import:: mod1.idx

Here import is a new dedicated RST directive. It is supposed to be placed in the 1st doc comment of a .nim file or somewhere near the beginning of an .rst file. Then all symbols exported from mod1.nim (and recorded in mod1.idx) will be available for referencing in the current .nim or .rst file.

a-mr commented 2 years ago

Updated plan

1) No separate stage for link checking (like nim buildIndex proposed above) is required.
2) No explicit import for .idx files.

Instead:

1) doc processing will be done in 2 runs: a) with .idx generation, b) with link resolution.
2) we'll have a rule that specially formatted link string like module: linkTarget (note the colon :) will load a <module>.idx file implicitly. This will work for .md/.rst files also, so e.g. it will be possible to reference section headings (and other anchors) in documentation between .md files in the simple format.

Using a recently introduced Markdown syntax [link string] (instead of RST `link string`_) the referencing will look like:

Ref. [manual: Lexical Analysis]
  -- reference section "Lexical Analysis" of manual.md from any `.nim/.md` file
Ref. [posix_utils: sendSignal(Pid, int)] 
  -- reference the procedure of module posix_utils.nim from any file

In this example manual.idx and posix_utils.idx will be looked up in the output dir (default htmldocs or one configured manually in cmdline). During the first run, when .idx files are not yet generated, nim doc/md2html will emit warnings. To suppress them a new option --noCheckLinks will be introduced. So 2 runs will typically look like:

nim doc --noCheckLinks --index:on <file> for all files in the project
nim doc <file> for all files in the project

In practice running twice should not be too bothersome because all .idx files can be generated once (e.g. in the Nim repository, by ./koch docs) and then one can write docs with link checking against the existing .idx files.
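The implicit-loading rule can be sketched as follows. This is Python for illustration only, not the docgen implementation; the function names, the identifier test used to recognise a module prefix, and the warning text are assumptions. Only the `module: linkTarget` shape, the `<module>.idx` file name, and the default htmldocs output dir come from the plan above:

```python
from pathlib import Path


def parse_link(link):
    """Split 'module: linkTarget' into (module, target).

    Returns (None, link) when there is no module prefix, i.e. the link
    refers to something in the current file.
    """
    module, sep, target = link.partition(":")
    module, target = module.strip(), target.strip()
    if sep and module.isidentifier() and target:
        return module, target
    return None, link.strip()


def idx_path(module, out_dir="htmldocs"):
    """Location of the index file that would be loaded implicitly."""
    return Path(out_dir) / f"{module}.idx"


def resolve(link, out_dir="htmldocs", check_links=True):
    """Return (idx file or None, target, warnings) for one link string."""
    module, target = parse_link(link)
    if module is None:
        return None, target, []
    path = idx_path(module, out_dir)
    warnings = []
    # On the first run the .idx files do not exist yet; check_links=False
    # plays the role of the proposed --noCheckLinks option.
    if check_links and not path.exists():
        warnings.append(f"cannot find {path} needed by link '{link}'")
    return path, target, warnings
```

For example, `parse_link("posix_utils: sendSignal(Pid, int)")` yields the module `posix_utils` and the target `sendSignal(Pid, int)`, while a link with no colon stays local to the current file.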

Araq commented 2 years ago

Think about the consequences regarding algorithmic complexity. It used to be the case that index generation was done without .idx files entirely but it caused an O(n^2) algorithm or even some exponential explosion.
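To make the concern concrete, here is a generic Python illustration of why a prebuilt index matters. It is not a reconstruction of what the old Nim docgen did; the data layout and function names are assumptions. Resolving every link by rescanning all symbols costs links × symbols steps, while building a lookup table once costs links + symbols:

```python
def resolve_by_scanning(links, modules):
    """Resolve each link by scanning every symbol of every module."""
    steps, resolved = 0, {}
    for link in links:
        for module, symbols in modules.items():
            for sym in symbols:
                steps += 1
                if sym == link:
                    resolved[link] = module
    return resolved, steps


def resolve_with_index(links, modules):
    """Build a symbol -> module table once, then do constant-time lookups."""
    steps, index = 0, {}
    for module, symbols in modules.items():
        for sym in symbols:
            steps += 1
            index[sym] = module
    resolved = {}
    for link in links:
        steps += 1
        if link in index:
            resolved[link] = index[link]
    return resolved, steps


# 10 hypothetical modules with 10 symbols each, every symbol linked once.
modules = {f"mod{i}": [f"sym{i}_{j}" for j in range(10)] for i in range(10)}
links = [sym for symbols in modules.values() for sym in symbols]
scan_result, scan_steps = resolve_by_scanning(links, modules)
index_result, index_steps = resolve_with_index(links, modules)
```

Both functions resolve the same links; only the step counts differ, and the gap grows quadratically with project size.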
