Going through externs is probably a lot of extra work. Easier to just scrape the docs output.
If you use "markdown" output, it doesn't give you links:
Markdown:

````markdown
#### `toUnfoldable`

``` purescript
toUnfoldable :: forall f. Unfoldable f => List ~> f
```

Convert a list into any unfoldable structure.

Running time: O(n)
````
HTML: (note the anchor links)
````html
<div class="decl" id="v:toUnfoldable">
<h3 class="decl__title clearfix">
<a class="decl__anchor" href="#v:toUnfoldable">#</a
><span>toUnfoldable</span>
</h3>
<div class="decl__body">
<pre class="decl__signature">
<code>
<a href="Data.List.html#v:toUnfoldable" title="Data.List.toUnfoldable">
<span class="ident">toUnfoldable</span>
</a>
<span class="syntax">::</span>
<span class="keyword">forall</span> f<span class="syntax">.</span>
<a href="Data.Unfoldable.html#t:Unfoldable" title="Data.Unfoldable.Unfoldable">
<span class="ctor">Unfoldable</span>
</a>
f <span class="syntax">=></span>
<a href="Data.List.Types.html#t:List" title="Data.List.Types.List">
<span class="ctor">List</span>
</a>
<a href="Data.NaturalTransformation.html#t:type (~>)" title="Data.NaturalTransformation.type (~>)">
<span class="ident">~></span>
</a>
f
</code>
</pre>
<p>Convert a list into any unfoldable structure.</p>
<p>Running time: <code>O(n)</code></p>
</div>
</div>
````
I'd just like some suggestions on what people use to search and browse HTML files, though, with directory-wide searching instead of just in-page searching.
This is the implementation of Pursuit's search: it imports the PureScript compiler to parse the source of the packages and weighs the tokens to rank them in the results. It might be easier to implement the search by reusing this information, so to avoid reimplementing that stuff it would be useful if `purs docs` could output it with some flag.
Pursuit doesn't do any parsing; the information is all available via the JSON which has been generated by `purs publish` and uploaded to pursuit.
By the way I’m interested in supporting use cases like this in the compiler: see also https://github.com/purescript/purescript/pull/3528#issuecomment-460052868
Thanks @hdgarrood, that would be indeed wonderful! 👏
It looks like with a JSON export for `Module` we'd have all the data Pursuit has from `purs publish`, so it would be possible to create the same `SearchIndex` as in there.
The only thing missing at this point would be a frontend way to parse the input query. For the MVP (Minimum Viable Pursuit, clearly) I wanted to try shipping `purescript-cst` through GHCJS. But since that's going to be integrated in the compiler (doing weird stuff for an MVP is ok, but I don't want to ship the whole compiler over GHCJS), I guess the next best approach would be to write a small parser for signatures in PureScript. I wonder if someone has tried a PureScript parser in PureScript?
They have indeed :) There was an effort to make the compiler self-hosting a while ago (https://github.com/purescript/purescript-in-purescript), but it was put on hold because of performance issues.
Cool, thanks for the pointer. I had a vague memory of this, just didn't look in the right place :)
It looks like it would be possible to port some of the parsing code (though the project is quite outdated)
@hdgarrood I think the compiler's feature that would allow this is now being tracked in https://github.com/purescript/purescript/issues/3503 right?
I guess so, since there isn't a dedicated issue for it. We probably should have a dedicated issue for it, though.
Actually no, sorry, I don't think a dedicated issue makes sense, because the design I have in mind currently for using externs files while generating docs is very closely tied to this hypothetical new `--codegen docs` option.
Wonderful, thanks!
Now that #127 went in, let's repurpose this issue to focus on the search functionality, since it's the only thing missing for this
@hdgarrood I was wondering if I could help in any way with purescript/purescript#3503?
That would be great! I think the first step should be to implement a `docs` target for the `--codegen` option as described in https://github.com/purescript/purescript/issues/3503#issuecomment-460053455. Having not yet looked into that in much detail I'm unable to give any more useful hints than that, but if you get stuck or confused by anything please feel free to ask; the best way is probably via commenting on that issue.
Update: `0.13` is out, and it contains the upstream fix needed for this to work (thanks @hdgarrood and props on the release! 🎉), so here's a small recap of what we're missing for this issue.
The goal is to make the docs produced by `spago docs` searchable. A way we could do it would be:

- use the HTML output of `purs docs`, or generate entirely new HTML from the markdown output (which should now contain the same info)
- build the search index from the output of `purs compile --codegen docs`. We can probably port to PureScript some of the Pursuit code for this

I checked the code, and from what I saw there I conclude that the easiest way is to add purescript as a dependency and reuse the `Module` type and the `asModule` parser, and also to start depending on pursuit to reuse its `SearchIndex`. Another option is to copy-paste the required definitions, obviously.
Regarding the interface, I think that a CLI is a must-have.
The obvious problem will then be speed - when I was working on pursuit, its index rebuild was taking quite a lot of time. I think that building the index once and dumping the `Trie` to a file somehow may be faster than reparsing `docs.json` files on each search invocation.
Regarding the search function in the web interface, converting a Trie to JS will probably result in too large a bundle. It may be possible to use files on the system as Trie nodes, so that new parts of the index will be loaded into the browser on demand as the user types.
Apparently, pursuit depends on an old version of purescript, so maybe it isn't a good idea to copy its approach. The parser is completely new, but pursuit is still using the old version to parse types.
@klntsky thanks for looking into this!
> I checked the code, and from what I saw there I conclude that the easiest way is to add purescript as a dependency and reuse the `Module` type and the `asModule` parser, and also to start depending on pursuit to reuse its `SearchIndex`. Another option is to copy-paste the required definitions, obviously.
As far as I understand you'd like to create the index on the Haskell side. The problem with that is that we'd need to duplicate types (`SearchIndex`, etc.) on the PureScript side.
I don't have a strong opinion on this, but I'd rather not depend on the compiler and/or Pursuit anyway, so we could copy-paste the code or just port it to PureScript, so that we have one codebase to handle all the search-related stuff.
Assuming we'd go for the all-PureScript route, this is what I think should happen when running `spago docs`:

1. `spago` calls `purs compile --codegen docs` to get all the `docs.json` files
2. `spago` calls `purs docs` to generate the docs in `generated-docs`
3. `spago` clones this new PureScript project that handles search (yes, I think this code should go to another repo), and in this repo:
   - runs `spago bundle-app` with an entry point to the part of the code that builds a search index out of the `docs.json` files, and runs that with Node (sketched below). This will dump the search index in a JSON file or something (see below for more details)
   - runs `spago bundle-module` with an entry point to the part of the code that is going to run on the page to parse the searchbox input and look it up in the index (this runs in the browser). This app will read in the search index and look up the result of parsing the search box in it (note: this should probably be done without depending on any JS libraries, so we can skip a call to a JS bundler)
4. `spago` embeds the resulting search app in the HTML generated by `purs docs`
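To make step 3 a bit more concrete, here is a minimal sketch of what the index-builder entry point could look like; the module name, the hardcoded file list, and `buildIndex` are all hypothetical (the real thing would glob for the `docs.json` files and build a proper index):

``` purescript
module Docs.Search.IndexBuilder where

import Prelude

import Data.Traversable (for)
import Effect (Effect)
import Node.Encoding (Encoding(..))
import Node.FS.Sync (readTextFile, writeTextFile)

-- Hypothetical: the real list would come from globbing ./output/**/docs.json.
docsJsonFiles :: Array String
docsJsonFiles = [ "output/Data.List/docs.json" ]

main :: Effect Unit
main = do
  -- Read every docs.json produced by `purs compile --codegen docs`...
  contents <- for docsJsonFiles (readTextFile UTF8)
  -- ...and dump the index where the browser-side app can fetch it.
  writeTextFile UTF8 "generated-docs/index/index.json" (buildIndex contents)

-- Stub so the sketch is self-contained: the real implementation would parse
-- each docs.json and fold the declarations into a search index.
buildIndex :: Array String -> String
buildIndex _ = "{}"
```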
> Regarding the interface, I think that a CLI is a must-have.
I think it's very nice to have, but I'd say the biggest priority here is to replicate the search behavior of Pursuit (in fact, that's also the title of this issue)
> The obvious problem will then be speed - when I was working on pursuit, its index rebuild was taking quite a lot of time. I think that building the index once and dumping the `Trie` to a file somehow may be faster than reparsing `docs.json` files on each search invocation.
As I described above, parsing the `docs.json` files and generating the index is a one-off task, and I think it's not a problem if it's slow, since running `purs docs` takes a while anyways. So yes, I think we should dump the search index to a file.
> Regarding the search function in the web interface, converting a Trie to JS will probably result in too large a bundle. It may be possible to use files on the system as Trie nodes, so that new parts of the index will be loaded into the browser on demand as the user types.
I'd say we could try going for a single big bundle in the beginning, benchmark how bad it is, and then eventually split it into parts to be loaded on demand?
Do we want to generate the index for all packages in a set, or for listed dependencies only? If the former, maybe it is better to generate it once and distribute it with the package set somehow? AFAIK stack+hoogle use a similar approach. UPD: not true actually
@klntsky since we'd generate the `SearchIndex` from the `output` of the compiler, that would include only the listed dependencies.
Even if we wanted to generate the index for all packages in the set we still couldn't cache it, since users are allowed to have local packages, so distributing a precache wouldn't work as the index would have to be regenerated locally anyways.
Btw, I think `stack` distributes a package version index to speed up downloads (and we do that too), but I don't know anything about them distributing a code search index - do you have any pointers about that?
> I don't know anything about them distributing a code search index - do you have any pointers about that?
Nevermind, I checked it now and realized that I was wrong.
Also, it looks like the only available PureScript implementation of tries I could find is not suitable for storing the search index.
I'm working on it here: https://github.com/klntsky/spago-search
As of now it looks like this:
It is clear that some UI design decisions should be made.
That "PureScript API documentation" title is too long: the search field does not fit. Maybe we can replace it with something shorter?
Also, since the number of results changes as the user types, the main contents of the page jump up and down. There are some ways to solve this.
@klntsky great work so far! 👏
That "PureScript API documentation" title is too long: the search field does not fit. Maybe we can replace it with something shorter?
How about "$ProjectName Docs"?
In any case the searchbar going on a newline is perfectly fine, Pursuit does it too on mobile:
> Also, since the number of results changes as the user types, the main contents of the page jump up and down
I pictured having the results in the same place as they are right now in Pursuit (i.e. let's literally replace the contents of the page with them):
I'm not sure how they'd jump since you're just typing in the header so that should not move?
How about "$ProjectName Docs"?
I figured out that it is possible to shrink the width of the container of that "Index" link, at the same time making the search field narrower. In practice, most search queries will fit into the field. And it looks way better than with a line break.
> let's literally replace the contents of the page with them
OK. I think we should also restore the main contents when the search field is empty.
Another idea, unrelated to UI, is that we can populate the trie with camel-cased parts of names, i.e. for `querySelectorAll` we may store the same index entry by three paths: `queryselectorall`, `selector` and `all` (the latter two should have lower priority).
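A minimal sketch of that idea (the module and function names are hypothetical, not the final API):

``` purescript
module Docs.Search.CamelCase where

import Prelude

import Data.Array (drop, snoc, unsnoc)
import Data.Foldable (foldl)
import Data.Maybe (Maybe(..))
import Data.String (joinWith, toLower)
import Data.String.CodeUnits (fromCharArray, toCharArray)

-- Split a camelCase identifier into its words:
-- splitCamelCase "querySelectorAll" == ["query", "Selector", "All"]
splitCamelCase :: String -> Array String
splitCamelCase str = map fromCharArray (foldl go [] (toCharArray str))
  where
  isUpper c = c >= 'A' && c <= 'Z'
  -- start a new group on each uppercase letter, otherwise extend the last one
  go groups c = case unsnoc groups of
    Just { init, last } | not (isUpper c) -> snoc init (snoc last c)
    _ -> snoc groups [ c ]

-- All the paths under which an entry would be stored in the trie:
-- the full lowercased name plus the trailing camel-case parts
-- (which should get lower priority at ranking time).
indexPaths :: String -> Array String
indexPaths name = map toLower ([ joinWith "" parts ] <> drop 1 parts)
  where
  parts = splitCamelCase name
```

For `querySelectorAll` this yields `["queryselectorall", "selector", "all"]`, i.e. exactly the three paths above.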
The sad thing is that we seemingly can't split the index to load it on demand and have type search ability at the same time, because type searching requires comparing with all available types (if we go the pursuit way).
Splitting the index is absolutely required: as of now it is 6.4 MB (for modules used by the project itself), and loading it freezes the page for ~8s on my machine.
@klntsky I think we can detect when type search is being used, right? Then we could split the index and load all the parts when we get a type query. If it's split in parts then it should also be possible to keep the page responsive (and show a loading spinner) while we load the parts.
> Then we could split the index and load all the parts when we get a type query.
This is very straightforward, but I'd say I dislike this approach. We can't improve the total time to load the index by loading it in parts: the only thing that we can do is suppress the "tab is not responding" message.
Anyway, last night I brainstormed the problem and found a solution.
We can split all types in the index by their shapes.
A "type shape" is a lossy encoding of types:
``` purescript
type TypeShape = List ShapeChunk

data ShapeChunk
  = PVar
  | PFun
  | PApp
  | PForAll Int
  | PRow Int
```
For example, we encode `forall a. (forall h. ST h (STArray h a)) -> Array a` as

``` purescript
[ PForAll 1, PFun, PForAll 1, PApp, PApp, PVar, PVar, PApp, PApp, PVar, PVar, PVar, PApp, PVar, PVar ]
```

(using Polish notation).
"Type shapes" preserve the crucial info about the type, but don't store the unnecessary details. They can be hashed and used as keys in a Map, which can be loaded on demand.
Unfortunately, by using them we are giving up on the ability to find subtypes (e.g. an `a -> (b -> b)` type will not be matched by an `a -> b` query). But I believe it is acceptable, since most of these subtypes are usually irrelevant.
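For illustration, here is a sketch of how such an encoding could work, over a deliberately simplified type AST (the real AST in `docs.json` is richer, `shapeOf` and `SimpleType` are hypothetical names, and the `TypeShape`/`ShapeChunk` definitions from above are assumed to be in scope):

``` purescript
import Prelude
import Data.Array (length)

-- A toy type AST, just enough to demonstrate the encoding
-- (rows are omitted, so PRow is unused here).
data SimpleType
  = TVar String                     -- a
  | TCon String                     -- Array, ST, ...
  | TApp SimpleType SimpleType      -- f x
  | TFun SimpleType SimpleType      -- a -> b
  | TForAll (Array String) SimpleType

shapeOf :: SimpleType -> TypeShape
shapeOf = case _ of
  TVar _ -> pure PVar
  TCon _ -> pure PVar                             -- names are erased: this is the lossy part
  TApp f x -> pure PApp <> shapeOf f <> shapeOf x -- Polish notation: operator first
  TFun a b -> pure PFun <> shapeOf a <> shapeOf b
  TForAll vars t -> pure (PForAll (length vars)) <> shapeOf t
```

Two types then land in the same index bucket exactly when their shapes are equal.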
I implemented the search, and IMO the quality level of the results is OK (for now they are only matched by shape, and not sorted by distance, as done in pursuit, but I'll implement that sorting too).
A `generated-docs/index/index.json` file with this shape would be of great help:

``` purescript
{ packages :: Array { name :: String
                    , modules :: Array String
                    , dependencies :: Array String
                    , repository :: String
                    , version :: String -- AFAIK spago only deals with git revisions, right?
                    }
, modules :: Array String -- list of user-defined modules from src/ and test/
}
```
The `repository` field would make it possible to make package names clickable.
This info is also needed to map modules to package names: this is a temporary solution I'd like to replace with something less hacky.
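Since the schema above is plain data, the PureScript side could decode it generically; a minimal sketch, assuming a recent argonaut-codecs (the module and type names here are hypothetical):

``` purescript
module Docs.Search.IndexJson where

import Data.Argonaut.Core (Json)
import Data.Argonaut.Decode (JsonDecodeError, decodeJson)
import Data.Either (Either)

type PackageInfo =
  { name :: String
  , modules :: Array String
  , dependencies :: Array String
  , repository :: String
  , version :: String
  }

type IndexJson =
  { packages :: Array PackageInfo
  , modules :: Array String -- user-defined modules from src/ and test/
  }

-- argonaut-codecs derives decoders for records of decodable fields,
-- so no hand-written parsing is needed for this file.
decodeIndexJson :: Json -> Either JsonDecodeError IndexJson
decodeIndexJson = decodeJson
```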
Do we really want to call `bundle-app` and `run` on the client side? IMO distributing a precompiled version of both the app and the index builder is better for the UX. Besides, building on the client will introduce more maintenance burden.
Directory structure will look like this:

```
generated-docs/
  index/
    index.json
    app.js
    declarations/
      1.js
      2.js
      ...
      n.js
    types/
```

`declarations` and `types` will contain parts of the corresponding search indices.
File names in `generated-docs/index/types` will be in a format defined by this function. Each of these files will contain all types of the corresponding "type shape".
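The idea, roughly (the hash and the chunk numbering below are illustrative, not the actual function linked above; `TypeShape` and `ShapeChunk` are the definitions from earlier):

``` purescript
import Prelude
import Data.Foldable (foldl)

-- Hash a TypeShape into a small stable Int, used as the bucket file name,
-- so the browser only fetches the bucket matching the query's shape.
shapeHash :: TypeShape -> Int
shapeHash = foldl (\acc chunk -> (acc * 31 + chunkCode chunk) `mod` 999983) 7
  where
  chunkCode = case _ of
    PVar -> 0
    PFun -> 1
    PApp -> 2
    PForAll n -> 3 + 2 * n
    PRow n -> 4 + 2 * n

typeIndexFile :: TypeShape -> String
typeIndexFile shape = "generated-docs/index/types/" <> show (shapeHash shape) <> ".js"
```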
@klntsky

1. Great idea! Should we generate that file from Haskell or PureScript?
2. The problem with distributing the precompiled app is that the artifacts would be tied to the `purs` version used to compile. So let's instead build it as part of the build here, and ship all the artifacts as templates in the binary (so the build works offline too). The problem on this side of the tradeoff is that bootstrapping `spago`'s build will become slightly more difficult, as one will either need a previous version of spago to build the artifacts, or just do a build with empty files first, and then use that build to produce the correct ones, and build again
3. Sounds good. We should probably collect all these docs in the `spago-search` README or in the "internals" document
4. Would you like to move the `spago-search` project under the `spacchetti` org? You'd retain admin etc etc, it's just to make it more "official" and keep things tidy. Actually given the build setup from (2) I'd also be fine merging all the code in this repo (I originally proposed having a separate repo because of the build setup)
5. I just realized we should somehow include `Prim` modules too - since we don't have `docs.json` for them - but we can worry about them at a later stage, let's get this merged first

> Great idea! Should we generate that file from Haskell or PureScript?
From PS. Initially I thought that this should be done on the spago side, because I completely forgot that there are `bower.json` files under the `.spago` dir.
> Sounds good. We should probably collect all these docs in the `spago-search` README or in the "internals" document
OK, I'll have a "documentation day" after I'm done.
> Would you like to move the `spago-search` project under the `spacchetti` org?
Yes, definitely.
> Actually given the build setup from (2) I'd also be fine merging all the code in this repo (I originally proposed having a separate repo because of the build setup)
I think that the artifacts (a nodejs script and a webapp) can just be downloaded at runtime from GitHub releases. In this case they should also be pinned by hash. This will make offline builds possible while not making spago self-dependent.
> I just realized we should somehow include `Prim` modules too - since we don't have `docs.json` for them - but we can worry about them at a later stage, let's get this merged first
Yeah, pursuit has the luxury of extracting the Prim definitions from the compiler. We may write a simple app that generates these JSON files later.
Also, I think that we may want to put the `index` dir under `generated-docs/html`, so that setting up a local "pursuit" with a static-only web server will be as simple as making `generated-docs/html` the webroot. If we have `generated-docs/html` and `generated-docs/index`, the webroot will have to be `generated-docs`, and all doc URLs will begin with `html/`, which will be slightly less elegant.
@klntsky sounds great! I invited you to the `spacchetti` org, so after you accept the invite you should be able to move the repo.
Maybe it's time to choose a better name for that search thing, @f-f?
I could make it work without spago (by adding a CLI interface to provide paths where to search for data), so putting spago in the title may be confusing. What do you think of `purescript-docs-search` as a name for the repo?
> What do you think of `purescript-docs-search` as a name for the repo?
@klntsky sounds great!
OK. One more question then. If you agree to distribute the precompiled app, then it is no longer a requirement to avoid JS dependencies, right?
I was thinking that if the directories to search the data in could be passed through the CLI, then introducing glob syntax for these arguments is a natural thing to do.
More specifically, a call to the index builder could look like this:

```
./index-builder.js --docs-files './output/**/docs.json' --generated-docs-files './generated-docs/html/*.html' --bower-files './.spago/**/bower.json'
```

This requires adding `glob` as a dependency.
The docs.json files don't include re-exports currently, by the way (this is a result of various implementation details). Perhaps I should have made this clearer before, but the intended way of consuming them is to use the `collectDocs` function from the compiler library.
> The docs.json files don't include re-exports currently
I know, this is absolutely fine for our purposes (it's even better this way, as there's no need to deal with duplicate index entries).
> the intended way of consuming them is to use the `collectDocs` function from the compiler library.
How stable is the format of these files? Should I expect my ad hoc `docs.json` decoder to break eventually?
Backwards-incompatible changes are pretty rare (they're a pain for us because they necessitate regenerating all of Pursuit's docs) but forwards-incompatible changes are not that uncommon. These files also aren't part of the compiler's public API, so e.g. they could potentially change in a backwards-incompatible way without a major compiler release.
I think it's OK to depend on the format of the `docs.json` files if they are mostly backwards-compatible: at generation time we know the version of the compiler we're dealing with, so we can just have a big switch on the version in the app.
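Something along these lines, for example (a sketch; the module name and format tags are made up):

``` purescript
module Docs.Search.DocsJsonVersion where

import Prelude

-- The docs.json format can change between compiler releases, so the index
-- builder can branch on the purs version the output was generated with.
data DocsJsonFormat = Pre0_13 | From0_13

formatFor :: { major :: Int, minor :: Int, patch :: Int } -> DocsJsonFormat
formatFor { major, minor }
  | major == 0 && minor < 13 = Pre0_13
  | otherwise = From0_13
```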
> If you agree to distribute the precompiled app, then it is no longer a requirement to avoid JS dependencies, right?
@klntsky yep, I'm fine with adding JS deps
We should be able to generate documentation for all the libraries in the package set (and/or just the libraries used in the project), in a similar way to what `stack haddock` does for Haskell projects. The use-cases for this are multiple and useful.

Note that this would obsolete #27, as it would entirely remove the "where do I upload my library" problem → you would just add it to a Spacchetti package-set, and have it "published" in the next Spacchetti release.

How to achieve this: we can use the `purs docs` command. Justin had a good start with acme-spago. He included a spago project in there, but if this is integrated in spago itself, we wouldn't need it (as we would just generate that info on the fly).