Open hdgarrood opened 9 years ago
Hi @hdgarrood - yes, the offer still stands! I'm in the process of rewriting Hoogle to make it more flexible; the prototype is at http://hoogle.haskell.org/. My understanding of Purescript is that it's basically Haskell (or at least as far as Hoogle cares), so making Hoogle work with it should be very easy. The main thing I'd need from you is a way to extract the type signatures and documentation out of Purescript packages.
After that, you can either run your own server, or I can spin one up at hoogle.haskell.org/purescript.
As an added bonus (for me) I was intending to get to grips with Purescript anyway for a project I'm hoping to start working on, so a nice opportunity.
Great!
My understanding of Purescript is that it's basically Haskell
Yes, pretty much. As far as I know, the only differences that might be important are:
type Person = { age :: Number, name :: String }
String
, written !. Effects are mainly used for tracking exactly what an Eff action is able to do (Eff is more or less PureScript's IO). So an action which uses the global random number generator to get a random number might have the type forall r. Eff (random :: Random | r) Number
Do you think both of those could be handled well?
Also is this the format that I would need to extract type signatures and documentation in? Is it specified anywhere?
So both those type system extensions can certainly be handled by encoding to something else to start with, and I'm sure we can do very well with either a clever encoding (which is transparent to the user) or custom support if necessary.
That link is the format that Hoogle currently takes for Haskell. It is basically documentation in the style supported by Haddock, followed by single line Haskell-ish declarations. You certainly could translate PureScript exactly to that (translating the extra features away), but another approach would be to have your own custom format of documentation/declarations - as long as it's sensible it shouldn't be too hard to consume (the current one has several features that make it a pain to consume).
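As a rough illustration of "documentation followed by single line declarations", here is a minimal sketch of a reader for that shape of input. It assumes only that documentation lines start with `--` (optionally `-- |`) and that every other non-blank line is a declaration; the `Item` type and `parseItems` function are hypothetical names for this example, not part of Hoogle, and the real format has more to it than this.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- | One declaration together with the documentation lines that preceded it.
-- (Hypothetical type for this sketch, not Hoogle's own representation.)
data Item = Item
  { itemDocs :: [String]
  , itemDecl :: String
  } deriving (Eq, Show)

-- | Attach each run of "--" comment lines to the declaration that follows.
-- A blank line discards any documentation collected so far.
parseItems :: String -> [Item]
parseItems = go [] . lines
  where
    go _ [] = []
    go docs (l : ls)
      | all isSpace l = go [] ls
      | "--" `isPrefixOf` l = go (docs ++ [stripDoc l]) ls
      | otherwise = Item docs l : go [] ls

    -- Drop the leading "--", an optional "|" marker, and leading spaces.
    stripDoc = dropWhile isSpace . dropBar . dropWhile isSpace . drop 2
    dropBar ('|' : rest) = rest
    dropBar s = s
```

Keeping each declaration on a single line is what makes a line-oriented reader like this workable; multi-line declarations would need a real parser.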
Ok then - would something JSON-y be considered sensible?
Also, could a clever encoding involve putting extra characters in names that aren't allowed to appear in normal code, like { name :: String, age :: Number } becoming $Row ($name String) ($age Number) or something?
On an unrelated note, is there a (relatively) easy issue in Hoogle that I could tackle? I'd like to become a little more familiar with this code if possible, hopefully that might make things easier.
Given completely free choice, I'd probably go for something plain-text based - since JSON can only encode a bit of the structure at most (your code is code, your documentation probably has markup). But if for some reason JSON is easier to generate or something else it doesn't matter too much.
Yes, you certainly could encode Row that way - there's plenty of ways to do it, Template Haskell being another one - the syntax tree I use is quite rich.
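The sentinel-name encoding of records discussed above could be sketched roughly as follows. The `Type` data type here is a deliberately tiny stand-in invented for this example (it is not Hoogle's syntax tree), and whether a given parser accepts `$`-prefixed names is a separate question this sketch does not address.

```haskell
-- | A tiny stand-in for a type syntax tree (hypothetical).
data Type
  = TCon String               -- a named type such as String
  | TRecord [(String, Type)]  -- a record: labels paired with field types
  deriving (Eq, Show)

-- | Render a type as text, marking rows and labels with '$'-prefixed names
-- that cannot clash with names written in ordinary code.
encode :: Type -> String
encode (TCon name) = name
encode (TRecord fields) = unwords ("$Row" : map field fields)
  where
    field (label, ty) = "(" ++ "$" ++ label ++ " " ++ atom ty ++ ")"
    -- A nested record gets its own parentheses so the output stays unambiguous.
    atom ty@(TRecord _) = "(" ++ encode ty ++ ")"
    atom ty = encode ty
```

With this, `encode (TRecord [("name", TCon "String"), ("age", TCon "Number")])` gives `$Row ($name String) ($age Number)`, matching the example in the question.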
Currently the bugs in the bug trackers (both GitHub and Google bug tracker) mostly refer to v4, and the code on master is v5, which in most cases is incomplete rather than buggy. Some possible ones to get started (some of which may be a case of seeing if they still occur or have disappeared) include:
I just stumbled across an excellent beginner bug, see https://github.com/ndmitchell/hoogle/issues/103 .
Ah, great! I'll take a look.
I'm getting quite close to being able to produce some kind of file like the Haskell one for cmdargs that I linked above. I was just wondering if you could expand a little bit on the features that make the Haskell format a pain to consume? That way we can hopefully avoid having the same features in the PureScript format.
The basics of the Hoogle output aren't too bad, and it isn't too painful to consume. The things to watch out for are:
How does this look? https://gist.github.com/hdgarrood/af6aea24f19d0365bbed
More detailed package metadata such as what you suggest is possible, but depends on the authors including that information in the bower.json file. There will always be a package name and version, though.
That example has type / data constructor / type class names fully qualified if they come from a module other than the current module. I could also make names fully qualified everywhere, without very much effort, if that makes it easier on the Hoogle side?
Oh also, what does Hoogle do about type class instances? Should they be included?
Looks good to me. Either always qualified or sometimes qualified both work just fine for me. Please include the instances, without any documentation, as Hoogle does use them for refining type search. (If they aren't there, it isn't the end of the world, but the search quality will be reduced.)
Ok, great - here's the latest example: https://gist.github.com/hdgarrood/0c1c13319ca3fd16d4fc
I didn't manage to ensure that names are always qualified, but fully qualified names are there in the majority of cases. Also instances, data constructors, and type class members are now included.
So is the next step to create a module like https://github.com/ndmitchell/hoogle/blob/master/src/Input/Hoogle.hs but for PureScript input files?
Yep, that's the next step. Given the closeness to Haskell, I'd be tempted to use the haskell-src-exts parser that the existing Hoogle.hs uses, and do a bit of light text-munging to get round the fact that instances aren't named.
Ok then - there's also:
{ foo :: Int, bar :: String } is equivalent to Prim.Object ( foo :: Int, bar :: String ).
I think these are probably manageable too.
You previously mentioned using Template Haskell to encode rows - would you mind expanding on that a little bit? Can you see an advantage of TH over having a set of 'sentinel' types to mark that a type is a row, perhaps with something like $Row ($name String) ($age Int)?
Also, do we need to be able to take a type as represented by the Hoogle syntax tree, and turn it back into a string with purescript syntax recognisable to humans?
@paf31 also suggested a couple of other options:
( name :: String, age :: Int ) could be Cons "name" String (Cons "age" Int Nil), although perhaps this would present an issue with the ordering?
(Has "name" String r, Has "age" Int r) => r, which might not suffer from the ordering problem?
I'd go for the simplest translation that works - so your sentinel types seem reasonable. We can play with alternative encodings after you've got something working.
Also, do we need to be able to take a type as represented by the Hoogle syntax tree, and turn it back into a string with purescript syntax recognisable to humans?
Currently, yes, but I'm intending to change that, so don't worry for now - accept the ugly version and expect it to become nice in the future.
Yes, the type class approach is probably not a good idea, since you quickly run into rank-2 types when a record is on the LHS of a function arrow.
I like the HList-style encoding, but it would be really great if it were possible to reorder the labels in a row. Would it be possible to support a zero or low cost rewrite for reordering labels (later)?
I wouldn't overly worry about records at this stage. Best to get text/name searching going, and then once that's working nicely, then focus on what type encoding is used.
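On the label-ordering question for the HList-style encoding, one simple option is to sort the labels before encoding, so that rows differing only in order produce the same text. The sketch below assumes field types are given as already-rendered atomic strings (a type like Maybe Int would need parenthesising first); `Row`, `encodeRow` and `encodeRowCanonical` are names made up for this example.

```haskell
import Data.List (sortOn)

-- | A row as written: labels paired with already-rendered field types.
type Row = [(String, String)]

-- | HList-style encoding: Cons "label" Type (... Nil).
encodeRow :: Row -> String
encodeRow [] = "Nil"
encodeRow ((label, ty) : rest) =
  unwords ["Cons", show label, ty, tailText]
  where
    -- Only a non-empty tail needs parentheses.
    tailText
      | null rest = "Nil"
      | otherwise = "(" ++ encodeRow rest ++ ")"

-- | Sort by label first, so field order in the source no longer matters.
encodeRowCanonical :: Row -> String
encodeRowCanonical = encodeRow . sortOn fst
```

Sorting happens once per row at encoding time, so it is cheap; it does not help with open rows (those with a row variable in the tail), which this sketch ignores.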
Great - here's another example: https://gist.github.com/hdgarrood/cbab6fac87bd5bcc300c
I decided to modify the code that produces that file so that text-munging inside Hoogle isn't necessary; performing the kinds of transformations we were talking about on rows with text-munging seemed a bit too scary.
Awesome! Given that, what do you think we should do next? And what's the end goal? Are you hoping to eventually run your own Hoogle instance for Purescript? Or do you want the normal Hoogle instance to also serve Purescript searches, probably at a slightly different URL?
One way forward is for you to produce a tarball containing all purescript modules, and I'll work on parsing that, and give you a command line flag you can run to generate a Hoogle instance, so you can spawn up a server. It would probably take me a week to find the time to do the work required. Once I've got the prototype, it would be much easier for you to hack from there.
Re next steps, could we try putting the above example in to Hoogle and see what happens? I'm currently not quite sure exactly how to go about doing that with Hoogle 5.
The end goal is to allow people to perform Hoogle searches from within the Pursuit web application; possibly using JS on the client side, retrieving results via JSON?
If it's possible for the normal Hoogle instance to serve PureScript searches, that would be really nice for us, as it would make deployment easier.
The tarball approach sounds good. Will it be necessary to select a set of versions which all build together, or is it ok to just take the latest known version of every package?
How would the tarball approach work when we upload a new package to Pursuit? My understanding was that we would attempt to merge the new data into the database immediately, which is why I assumed we would host Hoogle inside Pursuit itself, or run it on the same server in a separate process. I'm basing this on the Hoogle 4 source on Hackage, however, so I don't know if it's possible with the Hoogle 5 architecture.
@hdgarrood Just the latest version of all packages in a tarball works best - no requirement for them to build together. We can certainly start with a purescript entry point and serving stuff over JSON. Eventually you might want to take control over your update schedule and get the reliability of one less server involved, but no rush. There is currently no JSON end point or ability to switch databases in Hoogle, but both can be added without too much hassle.
@paf31 With Hoogle 5 there is no incremental building of packages - the idea is that everything is so fast (< 1 min for all of Stackage - 1000+ packages with 100K+ entries) that you just rebuild everything. If you want to do that every time a package gets uploaded, you certainly could. That will require you to control your updates though - Hoogle on the server currently only regenerates its index once a day.
I'll try and do something on the train tomorrow. If you have a tarball of docs ready by 6am UK time I'll pick that up. Otherwise I'll tar the one example and create my own - so don't feel you have to rush, but if it's sitting on your machine, uploading it would be useful.
On second thoughts, don't worry about the tarball thing just yet - given that we're actually quite close to being able to deploy a beta version of Pursuit (and when we do, we'd like to have type search), and given also that Pursuit is now producing Haskell-compatible Hoogle files, then it seems sensible to attempt this with Hoogle 4, at least for now.
OK, that seems reasonable. I'm hoping to attend http://www.meetup.com/London-Haskell/events/223598997/ so maybe see you then.
Cool, looking forward to it!
I've integrated Hoogle 4 as a library into the Pursuit server, see http://new-pursuit.purescript.org/search?q=a+-%3E+a+-%3E+a and everything seems to be working very well! Thanks very much for your help and of course for creating such a useful piece of software. :) We're probably going to stick with Hoogle 4 for now. Once Hoogle 5 has type search and a Haskell API we will probably look at updating.
There are a couple of areas where the Hoogle integration is a little awkward, and I have a few suggestions for how the Hoogle library API could change to make it a little smoother - if any of these sound good to you, I'd happily work on them:
Language is an enumeration inside the Hoogle library, which means that if a language is to be supported, the Hoogle library itself needs to change, and the query parser etc all need to be added to Hoogle itself. This is unfortunate, because I'd like to eg. depend on the purescript library to parse queries. Do you think it would be possible to have something like this instead?
data Language = Language
  { languageParseQuery :: String -> Either ParseError Query
  , languageRenderQuery :: Query -> TagStr
  ...
  }
Then, I could write a package hoogle-purescript which depended on both hoogle and purescript, and provided a Language value which could be used with Hoogle.
Of course, you would then have to export constructors for the Query type, or do something similar, which I can understand if you didn't want to do.
createDatabase and then loadDatabase straight afterwards?
ItemEx values directly in Haskell code? Perhaps by exporting those types and constructors too? Then I can use Haskell's type system to help check whether I've created a valid Hoogle input file. Hoogle could then provide some way of serializing and deserializing these, or even allow users to put ItemEx values straight in to databases without having to serialize them?
locations bit), and also includes Hackage URLs and seems to produce results according to Hackage's URL structure. Would it be possible to make the result data type independent of any URLs or URL structure, so users would be able to construct URLs themselves? For example, would something like this work?
data Result = Result
  { resultPackage :: String
  , resultTagStr :: TagStr
  , resultInfo :: ResultInfo
  }
data ResultInfo
  = PackageResult
  | ModuleResult String -- ^ Module name
  | DeclarationResult String String -- ^ Module name & declaration title
I just had a quick look at the Hoogle module in Hoogle 5 and it seems at least some of these things are already happening, but I thought it would be useful for me to write this up from the point of view of someone coming from Hoogle 4 anyway.
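To make the first suggestion above concrete, here is a minimal sketch of a language described as a record of functions rather than an enumeration. `Query`, `TagStr` and `ParseError` are placeholder newtypes standing in for Hoogle's real types, and `toyLanguage` and `roundTrip` are invented for the example; nothing here reflects Hoogle's actual API.

```haskell
-- Placeholder types: stand-ins for Hoogle's own, which are not reproduced here.
newtype Query = Query [String] deriving (Eq, Show)
newtype TagStr = TagStr String deriving (Eq, Show)
newtype ParseError = ParseError String deriving (Eq, Show)

-- | A language as a record of functions, so new languages can live in
-- separate packages instead of inside the search library.
data Language = Language
  { languageParseQuery  :: String -> Either ParseError Query
  , languageRenderQuery :: Query -> TagStr
  }

-- | A toy language value: a query is a list of whitespace-separated words.
toyLanguage :: Language
toyLanguage = Language
  { languageParseQuery = \s -> case words s of
      [] -> Left (ParseError "empty query")
      ws -> Right (Query ws)
  , languageRenderQuery = \(Query ws) -> TagStr (unwords ws)
  }

-- | Library-side code goes through the record only, so it needs no
-- knowledge of any particular language.
roundTrip :: Language -> String -> Either ParseError TagStr
roundTrip lang = fmap (languageRenderQuery lang) . languageParseQuery lang
```

A hoogle-purescript package would then only need to export one `Language` value built from the purescript library's parser.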
Thanks for the notes. Yep, some of those are covered, but great to get the list. I'm not sure if purescript would be fine as a dependency or not, it doesn't seem that big relative to the web stuff I depend on.
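For the URL-independent result type proposed above, the point is that each consumer can map results to links in its own way. A minimal sketch, assuming a placeholder `TagStr` and an entirely made-up path layout (`/packages/…/docs/…`), with `resultUrl` as a hypothetical helper:

```haskell
-- Placeholder for Hoogle's TagStr type.
newtype TagStr = TagStr String deriving (Eq, Show)

data Result = Result
  { resultPackage :: String
  , resultTagStr  :: TagStr
  , resultInfo    :: ResultInfo
  } deriving (Eq, Show)

data ResultInfo
  = PackageResult
  | ModuleResult String              -- module name
  | DeclarationResult String String  -- module name and declaration title
  deriving (Eq, Show)

-- | Build a link for a result under a given base URL.
-- The path layout is invented for illustration only.
resultUrl :: String -> Result -> String
resultUrl base r = case resultInfo r of
  PackageResult         -> pkg
  ModuleResult m        -> pkg ++ "/docs/" ++ m
  DeclarationResult m d -> pkg ++ "/docs/" ++ m ++ "#" ++ d
  where
    pkg = base ++ "/packages/" ++ resultPackage r
```

Because the result carries only the package, module and declaration names, a Hackage-style or Pursuit-style URL scheme becomes a decision for the caller rather than the library.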
PureScript is a statically typed functional programming language, very similar to Haskell, mainly targeting JavaScript.
I saw your "Hoogle for your language" blog post from http://neilmitchell.blogspot.co.uk/2011/03/hoogle-for-your-language-ie-f-scala-ml.html recently. Does the offer still stand? I'd like to create some kind of tool to allow type searching across PureScript libraries. I'm happy to do the majority of the legwork. In fact I'm kind of hoping I can extract a Google Summer of Code out of it...!