PureScript search? - Githubissues

hdgarrood commented 9 years ago

PureScript is a statically typed functional programming language, very similar to Haskell, mainly targeting JavaScript.

I saw your "Hoogle for your language" blog post from http://neilmitchell.blogspot.co.uk/2011/03/hoogle-for-your-language-ie-f-scala-ml.html recently. Does the offer still stand? I'd like to create some kind of tool to allow type searching across PureScript libraries. I'm happy to do the majority of the legwork. In fact I'm kind of hoping I can extract a Google Summer of Code out of it...!

ndmitchell commented 9 years ago

Hi @hdgarrood - yes, the offer still stands! I'm in the process of rewriting Hoogle to make it more flexible; the prototype is at http://hoogle.haskell.org/. My understanding of Purescript is that it's basically Haskell (or at least as far as Hoogle cares), so making Hoogle work with it should be very easy. The main thing I'd need from you is a way to extract the type signatures and documentation out of Purescript packages.

After that, you can either run your own server, or I can spin one up at hoogle.haskell.org/purescript.

As an added bonus (for me) I was intending to get to grips with Purescript anyway for a project I'm hoping to start working on, so a nice opportunity.

hdgarrood commented 9 years ago

Great!

"My understanding of Purescript is that it's basically Haskell"

Yes, pretty much. As far as I know, the only differences that might be important are:

PureScript supports rows in its type system, for example type Person = { age :: Number, name :: String }
It also has effects, which have a separate kind from 'normal' types like String, written !. Effects are mainly used for tracking exactly what an Eff action is able to do (Eff is more or less PureScript's IO). So an action which uses the global random number generator to get a random number might have the type forall r. Eff (random :: Random | r) Number

Do you think both of those could be handled well?

Also is this the format that I would need to extract type signatures and documentation in? Is it specified anywhere?

ndmitchell commented 9 years ago

So both those type system extensions can certainly be handled by encoding to something else to start with, and I'm sure we can do very well with either a clever encoding (which is transparent to the user) or custom support if necessary.

That link is the format that Hoogle currently takes for Haskell. It is basically documentation in the style supported by Haddock, followed by single line Haskell-ish declarations. You certainly could translate PureScript exactly to that (translating the extra features away), but another approach would be to have your own custom format of documentation/declarations - as long as it's sensible it shouldn't be too hard to consume (the current one has several features that make it a pain to consume).

hdgarrood commented 9 years ago

Ok then - would something JSON-y be considered sensible?

Also, could a clever encoding involve putting extra characters in names that aren't allowed to appear in normal code, like { name :: String, age :: Number } becoming $Row ($name String) ($age Number) or something?
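
To make the sentinel idea concrete, here is a minimal Haskell sketch. The `Type` model and the `encode` and `atom` functions are hypothetical names invented for this illustration; they are not part of Hoogle or the PureScript compiler. It only shows how a closed record could be flattened into `$`-prefixed names.

```haskell
-- | A deliberately tiny, hypothetical model of PureScript types.
data Type
  = TCon String              -- a named type such as String or Number
  | TApp Type Type           -- type application, e.g. Maybe Int
  | TRecord [(String, Type)] -- a closed record: { label :: Type, ... }
  deriving (Eq, Show)

-- | Render a type as Haskell-ish text, replacing records with sentinel
-- names that start with '$' (a character not allowed in normal names).
encode :: Type -> String
encode (TCon n)         = n
encode (TApp f x)       = encode f ++ " " ++ atom x
encode (TRecord fields) =
  unwords ("$Row" : [ "($" ++ l ++ " " ++ atom t ++ ")" | (l, t) <- fields ])

-- | Parenthesise anything that is not a bare name.
atom :: Type -> String
atom t@(TCon _) = encode t
atom t          = "(" ++ encode t ++ ")"
```

In GHCi, `encode (TRecord [("name", TCon "String"), ("age", TCon "Number")])` gives `$Row ($name String) ($age Number)`, matching the shape suggested above. Field order is preserved as written, so two records that differ only in order encode differently.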

On an unrelated note, is there a (relatively) easy issue in Hoogle that I could tackle? I'd like to become a little more familiar with this code if possible, hopefully that might make things easier.

ndmitchell commented 9 years ago

Given completely free choice, I'd probably go for something plain-text based - since JSON can only encode a bit of the structure at most (your code is code, your documentation probably has markup). But if JSON (or something else) is easier to generate for some reason, it doesn't matter too much.

Yes, you certainly could encode Row that way - there's plenty of ways to do it, Template Haskell being another one - the syntax tree I use is quite rich.

Currently the bugs in the bug trackers (both GitHub and Google bug tracker) mostly refer to v4, and the code on master is v5, which in most cases is incomplete rather than buggy. Some possible ones to get started (some of which may be a case of seeing if they still occur or have disappeared) include:

https://code.google.com/p/ndmitchell/issues/detail?id=76 - console output should probably be coloured using the ansi-terminal package
https://code.google.com/p/ndmitchell/issues/detail?id=90 - the docs in the web page have dotted lines under them before expansion
https://github.com/ndmitchell/hoogle/issues/30 - searching for Unicode stuff doesn't display well.

ndmitchell commented 9 years ago

I just stumbled across an excellent beginner bug, see https://github.com/ndmitchell/hoogle/issues/103 .

hdgarrood commented 9 years ago

Ah, great! I'll take a look.

hdgarrood commented 9 years ago

I'm getting quite close to being able to produce some kind of file like the Haskell one for cmdargs that I linked above. I was just wondering if you could expand a little bit on the features that make the Haskell format a pain to consume? That way we can hopefully avoid having the same features in the PureScript format.

ndmitchell commented 9 years ago

The basics of the Hoogle output aren't too bad, and it isn't too painful to consume. The things to watch out for are:

You should use PureScript declarations, not try converting them to Haskell, so they're useful to other projects too.
Hoogle augments the Hoogle files with metadata from the Cabal file about author, maintainer, categories and license. You may want to include that directly in the data file.
The Hoogle output is full of bugs, which makes it a lot harder.
Currently names aren't qualified. That might be something you consider. I'm on the fence on this issue, but of course it's always easier to throw information away than create information that doesn't exist.

hdgarrood commented 9 years ago

How does this look? https://gist.github.com/hdgarrood/af6aea24f19d0365bbed

More detailed package metadata such as what you suggest is possible, but depends on the authors including that information in the bower.json file. There will always be a package name and version, though.

That example has type / data constructor / type class names fully qualified if they come from a module other than the current module. I could also make names fully qualified everywhere, without very much effort, if that makes it easier on the Hoogle side?

hdgarrood commented 9 years ago

Oh also, what does Hoogle do about type class instances? Should they be included?

ndmitchell commented 9 years ago

Looks good to me. Either always qualified or sometimes qualified both work just fine for me. Please include the instances, without any documentation, as Hoogle does use them for refining type search. (If they aren't there, it isn't the end of the world, but the search quality will be reduced.)

hdgarrood commented 9 years ago

Ok, great - here's the latest example: https://gist.github.com/hdgarrood/0c1c13319ca3fd16d4fc

I didn't manage to ensure that names are always qualified, but fully qualified names are there in the majority of cases. Also instances, data constructors, and type class members are now included.

So is the next step to create a module like https://github.com/ndmitchell/hoogle/blob/master/src/Input/Hoogle.hs but for PureScript input files?

ndmitchell commented 9 years ago

Yep, that's the next step. Given the closeness to Haskell, I'd be tempted to use the haskell-src-exts parser that the existing Hoogle.hs uses, and do a bit of light text-munging to get round the fact that instances aren't named.
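
One possible reading of the text-munging step, sketched in Haskell: PureScript instance heads carry a name (`instance showUnit :: Show Unit`) while Haskell ones do not, so the name would be dropped before handing the line to a Haskell parser. The function `unnameInstance` is a hypothetical helper, and it assumes each instance head sits on a single line.

```haskell
-- | Drop the instance name from a single-line PureScript-style instance
-- head. Lines that do not start with "instance <name> ::" are returned
-- unchanged. Matching lines have their whitespace normalised.
unnameInstance :: String -> String
unnameInstance line =
  case words line of
    ("instance" : _name : "::" : rest) -> unwords ("instance" : rest)
    _                                  -> line
```

For example, `unnameInstance "instance showUnit :: Show Unit"` yields `instance Show Unit`, while an ordinary signature such as `foo :: Int` passes through untouched.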

hdgarrood commented 9 years ago

Ok then - there's also:

Rows, which can be 'open' or 'closed'. Open rows can be extended with some other row type (which in practice is usually a universally quantified type variable), eg https://github.com/purescript/purescript-quickcheck/blob/master/docs/Test/QuickCheck.md#qc. Closed rows cannot be extended, eg https://github.com/purescript/purescript-foldable-traversable/blob/master/docs/Data.Traversable.md#accum.
Record syntax sugar, which means that { foo :: Int, bar :: String } is equivalent to Prim.Object ( foo :: Int, bar :: String ).

I think these are probably manageable too.
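
As a rough illustration of the two points above, the following Haskell sketch models open and closed rows and removes the record sugar. All names here (`Row`, `Type`, `isOpen`, `desugar`) are hypothetical and chosen for this example only; the real PureScript compiler's AST is not reproduced.

```haskell
import Data.Maybe (isJust)

-- | A row: labelled types plus an optional tail variable.
data Row = Row
  { rowFields :: [(String, Type)]
  , rowTail   :: Maybe String  -- Just "r" for an open row ( ... | r ), Nothing for a closed one
  } deriving (Eq, Show)

data Type
  = TCon String        -- named type, e.g. Int
  | TVar String        -- type variable
  | TApp Type Type     -- type application
  | TRow Row           -- a bare row: ( foo :: Int, ... )
  | TRecordSugar Row   -- record sugar: { foo :: Int, ... }
  deriving (Eq, Show)

-- | A row is open when it has a tail it can be extended with.
isOpen :: Row -> Bool
isOpen = isJust . rowTail

-- | Remove record sugar everywhere: { fields } becomes Prim.Object ( fields ).
desugar :: Type -> Type
desugar (TRecordSugar r) = TApp (TCon "Prim.Object") (TRow (desugarRow r))
desugar (TRow r)         = TRow (desugarRow r)
desugar (TApp f x)       = TApp (desugar f) (desugar x)
desugar t                = t

desugarRow :: Row -> Row
desugarRow r = r { rowFields = [ (l, desugar t) | (l, t) <- rowFields r ] }
```

With this model, `{ foo :: Int, bar :: String }` is `TRecordSugar (Row [("foo", TCon "Int"), ("bar", TCon "String")] Nothing)`, and `desugar` turns it into an application of `Prim.Object` to the same closed row.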

You previously mentioned using Template Haskell to encode rows - would you mind expanding on that a little bit? Can you see an advantage of TH over having a set of 'sentinel' types to mark that a type is a row, perhaps with something like $Row ($name String) ($age Int)?

Also, do we need to be able to take a type as represented by the Hoogle syntax tree, and turn it back into a string with purescript syntax recognisable to humans?

hdgarrood commented 9 years ago

@paf31 also suggested a couple of other options:

an HList-esque style: ( name :: String, age :: Int ) could be Cons "name" String (Cons "age" Int Nil), although perhaps this would present an issue with the ordering?
With typeclasses: the same row might be (Has "name" String r, Has "age" Int r) => r, which might not suffer from the ordering problem?
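
A small Haskell sketch of the HList-style option follows. To sidestep the ordering concern it sorts labels before building the chain; that canonical sort is an assumption added for this example, not something proposed in the thread. The names `Enc`, `encodeRow` and `render` are likewise hypothetical, and field types are kept as plain strings for brevity.

```haskell
import Data.List (sortOn)

-- | An HList-style encoding of a row: label, type name, rest of the row.
data Enc = Nil | Cons String String Enc
  deriving (Eq, Show)

-- | Build the Cons/Nil chain after sorting by label, so that rows which
-- differ only in field order get the same encoding.
encodeRow :: [(String, String)] -> Enc
encodeRow = foldr (\(l, t) rest -> Cons l t rest) Nil . sortOn fst

-- | Render the encoding in the style used in the discussion above.
render :: Enc -> String
render Nil              = "Nil"
render (Cons l t Nil)   = "Cons " ++ show l ++ " " ++ t ++ " Nil"
render (Cons l t rest)  = "Cons " ++ show l ++ " " ++ t ++ " (" ++ render rest ++ ")"
```

Here `encodeRow [("name", "String"), ("age", "Int")]` and `encodeRow [("age", "Int"), ("name", "String")]` are equal, and both render as `Cons "age" Int (Cons "name" String Nil)`. Sorting fixes one order on the indexing side only; a query would need the same normalisation applied before matching.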

ndmitchell commented 9 years ago

I'd go for the simplest translation that works - so your sentinel types seem reasonable. We can play with alternative encodings after you've got something working.

"Also, do we need to be able to take a type as represented by the Hoogle syntax tree, and turn it back into a string with purescript syntax recognisable to humans?"

Currently, yes, but I'm intending to change that, so don't worry for now - accept the ugly version and expect it to become nice in the future.

paf31 commented 9 years ago

Yes, the type class approach is probably not a good idea, since you quickly run into rank-2 types when a record is on the LHS of a function arrow.

I like the HList-style encoding, but it would be really great if it were possible to reorder the labels in a row. Would it be possible to support a zero or low cost rewrite for reordering labels (later)?

ndmitchell commented 9 years ago

I wouldn't overly worry about records at this stage. Best to get text/name searching going, and then once that's working nicely, then focus on what type encoding is used.

hdgarrood commented 9 years ago

Great - here's another example: https://gist.github.com/hdgarrood/cbab6fac87bd5bcc300c

I decided to modify the code that produces that file so that text-munging inside Hoogle isn't necessary; performing the kinds of transformations we were talking about on rows with text-munging seemed a bit too scary.

ndmitchell commented 9 years ago

Awesome! Given that, what do you think we should do next? And what's the end goal? Are you hoping to eventually run your own Hoogle instance for Purescript? Or do you want the normal Hoogle instance to also serve Purescript searches, probably at a slightly different URL?

One way forward is for you to produce a tarball containing all purescript modules, and I'll work on parsing that, and give you a command line flag you can run to generate a Hoogle instance, so you can spawn up a server. It would probably take me a week to find the time to do the work required. Once I've got the prototype, it would be much easier for you to hack from there.

hdgarrood commented 9 years ago

Re next steps, could we try putting the above example in to Hoogle and see what happens? I'm currently not quite sure exactly how to go about doing that with Hoogle 5.

The end goal is to allow people to perform Hoogle searches from within the Pursuit web application; possibly using JS on the client side, retrieving results via JSON?

If it's possible for the normal Hoogle instance to serve PureScript searches, that would be really nice for us, as it would make deployment easier.

The tarball approach sounds good. Will it be necessary to select a set of versions which all build together, or is it ok to just take the latest known version of every package?

paf31 commented 9 years ago

How would the tarball approach work when we upload a new package to Pursuit? My understanding was that we would attempt to merge the new data into the database immediately, which is why I assumed we would host Hoogle inside Pursuit itself, or run it on the same server in a separate process. I'm basing this on the Hoogle 4 source on Hackage, however, so I don't know if it's possible with the Hoogle 5 architecture.

ndmitchell commented 9 years ago

@hdgarrood Just the latest version of all packages in a tarball works best - no requirement for them to build together. We can certainly start with a purescript entry point and serving stuff over JSON. Eventually you might want to take control over your update schedule and get the reliability of one less server involved, but no rush. There is currently no JSON end point or ability to switch databases in Hoogle, but both can be added without too much hassle.

@paf31 With Hoogle 5 there is no incremental building of packages - the idea is that everything is so fast (< 1 min for all of Stackage - 1000+ packages with 100K+ entries) that you just rebuild everything. If you want to do that every time a package gets uploaded, you certainly could. That will require you to control your updates though - Hoogle on the server only regenerates its index every day currently.

I'll try and do something on the train tomorrow. If you have a tarball of docs ready by 6am UK time I'll pick that up. Otherwise I'll tar the one example and create my own - so don't feel you have to rush, but if it's sitting on your machine, uploading it would be useful.

hdgarrood commented 9 years ago

On second thoughts, don't worry about the tarball thing just yet - given that we're actually quite close to being able to deploy a beta version of Pursuit (and when we do, we'd like to have type search), and given also that Pursuit is now producing Haskell-compatible Hoogle files, then it seems sensible to attempt this with Hoogle 4, at least for now.

ndmitchell commented 9 years ago

OK, that seems reasonable. I'm hoping to attend http://www.meetup.com/London-Haskell/events/223598997/ so maybe see you then.

hdgarrood commented 9 years ago

Cool, looking forward to it!

I've integrated Hoogle 4 as a library into the Pursuit server, see http://new-pursuit.purescript.org/search?q=a+-%3E+a+-%3E+a and everything seems to be working very well! Thanks very much for your help and of course for creating such a useful piece of software. :) We're probably going to stick with Hoogle 4 for now. Once Hoogle 5 has type search and a Haskell API we will probably look at updating.

There are a couple of areas where the Hoogle integration is a little awkward, and I have a few suggestions for how the Hoogle library API could change to make it a little smoother - if any of these sound good to you, I'd happily work on them:

At the moment, Language is an enumeration inside the Hoogle library, which means that if a language is to be supported, the Hoogle library itself needs to change, and the query parser etc all need to be added to Hoogle itself. This is unfortunate, because I'd like to eg. depend on the purescript library to parse queries. Do you think it would be possible to have something like this instead?

data Language = Language
  { languageParseQuery :: String -> Either ParseError Query
  , languageRenderQuery :: Query -> TagStr
  ...
  }

Then, I could write a package hoogle-purescript which depended on both hoogle and purescript, and provided a Language value which could be used with Hoogle.

Of course, you would then have to export constructors for the Query type, or do something similar, which I can understand if you didn't want to do.
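
To show how a record-of-functions `Language` might be used, here is a self-contained Haskell sketch. The `Query`, `TagStr` and `ParseError` types below are simplified stand-ins, not Hoogle's real definitions, and `toyLanguage` and `roundTrip` are hypothetical names; only the two record fields come from the suggestion above.

```haskell
-- Stand-in types; Hoogle's real Query, TagStr and ParseError are richer.
newtype Query      = Query [String]    deriving (Eq, Show)
newtype TagStr     = TagStr String     deriving (Eq, Show)
newtype ParseError = ParseError String deriving (Eq, Show)

-- | A language is just a bundle of functions, so new languages can live
-- in separate packages instead of inside the search library.
data Language = Language
  { languageParseQuery  :: String -> Either ParseError Query
  , languageRenderQuery :: Query -> TagStr
  }

-- | A toy language: a query is its whitespace-separated words.
toyLanguage :: Language
toyLanguage = Language
  { languageParseQuery = \s -> case words s of
      [] -> Left (ParseError "empty query")
      ws -> Right (Query ws)
  , languageRenderQuery = \(Query ws) -> TagStr (unwords ws)
  }

-- | Library-side code that works for any Language value passed in.
roundTrip :: Language -> String -> Either ParseError TagStr
roundTrip lang = fmap (languageRenderQuery lang) . languageParseQuery lang
```

Under this design a package such as the proposed hoogle-purescript would only need to export one `Language` value, and `roundTrip toyLanguage "a -> a"` returns `Right (TagStr "a -> a")` without the library knowing anything about the language itself.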

It would be nice if it were possible to create a database without using the filesystem. Currently I think you have to call createDatabase and then loadDatabase straight afterwards?
Producing the Hoogle input file is quite awkward and error-prone. Would it be possible to provide some way of constructing ItemEx values directly in Haskell code? Perhaps by exporting those types and constructors too? Then I can use Haskell's type system to help check whether I've created a valid Hoogle input file. Hoogle could then provide some way of serializing and deserializing these, or even allow users to put ItemEx values straight in to databases without having to serialize them?
The Result data type is a bit confusing (particularly the locations bit), and also includes Hackage URLs and seems to produce results according to Hackage's URL structure. Would it be possible to make the result data type independent of any URLs or URL structure, so users would be able to construct URLs themselves? For example, would something like this work?

data Result = Result
  { resultPackage :: String
  , resultTagStr :: TagStr
  , resultInfo :: ResultInfo
  }

data ResultInfo
  = PackageResult
  | ModuleResult      String -- ^ Module name
  | DeclarationResult String String -- ^ Module name & declaration title
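
With a URL-free result type along these lines, the caller can decide how links are built. The Haskell sketch below is only an illustration: the `resultUrl` function, the path layout and the base URL are hypothetical, and the `resultTagStr` field is left out to keep the example self-contained.

```haskell
import Data.List (intercalate)

data ResultInfo
  = PackageResult
  | ModuleResult      String        -- module name
  | DeclarationResult String String -- module name and declaration title
  deriving (Eq, Show)

-- | Simplified Result without the TagStr field.
data Result = Result
  { resultPackage :: String
  , resultInfo    :: ResultInfo
  } deriving (Eq, Show)

-- | Build a link using a URL scheme chosen by the caller (hypothetical
-- layout), rather than one baked into the search library.
resultUrl :: String -> Result -> String
resultUrl base (Result pkg info) = case info of
  PackageResult         -> intercalate "/" [base, "packages", pkg]
  ModuleResult m        -> intercalate "/" [base, "packages", pkg, "docs", m]
  DeclarationResult m d -> intercalate "/" [base, "packages", pkg, "docs", m] ++ "#" ++ d
```

For instance, `resultUrl "https://example.org" (Result "purescript-maybe" (DeclarationResult "Data.Maybe" "fromMaybe"))` produces `https://example.org/packages/purescript-maybe/docs/Data.Maybe#fromMaybe`. A Hackage-style consumer could supply a different function over the same `Result` values.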

I just had a quick look at the Hoogle module in Hoogle 5 and it seems at least some of these things are already happening, but I thought it would be useful for me to write this up from the point of view of someone coming from Hoogle 4 anyway.

ndmitchell commented 9 years ago

Thanks for the notes. Yep, some of those are covered, but great to get the list. I'm not sure if purescript would be fine as a dependency or not, it doesn't seem that big relative to the web stuff I depend on.

ndmitchell / hoogle

PureScript search? #102