ndmitchell / hoogle

Haskell API search engine
http://hoogle.haskell.org/

Can Hoogle build its database from anything other than hackage.haskell.org? #51

Closed: adinapoli closed this issue 10 years ago

adinapoli commented 10 years ago

Hi Neil,

we would like to run our own private Hoogle in production, indexing data from our own Hackage server (which, understandably, has our private source packages uploaded to it).

That said, skimming through the Hoogle code, it seems to me that "hackage.haskell.org" is hardcoded in various places (e.g. src/Hoogle/Language/Haskell.hs and src/Recipe/All.hs). To achieve what I'm asking, it should be possible to pass the URL on the command line, in the same way we can pass the port number to hoogle server, to give a tangible example. Am I making sense? If so, where do you think we should start? src/CmdLine/Type.hs seems a good place, doesn't it?
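A minimal sketch of how such a flag could be parsed, using System.Console.GetOpt from base. The Settings record, parseSettings and the default URL are hypothetical illustrations. They are not Hoogle's actual command-line types, which live in src/CmdLine/Type.hs and may rely on a different option-parsing library.

```haskell
import System.Console.GetOpt

-- Hypothetical settings record; Hoogle's real CmdLine types differ.
data Settings = Settings
  { hackageUrl :: String  -- base URL of the Hackage server to index
  } deriving (Show, Eq)

-- Hypothetical default: the public server that is currently hardcoded.
defaultSettings :: Settings
defaultSettings = Settings { hackageUrl = "https://hackage.haskell.org/" }

-- Each option is a function that updates the settings record.
options :: [OptDescr (Settings -> Settings)]
options =
  [ Option [] ["hackage"]
      (ReqArg (\u s -> s { hackageUrl = u }) "URL")
      "Base URL of the Hackage server to index"
  ]

-- Accepts both --hackage=URL and --hackage URL; rejects stray arguments.
parseSettings :: [String] -> Either String Settings
parseSettings argv =
  case getOpt Permute options argv of
    (fs, [], [])   -> Right (foldl (flip id) defaultSettings fs)
    (_, extra, []) -> Left ("unexpected arguments: " ++ unwords extra)
    (_, _, errs)   -> Left (concat errs ++ usageInfo "Usage: [OPTION...]" options)
```

For example, parseSettings ["--hackage=http://hackage.example.internal/"] (a hypothetical host) returns a Right whose hackageUrl is that URL, and parseSettings [] falls back to the default. The code that currently hardcodes the server would then read hackageUrl instead.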

Thanks! Alfredo

ndmitchell commented 10 years ago

Hi Alfredo,

So Hoogle can build data for other packages, but it isn't as thoroughly tested a route. If you have well-formed Haddock output, you can use hoogle convert to convert a Haddock file, and hoogle combine to merge the results. Usually you'd run convert on each package in turn, then combine at the end to create a default database. Unfortunately, if you're just running a local Hackage mirror, the text files that come out aren't quite correct and need munging, which is what hoogle data does, in addition to driving the whole process.
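The convert-then-combine workflow can be sketched as a small driver. It makes three assumptions about the command-line syntax: there is one Haddock --hoogle text file per package, hoogle convert FILE.txt writes FILE.hoo, and hoogle combine accepts an --outfile= option. Check hoogle --help for your version. The names buildPlan, hooName and runPlan are hypothetical.

```haskell
import Data.List (isSuffixOf)
import System.Process (callProcess)

-- Hypothetical: assumes `hoogle convert FILE.txt` writes FILE.hoo.
hooName :: FilePath -> FilePath
hooName txt
  | ".txt" `isSuffixOf` txt = take (length txt - 4) txt ++ ".hoo"
  | otherwise               = txt ++ ".hoo"

-- One convert per package, then a single combine into the output database.
-- The "--outfile=" syntax is an assumption, not a confirmed Hoogle flag.
buildPlan :: [FilePath] -> FilePath -> [[String]]
buildPlan txts out =
  [ ["convert", t] | t <- txts ]
    ++ [ "combine" : map hooName txts ++ ["--outfile=" ++ out] ]

-- Runs each step in order; callProcess throws if any step exits non-zero.
runPlan :: [FilePath] -> FilePath -> IO ()
runPlan txts out = mapM_ (callProcess "hoogle") (buildPlan txts out)
```

Keeping the plan as pure data makes it easy to print and inspect before running anything. As noted above, the text files produced from a plain Hackage mirror may still need munging before convert accepts them.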

Adding a --hackage=... flag to src/CmdLine/Type.hs seems a reasonable approach, and I'd happily accept a pull request. Alternatively, if you configure your Cabal preferences file to use a different server, I'd be happy to have Hoogle read that to find out what the default server should be, so it "just works".
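The second option, picking the server up from the Cabal configuration, could look roughly like the sketch below. It assumes the older single-line remote-repo: NAME:URL syntax in ~/.cabal/config. Newer cabal-install releases use a multi-line repository stanza, which this sketch ignores. remoteRepoUrl, hackageServer and configuredServer are hypothetical helpers, not existing Hoogle functions.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)
import Data.Maybe (fromMaybe, listToMaybe, mapMaybe)
import System.Directory (doesFileExist, getAppUserDataDirectory)
import System.FilePath ((</>))

-- Hypothetical fallback when no remote-repo line is configured.
defaultServer :: String
defaultServer = "http://hackage.haskell.org/"

-- Returns the URL of the first "remote-repo: NAME:URL" line, if any.
-- Commented-out lines ("-- remote-repo: ...") do not match the prefix.
remoteRepoUrl :: String -> Maybe String
remoteRepoUrl cfg = listToMaybe (mapMaybe fromLine (lines cfg))
  where
    key = "remote-repo:"
    fromLine l =
      let t = dropWhile isSpace l
      in if key `isPrefixOf` t then splitRepo (trim (drop (length key) t)) else Nothing
    -- Split "NAME:URL" at the first colon; the URL itself may contain colons.
    splitRepo v = case break (== ':') v of
      (_name, ':' : url) | not (null url) -> Just (trim url)
      _                                    -> Nothing
    trim = reverse . dropWhile isSpace . reverse . dropWhile isSpace

hackageServer :: String -> String
hackageServer cfg = fromMaybe defaultServer (remoteRepoUrl cfg)

-- Reads ~/.cabal/config (the "cabal" app data directory) when it exists.
configuredServer :: IO String
configuredServer = do
  dir <- getAppUserDataDirectory "cabal"
  let path = dir </> "config"
  exists <- doesFileExist path
  if exists then hackageServer <$> readFile path else pure defaultServer
```

The URL is returned exactly as written, so with a value such as http://hackage.haskell.org/packages/archive a caller may still need to strip the path suffix to get the server root.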

Thanks, Neil

adinapoli commented 10 years ago

Hi Neil,

thanks for the quick reply. For the sake of prototyping I hacked the code to hardcode our Hackage URL everywhere, and as expected it worked. The only problem was function indexing, which requires running "hoogle data all": for some reason the process hung on one package, which threw an hGetContents exception (but that is a separate issue). As for this issue, adding such a flag seems really easy, so I'll probably have a look at the weekend (we've parked the Hoogle setup for now to focus on business features), but in the long run it would be really nice to have. Let me know if you want me to open a separate issue about the index building, so we can try to figure out what's going on :)

Thanks again! Alfredo

ndmitchell commented 10 years ago

Usually errors with hGetContents are due to Unicode characters or UTF-8 encoding somewhere I didn't expect it. If you can give me enough data to reproduce it, I'll have a go at tracking it down. Running hoogle data all with some added print statements should show where it's happening.
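One way to narrow this down is sketched below, assuming the failing inputs are plain text files on disk. The approach reads each file strictly with an explicit UTF-8 encoding, logs the file name first (the print-statement approach), and collects the files that throw. readUtf8Strict and checkAll are hypothetical helpers. Forcing the whole string with evaluate makes a decoding error surface right there, paired with the file name, rather than later when a lazily read string is consumed.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- Reads a file as UTF-8 and forces the entire contents while the handle
-- is still open, so invalid byte sequences raise an IOException here.
readUtf8Strict :: FilePath -> IO (Either IOException String)
readUtf8Strict path = try $ withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  s <- hGetContents h
  _ <- evaluate (length s)  -- force the lazy string before withFile closes h
  pure s

-- Logs each file name to stderr before reading it, so the last name printed
-- identifies the culprit; returns the files that failed with their errors.
checkAll :: [FilePath] -> IO [(FilePath, IOException)]
checkAll paths = concat <$> mapM check paths
  where
    check p = do
      hPutStrLn stderr ("reading " ++ p)
      r <- readUtf8Strict p
      pure (either (\e -> [(p, e)]) (const []) r)
```

Running checkAll over the input files that hoogle data all processes would report which ones fail to open or decode as UTF-8. A hang, as opposed to an exception, would show up as a name printed with no result after it.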

I look forward to a patch for the --hackage flag, or if you don't get round to it, I'll likely add it myself at some point.

adinapoli commented 10 years ago

I'll probably take a crack at it over the weekend or later in the week; I'll keep you posted :)

Thanks again!

Alfredo