ndmitchell / hoogle

Haskell API search engine
http://hoogle.haskell.org/
Other
738 stars 134 forks source link

Parsing of ghc-pkg output by Input.Cabal.readGhcPkg is broken #350

Open jchia opened 4 years ago

jchia commented 4 years ago

Different versions of GHC have different output format for ghc-pkg. For example I got different output under GHC 8.2.2 vs GHC 8.8.3: https://gist.github.com/jchia/cf4412d7402f7bfeb01324b9cd537986

One way the brokenness manifests is that with GHC 8.8.3, hoogle generate thinks that the packageDocs is Just "", and that the package name has some leading spaces, resulting in hoogle generate failing with "No packages were found, aborting (use no arguments to index all of Stackage)" because it thinks that none of the packages have documentation.

What about using Distribution.InstalledPackageInfo from Cabal? It may be a big change, but this way is less fragile than internal ad-hoc parsing, and I don't know what other problems there are already that result from broken parsing.

ndmitchell commented 4 years ago

Cabal changes API with each release, has a pretty terrible API, and tends to get upgraded in lockstep with GHC (apart from when it's not). I've depended on Cabal in the past, and it's been a nightmare, so while it might be less fragile in terms of runtime behaviour, my guess is it more than makes up for it with a fragile install experience :(

jchia commented 4 years ago

I can see how it would be painful if each release of Cabal has backward-incompatible changes that you have to keep up with. You could say you'll stick to one version of Cabal but users would want to build and use hoogle with newer versions so someone would need to do the keeping up by maintaining code that works with different versions of Cabal.

One way has code that's more trustable but is harder to maintain wrt Cabal revisions and the other way has code that's less trustable and probably needs more parsing tests under different versions of Cabal extracting different fields but may be easier to maintain.

I'm not familiar with Cabal so I can't say which way is better. Changing the code for the first way or adding a lot of tests for the second way both seem to involve a lot of code.

ndmitchell commented 4 years ago

The parsing code has broken once since it was written, so it's pretty solid, and a little bit of CI is probably sufficient. The problem with GHC 8.8 is that the Windows versions are completely broken, which is why there isn't more testing of Hoogle in that configuration, otherwise I suspect we'd have spotted sooner.

You can't stick to one version of Cabal since it comes shipped with GHC, which makes upgrading a pain, but some people do anyway, so you need to support them all.

jchia commented 4 years ago

The problem with GHC 8.8 is that the Windows versions are completely broken

I tried "hoogle generate" "hoogle test" on my Linux machine under stack with lts-15.9 before the fix from #351 and it passed. lts-15.9 has GHC 8.8. Does that mean more tests need to be added to "hoogle test" to test for the bug fixed in #351?

ndmitchell commented 4 years ago

The tests currently don't test the ghc-pkg branch at all - I agree they definitely should! Patches most welcome.