ndmitchell / hoogle

Haskell API search engine
http://hoogle.haskell.org/

Generating database from ghc-pkg doesn't work #194

Closed by iliastsi 7 years ago

iliastsi commented 7 years ago

Running hoogle generate --local fails with:

Starting generate
Reading ghc-pkg... hoogle: fd:12: hGetContents: invalid argument (invalid byte sequence)

Instead, the following works:

$ ghc-pkg dump|grep haddock-html|tail -n1
haddock-html: /usr/share/doc/libghc-text-doc/html/
$ hoogle generate --local=/usr/share/doc/libghc-text-doc/html/
Starting generate
[1/1] text... 0.14s
Reodering items... 0.00s
Writing tags... 0.00s
Writing names... 0.00s
Writing types... 0.00s
Took 0.15s

Please let me know if you need more information and/or debugging info. Thanks!

ndmitchell commented 7 years ago

Hmm, that's not great. If you do ghc-pkg dump > log.txt does that work? Are there any higher-ascii characters in the output of ghc-pkg? I believe I should be using binary streams to read in the output of ghc-pkg, so it's somewhat weird.

ndmitchell commented 7 years ago

See also https://github.com/commercialhaskell/stack/issues/2582, which seems to be either the same issue or a related one. Something odd is going on in this area.

iliastsi commented 7 years ago

Are there any higher-ascii characters in the output of ghc-pkg?

Yes, two of the packages currently installed, namely unix-compat and QuickCheck, list Björn Bringert as author. If I remove those packages, hoogle works as expected. It seems hoogle cannot handle non-ASCII characters when reading the output of ghc-pkg dump.

ndmitchell commented 7 years ago

Which version of the process library are you using? From what I can tell, the output from ghc-pkg dump is captured as a binary format. What is your OS? If Windows, is it a "localised" version, e.g. Greek/Korean etc, rather than English/US?

iliastsi commented 7 years ago

Sorry for the late reply. I have process-1.4.2.0 on Debian testing (with GHC version 8.0.1). According to haskell/process#59, the process library does not support binary IO.

Indeed, the following doesn't work:

import System.Process

main = do
  stdout <- readProcess "ghc-pkg" ["dump"] ""
  putStr stdout

(fails with hGetContents: invalid argument (invalid byte sequence)), whereas the following works:

import System.Process.ByteString
import qualified Data.ByteString as BS

main = do
  (_, stdout, _) <- readProcessWithExitCode "ghc-pkg" ["dump"] BS.empty
  BS.putStr stdout

Have you tried reproducing this? It should be reproducible once you install either the unix-compat or the QuickCheck library (both contain non-ASCII metadata).

ndmitchell commented 7 years ago

What is your $LANG variable set to?

iliastsi commented 7 years ago

My $LANG variable was empty (this was a chroot environment). I set it to en_US.UTF-8 and hoogle worked like a charm :)

Thanks for looking into this. I am closing this bug report since it now works for me, but if you feel like there is something missing, feel free to reopen it.
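
For anyone hitting the same error, a quick way to check whether the locale is the culprit is to print the encoding GHC's runtime picked up. Text-mode handles, including the pipe that readProcess reads from, default to this encoding. The following is a minimal sketch using only base; the helper name localeEncodingName is made up for illustration.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.Environment (lookupEnv)

-- Hypothetical helper: name of the encoding GHC derived from the locale.
localeEncodingName :: IO String
localeEncodingName = show <$> getLocaleEncoding

main :: IO ()
main = do
  lang <- lookupEnv "LANG"
  name <- localeEncodingName
  -- An unset or empty LANG typically means an ASCII-only locale encoding,
  -- in which case any byte >= 0x80 in the input fails to decode.
  putStrLn ("LANG            = " ++ maybe "(unset)" show lang)
  putStrLn ("locale encoding = " ++ name)
```

If the reported encoding is not UTF-8, decoding the author name "Björn Bringert" from ghc-pkg dump is likely to fail as described above.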

ndmitchell commented 7 years ago

I want it to work under all circumstances - you aren't the first person to run into it and won't be the last. I can now reproduce, so I'll track it down.

ndmitchell commented 7 years ago

I think this is now fixed in HEAD, by using System.Process.ByteString as you suggested.
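
A rough sketch of the general approach, not Hoogle's actual code: capture the output as raw bytes with System.Process.ByteString (from the process-extras package), then decode explicitly so the result no longer depends on $LANG. The helper name readCommandLenient and the choice of lenient UTF-8 decoding are assumptions made for illustration; the sketch also assumes the bytestring and text packages are available.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import System.IO (hSetEncoding, stdout, utf8)
import System.Process.ByteString (readProcessWithExitCode)

-- Hypothetical helper: run a command, capture its stdout as raw bytes,
-- and decode it as UTF-8. Invalid byte sequences become U+FFFD instead
-- of raising an exception. The exit code and stderr are ignored here.
readCommandLenient :: FilePath -> [String] -> IO T.Text
readCommandLenient cmd args = do
  (_exitCode, out, _err) <- readProcessWithExitCode cmd args BS.empty
  pure (decodeUtf8With lenientDecode out)

main :: IO ()
main = do
  -- Writing also goes through a text handle, so pin stdout to UTF-8;
  -- otherwise printing non-ASCII text can fail under an ASCII locale.
  hSetEncoding stdout utf8
  txt <- readCommandLenient "ghc-pkg" ["dump"]
  TIO.putStr txt
```

The point of the design is that the locale never takes part in reading the pipe: bytes come in unchanged, and the program decides how to decode them.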

ndmitchell commented 7 years ago

Released as Hoogle-5.0.8.