google / haskell-indexer

Emits code crossreference data for Haskell sources.
99 stars 20 forks source link

Folding sift into haskell-indexer #79

Open chrisdone opened 6 years ago

chrisdone commented 6 years ago

Hi,

We're currently reviewing haskell-indexer for use on auditing work. I wrote a tool called sift which is capable of generating a simple cross-package call graph of a haskell package and writing it to a .json file. Example:

$ sift trace sift-bindings/*/* --flag-binding "ghc-prim GHC.Prim raise#" --call-trace | head -n 30
Flagged binding: ghc-prim:GHC.Prim.raise#
  Used by aeson:Data.Aeson.Encoding.Builder.day
  Call trace:
  aeson:Data.Aeson.Encoding.Builder.day
  |
  +- base:GHC.Real.quotRem
  |  |
  |  +- base:GHC.Real.divZeroError
  |  |  |
  |  |  `- ghc-prim:GHC.Prim.raise#
  |  |
  |  `- base:GHC.Real.overflowError
  |
  `- base:GHC.Err.error

  Used by aeson:Data.Aeson.Encoding.Builder.digit
  Call trace:
  aeson:Data.Aeson.Encoding.Builder.digit
  |
  `- base:GHC.Char.chr
     |
     `- base:GHC.Err.errorWithoutStackTrace
        |
        `- base:GHC.Err.error
           |
           `- ghc-prim:GHC.Prim.raise#

Preferably, we'd like to use haskell-indexer to achieve the same thing instead of maintaining two codebases.

I think I can get the same info from TickReference and Tick and XRef gives me a list of TickReferences and also a list of Relation.

I think from there I can produce a graph with Data.Graph by producing a list [(nodeid,node,[nodeid])] where the latter list is "my dependencies", like I do here

-- | Graph all package bindings.
graphBindings ::
     Set Binding
  -> OrdGraph BindingId Binding
graphBindings bs =
  ordGraph (map
              (\binding -> (binding, bindingId binding, bindingRefs binding))
              (Set.toList bs))

then I can produce a simple call graph like this:

callTrace :: OrdGraph BindingId node -> Graph.Vertex -> Graph.Vertex -> [Tree [Char]]
callTrace g start flagged =
  fmap
    (fmap
       (\v' ->
          let (_, bid', _) = ordGraphVertexToNode g v'
           in S8.unpack (prettyBindingId bid')))
    (filterForest
       (flip (Graph.path (ordGraphGraph g)) flagged)
       (Graph.dfs (ordGraphGraph g) [start]))

So I'm 90% confident I can fairly readily get the information I need to obsolete the sift tool. I have some questions:

  1. Do you have any intention of penetrating class instances so that we could, for example, determine that throw# is used by read "x" :: Int because the method instance for Int uses error? We need this for auditing, aside from it being a super cool feature in general.
  2. It seems that there isn't support for base? What's your approach on that? We need this for auditing. Here's how sift tackles the tooling issues:
    • sift-frontend-plugin is used when building GHC's lib dir, on the base package specifically. You just inject --frontend in the right place after building sift-frontend-plugin in the same package set. That lets you generate a profile of base. If you had trouble with this on haskell-indexer, maybe I can help out getting that to work.
    • sift-compiler - this is a copy/paste of the commandline interface from intero and ghci, and on start it immediately generates a profile of all the modules. This lets you do stack ghci --with-ghc sift-compiler and then you're done. I believe cabal repl --with-ghc sift-compiler would also work, but I haven't tested it. Why not just stack ghci --with-ghc ghci --ghci-options '--frontend Sift.FrontendPlugin'? Because GHC rejects frontend plugins when used with --interactive. :man_shrugging:
  3. Finally, would you accept a PR that would contain some code for the above code tracking? It'd be nice to fold my code into haskell-indexer, and maybe have a UI that could display a call graph interactively.
mpickering commented 6 years ago

Hi Chris,

The way that haskell-indexer works is that it populates the kythe database which is then used by language-agnostic tooling to perform querying and visualisation. So the recommended way in order to perform graph queries is to use something like cayley rather than trying to use the Haskell implementation.

There is also already a UI for displaying code which works by reading the kythe database. http://stuff.codereview.me/lts/9.2/#basic-prelude-0.6.1.1/BasicPrelude.hs?corpus&signature

I'm not sure that Robin has considered indexing base yet. I haven't thought about it.

I am working right now on seeing if the indexer can be implemented as a source plugin which will mean that it will be easy to use with any build tool just by passing -fplugin=HaskellIndexer.
It also needs to be updated to support GHC 8.4 and GHC 8.6.

robinp commented 6 years ago

@chrisdone: Thanks for bringing this up, indeed seems like a nice usecase! It reminds me of https://hackage.haskell.org/package/weeder, which I thought could also be massaged to work on the indexer data.

While @mpickering is right that Kythe is the main (and so far only) frontend, we have the middle translation layer for exactly the purpose of exporting in other formats if needed. That said, going from the Kythe data could have further benefits, biggest one (from audit point) is that when we'll have FFI crossreferencing (https://github.com/google/haskell-indexer/issues/18), you could query the graph cross-language.

Cayley: I was playing with it in the past, see https://gist.github.com/robinp/b3f6057d7123ca19866b4fb28fbb50d1 for an example (and https://github.com/cayleygraph/cayley/blob/master/docs/GizmoAPI.md for the graph API).

Base: hm, I vaguely remember, at least with stack wrapper, that I could get base indexed. But maybe I misremember, let me check back on this later. There was something about passing a GHC flag to hide base.

Instance method calls: this is discussed in https://github.com/google/haskell-indexer/issues/73, but GHC API doesn't make this easy. Would certainly be cool!

PRs are certainly accepted, but let's just make sure we agree on the path forward before substantial code is written. Thanks a lot!

robinp commented 6 years ago

@chrisdone as mentioned in #83 above, currently cabal new-build could be used to build base, but it doesn't have an off-the-shelf wrapper script. There's #13 open about that, if interested (might be used to get base and other low-level packages indexed with cabal, and the rest with stack).

As for plugin approach: for future-proofing, I would prefer a Source Plugin instead a Frontend one. @mpickering, how far do you see that happen (in case you made significant progress)?

As for why preferring a source plugin - I vaguely remember going through GHC sources, and found that while using a FE plugin instead of GHC API, one can get a much more similar environment as to a regular GHC execution (GHC API omits many knobs that only happen in GHC's Main), but FE plugin was still not 100% regular GHC. As I understand, Source Plugin should really be 1:1, though I didn't check.

mpickering commented 6 years ago

Implementing haskell-indexer as a source plugin is blocked by proto-lens currently.

qrilka commented 6 years ago

@mpickering could you elaborate a bit? What are the problems with that library?

mpickering commented 6 years ago

It doesn't build on ghc-8.6.1 and source plugins are only available in ghc-8.6.1. https://github.com/google/proto-lens/issues/236