haskell / haddock

Haskell Documentation Tool
www.haskell.org/haddock/
BSD 2-Clause "Simplified" License
361 stars 241 forks source link

Location information of instances is lost if there is aleady a '.hi' #921

Open harpocrates opened 6 years ago

harpocrates commented 6 years ago

If there is already a '.hi' file present for a module we are haddock-ing, none of the instances in that module will have any location information.

I initially pointed this out in https://github.com/haskell/haddock/pull/906#issuecomment-414787805. I'm making a full blown issue for this because it is turning out to be a bit nastier than I expected.


The third field of DocInstance (the Located (IdP name)) is where we have a disappearing location:

https://github.com/haskell/haddock/blob/133e9c2c168db19c1135479f7ab144c4e33af2a4/haddock-api/src/Haddock/Types.hs#L392

This is supposed to glean its location off of dfun_id (or co_ax_name for type families, although that path is not well documented). There is a Note [Haddock assumptions] referenced from that field:

Note [Haddock assumptions]
~~~~~~~~~~~~~~~~~~~~~~~~~~
For normal user-written instances, Haddock relies on

 * the SrcSpan of
 * the Name of
 * the is_dfun of
 * an Instance

being equal to

  * the SrcSpan of
  * the instance head type of
  * the InstDecl used to construct the Instance.

However, when serializing Names in interface files, GHC discards all locations. When reading the interface back out, the location holes get filled with noSrcSpan. All that to say that this note no longer holds if the class instance was read from an interface file.

harpocrates commented 6 years ago

I think we will want to fix this by explicitly serializing location information for these "special" Names. I've tried this out in https://github.com/harpocrates/ghc/commit/07f6785525d0bdfbd1b5350a89080db5e0ded111 and it definitely helps preserve locations for Haddock. The problem is that this is going to change error messages / GHCi output.

Before:

Prelude> instance Eq Int where {}

<interactive>:1:10: error:
    Duplicate instance declarations:
      instance Eq Int -- Defined at <interactive>:1:10
      instance Eq Int -- Defined in ‘GHC.Classes’
Prelude>

After:

Prelude> instance Eq Int where {}

<interactive>:1:10: error:
    Duplicate instance declarations:
      instance Eq Int -- Defined at <interactive>:1:10
      instance Eq Int -- Defined at GHC/Classes.hs:256:10
Prelude>

Maybe the fix is just going to be to patch pprDefinedAt. Nothing I've tried has really left me satisfied though...

harpocrates commented 6 years ago

Actually, it looks like we might need even more SrcSpans: the Missing documentation for: warnings are also suffering from missing locations.

sjakobi commented 6 years ago

Thanks for making a proper issue!

What do we use the SrcSpan in DocInstance for? If it's for some heuristic, maybe we can encode the necessary information in some other way?

Actually, it looks like we might need even more SrcSpans: the Missing documentation for: warnings are also suffering from missing locations.

Right. I had recorded that issue in https://github.com/sjakobi/haddock/issues/4. I don't think this is too grave though. We can still output in which module the docs are missing.

@alexbiehl's idea was to let only the extended .hie files encode locations. We can probably make it so that when you use --hyperlinked-source, warnings are output with source locations.

harpocrates commented 6 years ago

What do we use the SrcSpan in DocInstance for? If it's for some heuristic, maybe we can encode the necessary information in some other way?

The SrcSpan is the key for InstMap. The SrcSpan uniquely identifies the instance. For this case at least, it seems like serializing instance location information in .hi files is the easiest and fastest approach.

For pretty much everything else, putting locations in .hie files sounds like a great idea.

sjakobi commented 5 years ago

Would you mind putting https://github.com/harpocrates/ghc/commit/07f6785525d0bdfbd1b5350a89080db5e0ded111 on Phabricator, separately from D5047, Alec?

It would cause so much churn in the testsuites that I'd rather not mix it with the rest of the Hi Haddock changes.

harpocrates commented 5 years ago

I'm gonna try fiddling around with pprDefinedAt a bit more first, but then I'll put it up.

harpocrates commented 5 years ago

I'm sorry I've let this lag for so long. I've just got a hunch that perhaps this is all avoidable. I just need to find a couple hours to load back into memory the end-to-end story for why we need SrcSpan to convince myself this really is necessary (or find an alternate path).

Do holler if this is blocking.

harpocrates commented 5 years ago

I finally got several consecutive hours to think this through properly. Apologies for the block of text: I want to avoid forgetting and re-discovering this again later.

In today's Haddock

Unlike other declarations, Haddock never looks at InstD (or DerivD). Instead, it queries GHC to get complete instance lists (via attachInstances, md_fam_insts, and md_insts/modInfoInstances) in the form of [FamInst] and [ClsInst].

GHC's FamInst and ClsInst conveniently have unique names (that look like $fEqBool), which gives Haddock names under which to key docs attached to instances. What is missing is a way to match up InstD/DerivD’s and FamInst/ClsInst's so that we can get docs (which are matched up to InstD/DerivD's) associated with the right FamInst/ClsInst.

The invariant around SrcSpan (see Note [Haddock assumptions] in GHC) is a way around that: it is something that FamInst/ClsInst promise to uphold so that they can be easily matched up with the InstD/DerivD’s from which they came.

About the idea of forcing FamInst/ClsInst to serialize location info (for Hi Haddock)

We don’t need ClsInst/FamInst to actually serialize the location info - we just need the invariant to hold for ClsInst/FamInst from the module we are currently compiling.

We can get rid of the instanceMap on the Haddock side completely (and Haddock-side calls to getInstLoc in the process) - it is used only in mkVisibleNames and I’m 90% sure that code path is unreachable. (I tried replacing instanceMap with error “UNEEDED” and nothing crashed - more testing/reasoning is needed here).

All this to say: my proposed GHC change from https://github.com/haskell/haddock/issues/921#issuecomment-416278563 is missing the point.

We do need locations for those FamInst/ClsInst though...

The code in the XHTML backend uses some locations to generate links to source pages which may or may not end up being generated by the hyperlinker backend (so we shouldn't count on .hie files to save us).

I propose that we add one extra field into the GHC-side data Docs type: a decl_locs :: Map Name RealSrcSpan for putting the locations of (at least)

I think this will solve both the initial bug for which I opened this issue and the open TODO about splices.


How does all this sound?

sjakobi commented 5 years ago

This sounds good to me even though my understanding of the instance handling is rather fuzzy.

In fact we already used to serialize this data before we (mistakenly) decided that we'd only the location info from .hie-files. On the GHC side I found this commit that added a bit more info than we now need.

EDIT: For the Haddock side I found https://github.com/sjakobi/haddock/commit/d3de1a5416b9251590d7d168622881b3064b9fe2. https://github.com/sjakobi/haddock/commit/811a57 might also be interesting.