ChrisPenner commented 2 years ago

Names & PPEs

Unison's unique design choices also present unique challenges.

The AST doesn't store names for external definitions (since these names can change any time), so when parsing and printing we need to resolve names dynamically.

Glossary:

Name: A list of textual name segments paired with whether those segments are relative (to some unknown path) or absolute. It's worth noting that the name segments are stored in a linked list in reverse order. E.g. base.List.map is stored as ["map", "List", "base"]
Names: a two-way Relation between Names and refs (either Referent or Reference for Terms and Types respectively)
PrettyPrintEnv: a pair of functions from Referen(ce|t) -> Maybe Name, one for each of types and terms. It's a function so its contained names cannot be enumerated.
PrettyPrintEnvDecl: a pair of PrettyPrintEnvs, one which is suffixified and one which is not. It is so named because declarations must not be suffixified, since they need to accurately represent the location of the definition within the current working directory. The name is misleading, since "Decl" usually refers to type declarations, but in this case it really means all definitions, e.g. thing.x = 1
Suffixification: It is desirable for names to be as short as possible while still being unambiguous (to both readers and the parser), so when pretty-printing names we often suffixify them. This means we choose the shortest unambiguous suffix of a name within the current scope.
Relativization: Making names or paths relative to a specific location. E.g. .base.List.map relativized to .base becomes List.map.
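
To make the glossary concrete, here is a minimal sketch of reversed segment storage, relativization, and suffixification. The types and functions (Position, fromSegments, relativizeTo, suffixify) are hypothetical simplifications for illustration, not the real Unison definitions. In particular, this suffixify treats every other name in scope as a potential conflict, regardless of which ref it points to.

```haskell
import Data.List (inits, isPrefixOf, stripPrefix)

-- Hypothetical, simplified stand-in for the real Name type.
data Position = Absolute | Relative
  deriving (Eq, Show)

-- Segments are kept in reverse order: base.List.map is ["map", "List", "base"].
data Name = Name Position [String]
  deriving (Eq, Show)

-- Build a Name from segments written in reading order.
fromSegments :: Position -> [String] -> Name
fromSegments pos = Name pos . reverse

-- Relativization: strip a path (in reading order) from the front of an
-- absolute name. Names outside that path are returned unchanged.
relativizeTo :: [String] -> Name -> Name
relativizeTo path name@(Name Absolute revSegs) =
  case stripPrefix path (reverse revSegs) of
    Just rest@(_ : _) -> Name Relative (reverse rest)
    _ -> name
relativizeTo _ name = name

-- Suffixification: pick the shortest suffix that no other name in scope
-- shares. With reversed storage, a suffix of the name is a prefix of the
-- segment list, so candidates are simply the non-empty `inits`.
suffixify :: [Name] -> Name -> Name
suffixify scope name@(Name _ revSegs) =
  case filter unambiguous candidates of
    (c : _) | c /= revSegs -> Name Relative c
    _ -> name -- no shorter unambiguous suffix; keep the full name
  where
    others = [segs | Name _ segs <- scope, segs /= revSegs]
    candidates = drop 1 (inits revSegs) -- shortest first
    unambiguous c = not (any (c `isPrefixOf`) others)
```

With base.List.map, base.Set.map, and base.List.filter in scope, this sketch shortens the first to List.map (since map alone would be ambiguous) and the last to filter.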

Where do we use Pretty Printers?

PrettyPrintEnvs are used, unsurprisingly, when pretty printing.
Used when rendering types, terms, docs, etc.

Where do we use Names?

Names are often used when looking up the refs that correspond to a Name, e.g. when parsing. However, they are also sometimes used in place of a pretty printer to get Names for refs. Notably, a Names object can return all names for a ref, whereas currently a pretty printer can only return the best name.
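
As a rough illustration of the two directions of lookup, here is a sketch of a two-way relation built from a pair of maps. Name and Ref are plain strings here, and fromList, refsForName, namesForRef, and bestName are hypothetical helpers, not the actual Names API. The "best name" heuristic (shortest, then alphabetical) is only a placeholder.

```haskell
import Data.List (sortOn)
import Data.Maybe (listToMaybe)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Name = String -- simplification: a dotted name as plain text
type Ref = String -- stands in for a Reference or Referent hash

-- A two-way relation, stored as one index per direction.
data Names = Names
  { byName :: Map.Map Name (Set.Set Ref)
  , byRef :: Map.Map Ref (Set.Set Name)
  }

fromList :: [(Name, Ref)] -> Names
fromList pairs =
  Names
    { byName = Map.fromListWith Set.union [(n, Set.singleton r) | (n, r) <- pairs]
    , byRef = Map.fromListWith Set.union [(r, Set.singleton n) | (n, r) <- pairs]
    }

-- Parsing direction: which refs does a name resolve to?
-- More than one result means the name is conflicted.
refsForName :: Name -> Names -> Set.Set Ref
refsForName n = Map.findWithDefault Set.empty n . byName

-- Printing direction: every name known for a ref.
namesForRef :: Ref -> Names -> Set.Set Name
namesForRef r = Map.findWithDefault Set.empty r . byRef

-- A pretty-printer-style lookup that can only return one name.
bestName :: Names -> Ref -> Maybe Name
bestName names r =
  listToMaybe (sortOn (\n -> (length n, n)) (Set.toList (namesForRef r names)))
```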

How do we construct `Names`?

From a Branch; this branch might be the root, or it might be a namespace at a particular path.
From the term/type lookup index (on share-next)
From a set of Refs pulled from terms/patches, by searching history until names are found.
From names defined within a given unison file

There's often a distinction between pretty names and parse names.

Pretty names usually consist of relativized versions of all names in the current scope, as well as absolute names for everything outside of the current scope. Sometimes, however, they may include BOTH relativized and absolute names for things in the current scope.

Parse names consist of relative names within the provided scope; they may optionally include absolute versions of ALL names, including those in the current scope.

Relative vs Absolute

One source of confusion is that the contents of a Names object are often a mix of Relative and Absolute names. There's often little rhyme or reason behind the choice, and it's tricky to keep track of.

This makes it more difficult to do things like restrict a set of names to a given path, or to prioritize names close to a given location, since one needs to handle any combination of relative/absolute names.

When names are constructed directly from a unison file the names are, in some sense, neither absolute nor relative since they don't yet exist in the codebase.
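
A small sketch of why the mix is awkward: restricting names to a path only becomes a simple prefix check once every name has been made absolute, which in turn requires knowing what the relative names are relative to. The types and helpers below (absolutize, restrictTo) are hypothetical, and the sketch assumes all relative names share a single known perspective, which is not always the case today.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical simplification: segments in reading order.
data Name = Absolute [String] | Relative [String]
  deriving (Eq, Show)

-- Resolve a name against the perspective it is assumed to be relative to.
absolutize :: [String] -> Name -> [String]
absolutize _ (Absolute segs) = segs
absolutize perspective (Relative segs) = perspective ++ segs

-- Keep only the names located under `path`, returned as absolute segments.
restrictTo :: [String] -> [String] -> [Name] -> [[String]]
restrictTo perspective path =
  filter (path `isPrefixOf`) . map (absolutize perspective)
```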

Picking good names

A big challenge we face is that of picking good names when pretty-printing, for some definition of good.

Factors which affect goodness:

If the user asked to see a specific definition, we should ensure that's the name used for that definition.
We should prefer names within the current perspective, but failing that, should still show an external name if we have one (?)
We should prefer names in the current root namespace, but should fall-back on historical names if needed, to avoid showing a bare hash (?)
We should prefer short, but unambiguous names when possible.
We should prefer names which are located near the location of the definition the user asked to be printed.
Pretty-printed expressions must still parse & typecheck, meaning names must be unambiguous, and names on the LHS of top-level bindings must be appropriately qualified to our perspective.
Generally we should avoid conflicted names, except when we're printing TODOs, in which case conflicted names are exactly what we want.

Because "good" changes depending on the use-case, it can be helpful to allow callers to customize the prioritization of their PPE on the fly.

Issues with the current system

We carry around suffixified & unsuffixified PPEs and it's not always clear when to use which.
It's generally unclear whether a given Names object contains relative names, absolute names, or a mix of both.
When PPEs and Names include relative names, it's not always clear what they're relative to.
Certain prioritization heuristics (e.g. prioritize names near a specific name) can't be applied unless we know where all names are relative to, since they do prefix matching.
Things like suffixification must be applied when creating the pretty-printer, and they destructively remove information about the names that later steps like biasing depend on. E.g. if there are names List.map#abc and MyList.mapMe#abc, they'll be suffixified to map and mapMe, but we then lose the ability to prioritize names inside MyList when that's our current perspective; that info is gone.
Because biasing must happen before suffixification, and because suffixification is dependent on a given Names object, and because that Names object is dependent on the current perspective, we end up needing to re-create giant Names and PPEs any time the perspective changes, whether we want suffixified names or not, and any time we want to re-bias for a different definition (e.g. when we run view blah). This is pretty wasteful.
We sometimes use Names in places where we're effectively pretty-printing, but not always.
Pretty Printers are currently pure, which means we need ALL names available at the time of their creation. This is costly, especially on share-next's server. In practice, we likely wouldn't need all names in scope to resolve ref -> Maybe Name if the lookup were monadic instead, e.g. ref -> m (Maybe Name).
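
To illustrate the last point, here is a rough sketch of what a monadic lookup could look like, and how it allows falling back to an effectful source only when a cheap local lookup misses. All names here (PurePPE, PPEM, liftPPE, orElse, countingPPE) are hypothetical, and the "index" is just an association list with a query counter standing in for a database.

```haskell
import Data.IORef (IORef, modifyIORef')

type Name = String
type Ref = String

-- Simplified current shape: a pure function, so every name it might
-- return has to be in memory when the PPE is built.
newtype PurePPE = PurePPE {lookupPure :: Ref -> Maybe Name}

-- Monadic shape: each lookup may perform effects (e.g. an index query).
newtype PPEM m = PPEM {lookupM :: Ref -> m (Maybe Name)}

-- Any pure PPE can be used where a monadic one is expected.
liftPPE :: Applicative m => PurePPE -> PPEM m
liftPPE (PurePPE f) = PPEM (pure . f)

-- Try the first PPE; only run the second one's effects on a miss.
orElse :: Monad m => PPEM m -> PPEM m -> PPEM m
orElse (PPEM f) (PPEM g) = PPEM $ \ref -> do
  found <- f ref
  case found of
    Just name -> pure (Just name)
    Nothing -> g ref

-- Stand-in for an on-demand index lookup that counts its queries.
countingPPE :: IORef Int -> [(Ref, Name)] -> PPEM IO
countingPPE counter table = PPEM $ \ref -> do
  modifyIORef' counter (+ 1)
  pure (lookup ref table)
```

With `liftPPE local `orElse` countingPPE counter remote`, refs that resolve locally never touch the counter, so only the misses pay for a query.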

Questions

Do parse names and pretty-printers still need to include names outside of the current scope? Or is that explicitly discouraged now?
What is the current perspective on NamesWithHistory? Currently it's barely used, is expensive to compute, and could be just as easily represented as a single Names object which was created with a biased union.
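
For reference, the "biased union" idea could be sketched roughly as follows. Names is reduced here to a ref-to-names map, and the signatures of biasedUnion and ppeWithHistory are assumptions made for illustration; only the combinator's name comes from this proposal.

```haskell
import qualified Data.Map.Strict as Map

type Name = String
type Ref = String

-- Simplification: only the ref -> names direction, names in priority order.
type Names = Map.Map Ref [Name]

-- Left-biased union: wherever the current names know a ref, any
-- historical names for that ref are ignored. Empty entries are dropped
-- first so they cannot shadow a historical name.
biasedUnion :: Names -> Names -> Names
biasedUnion current historical =
  Map.union (Map.filter (not . null) current) historical

-- Prefer a current name, fall back to a historical one, else Nothing
-- (at which point the caller would have to print a bare hash).
ppeWithHistory :: Names -> Names -> Ref -> Maybe Name
ppeWithHistory current historical ref =
  case Map.findWithDefault [] ref (biasedUnion current historical) of
    (name : _) -> Just name
    [] -> Nothing
```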

Proposed Changes

Drop the NamesWithHistory type entirely. If we decide we always want historical names, integrate them into the PPE type; otherwise we can simply add a combinator ppeWithHistory which takes two Names objects.
Drop the PrettyPrintEnvDecl and integrate suffixification as a first-class parameter in the underlying PPE type.
Always use PPEs for pretty printing, rather than sometimes using Names.
Always create PPEs from fully Absolute Names; this ensures that biasing and relativization are simple and always work. If we want to restrict names to a specific path, we can add another Restrictions parameter.
Keep unmodified versions of names alongside the modified versions such that choices for effective prioritization and restriction can occur even after suffixification etc.
Alter PPE type to something like:

data PrettyPrintEnv = PrettyPrintEnv
  { -- names for terms, constructors, and requests
    termNames :: Maybe Path -> Suffixify -> [Name] -> Referent -> [(HQ'.HashQualified Name, HQ'.HashQualified Name)],
    -- names for types
    typeNames :: Maybe Path -> Suffixify -> [Name] -> Reference -> [(HQ'.HashQualified Name, HQ'.HashQualified Name)],
    -- allows adjusting a pretty-printer to a specific perspective. Names within this perspective will be made relative.
    perspective :: Maybe Path,
    -- allows biasing returned names towards specific locations. 
    -- Names are automatically biased towards the current perspective if one is provided.
    biases :: [Name],
    -- Whether to shorten names to a minimal unambiguous suffix (within the current perspective) or not
    suffixify :: Suffixify
  }

Each lookup now returns a list of name pairs in priority order; each pair contains both the absolute name and the pretty (possibly relativized, possibly suffixified) name.

This supports a future change to a monadic pretty-printer, since a monadic version would likely need all parameters to be provided on each lookup so it can be as efficient as possible and not do any extra work.

It also allows altering a given PPE's parameters without needing to rebuild the whole thing from scratch. This means we can always share a single, global PPE and just alter its parameters for each pretty-printing task, saving us the work of re-building every time.

We can be careful to make PPE builders take advantage of currying to get as much sharing as possible and avoid re-computing the entire PPE when changing parameters. E.g. changing the biases requires only re-sorting the returned list; we don't need to re-suffixify or re-relativize anything.
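
A minimal sketch of the currying idea, under simplifying assumptions: names are absolute segment lists in reading order, "closeness" to a bias is just the length of the shared prefix, and mkLookup, rebias, and biasScore are hypothetical helpers rather than anything in the codebase. The index is built once when mkLookup is applied to the names; changing the biases afterwards only re-sorts the small per-ref list.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as Map

type Ref = String
type Name = [String] -- absolute name, segments in reading order

-- How many leading segments two names share.
commonPrefixLength :: Name -> Name -> Int
commonPrefixLength a b = length (takeWhile id (zipWith (==) a b))

-- Lower scores sort first, so negate the best match against any bias.
biasScore :: [Name] -> Name -> Int
biasScore biases name =
  negate (maximum (0 : map (commonPrefixLength name) biases))

-- Re-prioritize candidates; sortOn is stable, so ties keep their order.
rebias :: [Name] -> [Name] -> [Name]
rebias biases = sortOn (biasScore biases)

-- Applying mkLookup to the names builds the index once; the returned
-- function closes over it and can be reused with different biases.
mkLookup :: [(Ref, Name)] -> ([Name] -> Ref -> [Name])
mkLookup allNames =
  let index =
        Map.fromListWith
          (\new old -> old ++ new) -- keep insertion order per ref
          [(ref, [name]) | (ref, name) <- allNames]
   in \biases ref -> rebias biases (Map.findWithDefault [] ref index)
```

For example, with List.map#abc and MyList.mapMe#abc indexed, binding `lookupNames = mkLookup allNames` once and then calling `lookupNames [["MyList"]] "#abc"` puts MyList.mapMe first, while `lookupNames [] "#abc"` leaves the original order alone.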

aryairani commented 2 years ago

The proposed changes sound good to me, though we should review the sample proposed PPE type to decide if it's trying to do too much.

aryairani commented 2 years ago

> Generally we should avoid conflicted names, except when we're printing TODOs, in which case conflicted names are exactly what we want.

Not sure we want to avoid conflicted names; it might be what you want even outside of TODOs?

mitchellwrosen commented 2 years ago

> we end up needing to re-create giant Names and PPEs any time the perspective changes

Does this mean when we cd we pay this big cost?

aryairani commented 2 years ago

> We sometimes use Names in places where we're effectively pretty-printing, but not always

Out of curiosity, where are the spots that this is the case?

aryairani commented 2 years ago

> Do parse names and pretty-printers still need to include names outside of the current scope? Or is that explicitly discouraged now?

For parsing scratch files, we currently aren't including names outside of the current scope. But for parsing REPL commands, I would think we should. e.g. alias.term .foo.bar.baz baz. Does this depend on that? I could see accepting absolute names in parsing too (why not?). But not included in suffix-based resolution. And it's okay to not accept them in parsing scratch files until we know more.

For pretty-printing, if we encounter a hash that doesn't have a name in the current scope, what should we do: a) show a bare hash, or b) show an absolute name? The hashes aren't useful information, so if we can reasonably provide some useful information instead, we should.

aryairani commented 2 years ago

I would recommend Bias instead of just [Name] for biasing.

aryairani commented 2 years ago

A question about the proposed PrettyPrintEnv type:

What does its lifecycle look like? We note that it includes fields for perspective, bias, and suffixify setting, but also includes functions that accept those as arguments, so how is it all meant to work?

ChrisPenner commented 2 years ago

Some more questions that have come up during reviews:

When will it be acceptable to stop including Absolute names when pretty-printing/parsing? Can we do it now?
Why does the backend PPE include global absolute names in 'parse-names'?
How can we get the backend PPEs to work as close as possible to the UCM ones?
How should view using an absolute name work? Esp. in the context where we don't want the root branch loaded all the time.
Is it possible to remove the PrettyPrintEnvDecl entirely with a few tweaks to the PPE? (likely yes)
Is it possible to unify the concepts of parseNames and prettyNames? (likely yes)
Can we remove the fallback mechanism if we no longer include absolute names? (likely yes)

unisonweb / unison

An Audit of Names and PPEs #3256
