Open Ericson2314 opened 4 days ago
What utility would that offer? It certainly would incur a high complexity toll on the code base.
Well have you seen this https://github.com/haskell/happy/blob/4b59fb3ed20ff92b6f2b5ed1c9df81abe007603c/lib/grammar/src/Happy/Grammar.lhs#L120-L142 ?
in particular
In hindsight, this was probably a bad idea.
Makes me chuckle. Fin
would remove a major cost from this, as Fin (a + b + c)
won't get mixed up with Fin a
, Fin b
, or Fin c
.
My trying to turn Name
into a mere newtype in https://github.com/haskell/happy/pull/223 and somehow getting <<loop>>
also makes me think the current code is quite fragile.
Also, it gives us a way to pull some metadata out of band. I am thinking of something like
data Grammar eliminator
= Grammar {
productions :: [Production eliminator],
lookupProdNo :: Int -> Production eliminator,
lookupProdsOfName :: Name -> [Int],
token_specs :: [(Name, TokenSpec)],
terminals :: [Name],
non_terminals :: [Name],
starts :: [(String,Name,Name,Bool)],
types :: Array Int (Maybe String),
token_names :: Array Int String,
first_nonterm :: Name,
first_term :: Name,
eof_term :: Name,
priorities :: [(Name,Priority)]
}
to
type Name nStarts nTerms nNonTerms = Fin (4 + nStarts + nTerms + nNonTerms)
data Production = Production
(Fin nNonTerms)
[Name nStarts nTerms nNonTerms]
-- IIRC these are indices of the previous list,
-- so could do more dep types
[Word]
Priority
data Grammar nStarts nTerms nNonTerms
= Grammar {
-- No longer need eliminator type parameter
productions :: Vector nProds (Production nStarts nTerms nNonTerms),
lookupProdsOfName :: Map (Fin nNonTerms) (Set (Fin nProds)),
starts :: Vector nStarts (Name, Bool),
priorities :: Map (Fin nNonTerms) Priority
}
which is the core formal language part (?)
and
data GrammarExtras nStarts nTerms nNonTerms
= GrammarExtras {
prod_elims :: Vector nNonTerms Text,
token_elims :: Vector nTerms TokenSpec,
start_names :: Vector nStarts Text,
start_names :: Vector nStarts Text,
types :: Vector nTerms (Maybe Text),
token_names :: Vector nTerms Text,
}
has all the extras bits for human legibility and/or codegen, with guarantees from the type parameters that everything has the right arities.
From what I remember, I don't like the overhead of dependent stuff in Haskell either, but I am not sure how else lock down the data structures without changing around the terms more.
For example, one can do zip all lists/arrays of the same arity to ditch the coherence conditions, and use "expansion points" as needed to factor out the extra info", and that works pretty well, but the various index types we need to store (because it's a graph, at the end of the day) are not helped by this.
In hindsight, this was probably a bad idea.
Makes me chuckle. Fin would remove a major cost from this, as Fin (a + b + c) won't get mixed up with Fin a, Fin b, or Fin c.
But you ultimately need Fin (a + b + c)
because they all index the same array.
I still don't see the utility of using Fin n
specifically. Why not SpecialName
, TerminalName
, NonterminalName
? You would still need to know how many of each sort there are in order to compute array indices, but that should work just fine.
It's pretty easy to check that no Name
ever generated is out of bounds at generation time; no need to prove it in the type system (which would be very awkward to do).
Fair enough, I was thinking of https://github.com/haskell/happy/pull/223 as a stepping stone already, and you are rightly pointing out there are more newtypes we can create as further stepping stones.
Happy currently has a lot of "untyped" array indexing, where nothing stops one using an index for the wrong of e.g. tokens, non-terminals, terminals.
I haven't done finite sets in Haskell for a while, so not sure the state of things these days (still
KnownNat
and friends I suppose?), but I wonder if we should try using them. It would be a lot less scary to refactor the code if these indexes wouldn't be confused for one another.