Statically enforce that all the logging options are documented

gallais commented 3 years ago

At the moment we have quite extensive (and everyday expanding) logging options to get Idris to print more debug output. However these are not documented anywhere and the best way to discover them is by grepping the source code. This is less than ideal.

In Idris.Env we use a nifty trick to ensure that all of the environment variables we attempt to observe in the codebase are thoroughly documented.

It would be nice to have the same enforcement for logging options.

NB: there are 450+ uses of log in the codebase so it'll take a bit of time to collect all of the existing topics. The documentation part can be left for a future PR: having an exhaustive list would already be pretty useful!

AdamBrouwersHarries commented 3 years ago

I think I could tackle this.

With the environment variables we don't need to deal with hierarchies of names, but as I understand it with logs we need to be able to deal with them (e.g. elab, elab.with etc). Do we want users to have to write out every possible "path" of log contexts/names, or do we want a smarter way of doing it (e.g. checking membership of a tree structure of some sort).

gallais commented 3 years ago

There are two aspects to this:

Declaring all of the topics of interest & enforcing the log functions can only be called with one such topic
Generating the documentation

I think you are right that it would be good to print the whole thing not as a list but as a hierarchical structure. We could do so by reusing (parts of) our existing functions in Core.Options.Log to generate a trie of documentation strings & printing it nicely.

It's probably worth writing this printer generically for any Libraries.Data.StringTrie

This trie could also be used to efficiently perform the check needed in the first part of the problem.

AdamBrouwersHarries commented 3 years ago

Declaring topics of interest is what I was specifically referring to, but generating documentation is something I'd not considered, and definitely something that should go into the design.

To clarify, what I was asking was whether "declaring" a topic of interest should be done in a "list like" way, i.e.

topics : List String 
topics = [
    "core", 
    "core.elab",
    "syntax"
]
-- etc

or if it should be done in a "tree like" way, i.e.

-- or equivalent structure
record LogTree where 
    constructor MkTree
    name : String 
    children : List LogTree

-- We can't have "" as the top level name, as otherwise we will require `.core.elab`
options : List LogTree
options = [
    MkTree "syntax" [], 
    MkTree "core" [
        MkTree "elab" [], 
        MkTree "data" []
    ]
]

I think there are good arguments for both - the former is a bit more readable I think, but the latter does make the tree a bit more explicit.

gallais commented 3 years ago

I think it should be done like this:

topics : StringTrie String
topics = foldr insert Trie.empty [
    ("core", "Anything related to core")
    -- ^     ^- the associated doc
    -- |
    -- a topic
    ...

where insert : (String, String) -> StringTrie String -> StringTrie String can be put together by using the topic-parsing logic we have in Core.Options.log together with the lib functions on tries.

AdamBrouwersHarries commented 3 years ago

How does this work for adding "subtopics"? Do we add them as discrete topics (e.g. core.elab), and then later (i.e. at query or doc generation time) break them into a topic "tree"?

gallais commented 3 years ago

insert into the StringTrie will break down "core.elab" into ["core", "elab"] (cf. the logic in parseLogLevel) and use that as the path to insert the documentation into the trie

Query should perform the same split & use StringTrie's lookup that focuses on the subtree pointed at by the path (apparently it's not part of the library yet: we need lookup : List String -> StringTrie a -> Maybe (StringTrie a)).

i.e. a topic is valid if the topic (or any subtopic) is documented. so elab is valid even if elab is not documented but elab.def and elab.data are, it just acts as a synonym for all of its subtopics.

AdamBrouwersHarries commented 3 years ago

Thanks! That helps clarify a lot. I'll look into StringTrie properly, as it definitely seems that I'm missing some knowledge there.

AdamBrouwersHarries commented 3 years ago

So I've had a hack and think I have a working-ish prototype. The only trouble is that I think Idris2' auto search is not deep enough to be able to effectively search within a trie structure. See the following gist for an example: https://gist.github.com/AdamHarries/2dc75d1efd574cdc405401bd8d3ddc15

Essentially, the type checker complains that it can't find an implementation for a successful search, but also (when flipped around) will also complain that it can't find an implementation for an unsuccessful search. I think it must be giving up part-way, but I'm not well versed enough to know what I can do top ask Idris2 to keep searching.

ShinKage commented 3 years ago

You can use the %auto_implicit_depth <n> pragma to change the maximum depth for auto implicit search, the default value is 50. Obviously this will increase compile times, and it may hang if the search requires to much memory, but at least you can experiment with it. In my experience, if the datatype you are trying to construct automatically isn't branching deeply you can increase it far enough.

gallais commented 3 years ago

This is just doing normalisation though. The proof is extremely shallow (a single Just constructor).

gallais commented 3 years ago

Is the problem that triefind is not seen as total and thus won't reduce in the type? Try adding a %default total at the top of the file?

AdamBrouwersHarries commented 3 years ago

Adding totality checking (either with %default total, or with explicit totals before triefind) didn't make any change. Additionally, bumping the auto implicit depth to 100,000 did nothing, which as you say should be expected.

AdamBrouwersHarries commented 3 years ago

looking at the logs, it fails with:

LOG eval.stuck:5: Stuck function: Libraries.Data.StringMap.empty

ShinKage commented 3 years ago

I think it's a visibility issue, all the StringMap and StringTrie functions are marked only as export which does not reduce at compile times, unlike public export.

AdamBrouwersHarries commented 3 years ago

Could you explain the difference between export and public export for me please? (Or point me somewhere which explains it).

ShinKage commented 3 years ago

Sure, take a look at the docs here https://idris2.readthedocs.io/en/latest/tutorial/modules.html#export-modifiers.

AdamBrouwersHarries commented 3 years ago

Cheers!

AdamBrouwersHarries commented 3 years ago

So that seemed to help the inlining/evaluation, but it is still failing to typecheck. It manages to get up to this expression before failing:

LOG compiler.inline.eval:10: Inlined: (%case !{arg:6} [(%concase Prelude.Types.Just Just 1 [{e:1}] (Treemember.triefind [___, !{arg:2}, !{e:1}])), (%concase Prelude.Types.Nothing Just 0 [] (%con Prelude.Types.Nothing Just 0 []))] Nothing)

gallais commented 3 years ago

This seems to make things work (you may have to add import Data.List to your module too: there is a bug in idris that prevent functions from reducing if they are not in scope)

diff --git a/src/Libraries/Data/StringMap.idr b/src/Libraries/Data/StringMap.idr
index 6b57705e..fc63eb4b 100644
--- a/src/Libraries/Data/StringMap.idr
+++ b/src/Libraries/Data/StringMap.idr
@@ -75,6 +75,7 @@ merge3 (Branch2 a b c) d (Branch3 e f g h i) j k = branch6 a b c d e f g h i j k
 merge3 (Branch3 a b c d e) f (Branch2 g h i) j k = branch6 a b c d e f g h i j k
 merge3 (Branch3 a b c d e) f (Branch3 g h i j k) l m = branch7 a b c d e f g h i j k l m

+public export
 treeLookup : Key -> Tree n v -> Maybe v
 treeLookup k (Leaf k' v) =
   if k == k' then
@@ -94,6 +95,7 @@ treeLookup k (Branch3 t1 k1 t2 k2 t3) =
   else
     treeLookup k t3

+public export
 treeInsert' : Key -> v -> Tree n v -> Either (Tree n v) (Tree n v, Key, Tree n v)
 treeInsert' k v (Leaf k' v') =
   case compare k k' of
@@ -124,6 +126,7 @@ treeInsert' k v (Branch3 t1 k1 t2 k2 t3) =
         Left t3' => Left (Branch3 t1 k1 t2 k2 t3')
         Right (a, b, c) => Right (Branch2 t1 k1 t2, k2, Branch2 a b c)

+public export
 treeInsert : Key -> v -> Tree n v -> Either (Tree n v) (Tree (S n) v)
 treeInsert k v t =
   case treeInsert' k v t of
@@ -204,20 +207,20 @@ data StringMap : Type -> Type where
   Empty : StringMap v
   M : (n : Nat) -> Tree n v -> StringMap v

-export
+public export
 empty : StringMap v
 empty = Empty

-export
+public export
 singleton : String -> v -> StringMap v
 singleton k v = M Z (Leaf k v)

-export
+public export
 lookup : String -> StringMap v -> Maybe v
 lookup _ Empty = Nothing
 lookup k (M _ t) = treeLookup k t

-export
+public export
 insert : String -> v -> StringMap v -> StringMap v
 insert k v Empty = M Z (Leaf k v)
 insert k v (M _ t) =
@@ -245,7 +248,7 @@ export
 fromList : List (String, v) -> StringMap v
 fromList l = foldl (flip (uncurry insert)) empty l

-export
+public export
 toList : StringMap v -> List (String, v)
 toList Empty = []
 toList (M _ t) = treeToList t
@@ -266,7 +269,7 @@ treeMap f (Branch2 t1 k t2) = Branch2 (treeMap f t1) k (treeMap f t2)
 treeMap f (Branch3 t1 k1 t2 k2 t3)
     = Branch3 (treeMap f t1) k1 (treeMap f t2) k2 (treeMap f t3)

-export
+public export
 implementation Functor StringMap where
   map _ Empty = Empty
   map f (M n t) = M _ (treeMap f t)

AdamBrouwersHarries commented 3 years ago

Ah, I avoided the first three public export (on the various tree insert etc functions) as I assumed that was part of a private API. That makes sense though.

Is the bug affecting Data.List, or the other functions from StringMap?

gallais commented 3 years ago

The Data.List import is so that split reduces. We had similar issues in #1086

AdamBrouwersHarries commented 3 years ago

Apologies for the radio silence - work has been rather hectic the past couple of weeks. I'm hopeful that I'll be able to make some progress on this at the weekend.

AdamBrouwersHarries commented 3 years ago

So I've had a crack at a bit of it, and made some limited progress, as seen here: https://github.com/idris-lang/Idris2/pull/1180

It's proving a little difficult to stick to the original design. Specifically, I've simplified and modified the proof burden to remove references to any other modules, as they require that any caller also import that module. e.g. if our function is foo : (arg : A) -> {auto proof : IsJust somecall} -> B, then anyone calling foo will have to import Data.Maybe so that IsJust is available to them. Instead, we simply rely on boolean equality, and convert to bools within the somecall function. This probably causes more evaluations, but is much more ergonomic for the users I think.

Additionally, evaluation seems to be a bit of a problem when we have as many options as we do (I've used grep/sort/uniq to extract them). Firstly, there were a few options that simply wouldn't resolve to either True or False, and an experiment with my "older" sketch (see https://gist.github.com/AdamHarries/33c54afc57f735884ff31dfa6124ae4b) hangs Idris2 by causing it to go into an infinite loop. I think this may be an issue with statically constrained logging as we have a lot of existing options, unless we re-think or re-design logging in general.

gallais commented 3 years ago

Building the trie at compile time may be more expensive than I assumed. And it may be that we don't have enough caching of the normal form of toplevel values and that the trie is being rebuilt every time we try to check for membership?

How bad is it if we work directly on the assoc list of (String, String)? (Also, I don't think we need to public export any of the logX)

stefan-hoeck commented 3 years ago

I had a look at this. If instead of using a StringTrie we just use a list of pairs and lookup the whole log name there, it typechecks instantaneously. For pretty-printing the tree of log levels we could still go via some other data structure if necessary, but for compiletime safety (= not using an undefined loglevel) the list should be enough (gist: https://gist.github.com/stefan-hoeck/477ec6dc5e7344a14d01f522a94496f9).

stefan-hoeck commented 3 years ago

@AdamHarries are you still working on this? Otherwise I could continue here, in order to get a complete list of log topics for #1397.

AdamBrouwersHarries commented 3 years ago

Hi @stefan-hoeck! Please feel free to take over this issue from me - I sadly don't have as much time for OSS work as I did when I initially picked it up, so I won't be working on it any time soon.

stefan-hoeck commented 3 years ago

Thanks for your quick reply! I'll give this a try then.

idris-lang / Idris2

Statically enforce that all the logging options are documented #1124