Armael opened this issue 5 years ago
---

I agree with the design of the "checks" field, and I also like the allow_sorry-agnostic idea.

We need to model failing submissions! For example, we may need an extra field (e.g. "result") which is either "passed" or "failed X", with X being one of "Ill-formatted (line number)", "Timeout", or "Internal-Error".

We then also need a set of informative error messages that can be displayed to the users/contestants.
---

Indeed, we should add something to account for submissions that fail to even parse/typecheck/etc. As a first step, I think the simplest option is to have a "result" field as you suggest, and leave all the error-reporting messages in the log. The JSON format would then be:
{ "result" : "ok"/"error",
"checks" : [
{ "name" : "lemma1", "result" : "ok" },
{ "name" : "lemma2", "result" : "ok_with_axioms" },
{ "name" : "lemma3", "result" : "error" },
...
],
"log" : "<grader log here (including possible error messages)>"
}
I think this would already be good to have, and then we can think about how to structure the error messages a bit more. But already having them displayed to the user should do most of the job.
---

For the errors, I propose:
{ "result" : "ok"/"error",
"checks" : [ ... ],
"log" : "<grader log here (including possible error messages)>",
"error" : [ { "where": "Check.thy" , "what": "syntax error" } , ... ],
}
---

Mhh, with respect to the Coq backend, it would be easier to have each check include a message; the global log would then contain the remaining messages concerning the global submission (e.g. a compile failure or a parse error, in which case no checks can be run at all):
{ "result" : "ok"/"error",
"checks" : [
{ "name" : "lemma1", "result" : "ok", "message" : "OK" },
{ "name" : "lemma2", "result" : "ok_with_axioms", "message" : "Uses forbidden axiom foobar" },
{ "name" : "lemma3", "result" : "error", "message" : "type mismatch ..." },
...
],
"log" : "<grader log here (error messages here are for the global submission, eg parse error)>"
}
Would that work for the Isabelle backend?
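For instance, a submission that fails at the global level (it does not even compile) could then look something like this, with an empty list of checks (contents are illustrative):

```
{ "result" : "error",
  "checks" : [],
  "log" : "<parse error reported by the prover>"
}
```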
---

Armael and I discussed the format and came to the following conclusion. The idea is to do as much of the logic as possible in the backend, and to only present polished lists of pairs to the frontend, which can then have a uniform display routine:
{ "submission_is_valid" : True/False,
"checks" : [
{ "name" : "lemma1", "result" : "ok" },
{ "name" : "lemma2", "result" : "ok_with_axioms" },
{ "name" : "lemma3", "result" : "error" },
...
],
"messages" : [ { "where": "Check.thy" , "what": "syntax error" } , ... ],
"log" : "anything"
}
possible values for "submission_is_valid": True/False
possible values for the "result" in "checks": {"ok", "ok_with_axioms", "error"}
The idea is that the grading process should be independent of whether or not we allow sorry for a task. Only the frontend then decides (using the allow_sorry flag) how to compute (partial) points, as sketched below. That way, the flag can also be changed later.

-> Issue: can we then stop passing the allow_sorry flag on to the backends?
"log" can contain anything, maybe we display the full string as a toggle at the end of the submission display
---

We should decide on a format for replies from a grader to the frontend that is a bit more structured than the current {0,1} × string pair. In particular, it should support partial successes (typically, when someone proved some of the lemmas and left others unfinished, we want to display "Partial success 3/5" instead of just "Error"). As far as I understand, this is currently implemented in the Isabelle case, but in an ad-hoc way.

Here's a first proposal. The grader output would be a JSON dictionary of the following format:
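```
{ "checks" : [
    { "name" : "lemma1", "result" : "ok" },
    { "name" : "lemma2", "result" : "ok_with_axioms" },
    { "name" : "lemma3", "result" : "error" },
    ...
  ],
  "log" : "<grader log here>"
}
```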
The "checks" field should list the checks run by the grader (typically, there could be one check per lemma; a grader backend that does not support fine-grained checking would simply return a list with a single check element for the whole document). For each check, the result is either "ok" (everything went fine), "ok_with_axioms" (the check passes but the proof uses unexpected axioms / has been admitted), or "error" (a typechecking error or whatnot).
To keep things simple, all the error reporting and grader messages go in the free-form "log" field, which the frontend simply displays to the user.
Wrt partial successes and allow_sorry, this format has the nice property that allow_sorry does not need to be passed to the grader! The frontend can decide in a prover-agnostic way how to interpret the "ok_with_axioms" results: failure if allow_sorry is false, and I imagine partial success otherwise.
What do you think?