Armael opened this issue 5 years ago
---

I agree with the design of the "checks" field, and I also like the allow_sorry-agnostic idea.

We need to model failing submissions! For example, we may need an extra field (e.g. "result") which is either "passed" or "failed X", with X being one of "Ill-formatted (line number)", "Timeout", or "Internal-Error".

We then also need a set of informative error messages that can be displayed to the users/contestants.
---

Indeed, we should add something to account for submissions that fail to even parse/typecheck/etc. As a first step, I think the simplest option is to have a "result" field as you suggest, and leave all the error-reporting messages in the log. The JSON format would then be:
{ "result" : "ok"/"error",
"checks" : [
{ "name" : "lemma1", "result" : "ok" },
{ "name" : "lemma2", "result" : "ok_with_axioms" },
{ "name" : "lemma3", "result" : "error" },
...
],
"log" : "<grader log here (including possible error messages)>"
}
I think this would already be good to have, and then we can think about how to structure the error messages a bit more. But already having them displayed to the user should do most of the job.
---

For the errors, I propose:
{ "result" : "ok"/"error",
"checks" : [ ... ],
"log" : "<grader log here (including possible error messages)>",
"error" : [ { "where": "Check.thy" , "what": "syntax error" } , ... ],
}
---

Mhh, with respect to the Coq backend, it would be easier to have each check include a message; the global log would then contain the remaining messages concerning the global submission (e.g. a compile failure or a parse error, in which case no checks can be run at all):
{ "result" : "ok"/"error",
"checks" : [
{ "name" : "lemma1", "result" : "ok", "message" : "OK" },
{ "name" : "lemma2", "result" : "ok_with_axioms", "message" : "Uses forbidden axiom foobar" },
{ "name" : "lemma3", "result" : "error", "message" : "type mismatch ..." },
...
],
"log" : "<grader log here (error messages here are for the global submission, eg parse error)>"
}
Would that work for the Isabelle backend?
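For instance, a submission that fails at the global level (it does not even compile) could then look something like this, with an empty list of checks (contents are illustrative):

```
{ "result" : "error",
  "checks" : [],
  "log" : "<parse error reported by the prover>"
}
```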
---

Armael and I discussed the format and came to the following conclusion. The idea is to do as much of the logic as possible in the backend, and to only present polished lists of pairs to the frontend, which can then have a uniform display routine:
{ "submission_is_valid" : True/False,
"checks" : [
{ "name" : "lemma1", "result" : "ok" },
{ "name" : "lemma2", "result" : "ok_with_axioms" },
{ "name" : "lemma3", "result" : "error" },
...
],
"messages" : [ { "where": "Check.thy" , "what": "syntax error" } , ... ],
"log" : "anything"
}
possible values for "submission_is_valid": True/False
possible values for the "result" in "checks": {"ok", "ok_with_axioms", "error"}
The idea is that the grading process should be independent of whether or not we allow sorry for a task. Only the frontend then decides (using the allow_sorry flag) how to compute (partial) points, as sketched below. That way, the flag can also be changed later.

-> Issue: can we then stop passing the allow_sorry flag on to the backends?
"log" can contain anything, maybe we display the full string as a toggle at the end of the submission display
---

We should decide on a format for replies from a grader to the frontend that is a bit more structured than the current {0,1} × string pair. In particular, it should support partial successes (typically, when someone proved some of the lemmas and left others unfinished, we want to display "Partial success 3/5" instead of just "Error"). As far as I understand, this is currently implemented in the Isabelle case, but in an ad-hoc way.

Here's a first proposal. The grader output would be a JSON dictionary of the following format:
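```
{ "checks" : [
    { "name" : "lemma1", "result" : "ok" },
    { "name" : "lemma2", "result" : "ok_with_axioms" },
    { "name" : "lemma3", "result" : "error" },
    ...
  ],
  "log" : "<grader log here>"
}
```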
The "checks" field should list the checks run by the grader (typically, there could be one check per lemma; a grader backend that does not support fine-grained checking would simply return a list with a single check element for the whole document). For each check, the result is either "ok" (everything went fine), "ok_with_axioms" (the check passes but the proof uses unexpected axioms / has been admitted), or "error" (a typechecking error or whatnot).
To keep things simple, all the error reporting and grader messages go in the free-form "log" field, which the frontend simply displays to the user.
Wrt partial successes and allow_sorry, this format has the nice property that allow_sorry does not need to be passed to the grader! The frontend can decide in a prover-agnostic way how to interpret the "ok_with_axioms" results: failure if allow_sorry is false, and I imagine partial success otherwise.
What do you think?