prologin / concours-site

Source code of the Prologin contest website
https://gitlab.com/prologin/concours/site
GNU General Public License v3.0

Don't give points for perf tests when failing a previous test #304

Closed: Elfikurr closed this issue 2 years ago

Elfikurr commented 3 years ago

Example: if we fail the first perf test but pass the third one, we get more points than another submission that fails all the perf tests.

This is not clear to the user, as the tests are shown as "Test skipped".
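
For illustration, a minimal Python sketch of the behaviour described above; the per-test point values and the `current_score` helper are made up for the example, not taken from the site's code.

```python
# Hypothetical per-test point values, just for the example.
PERF_TEST_POINTS = [10, 10, 10]

def current_score(results):
    """Sum the points of every passing perf test, regardless of earlier failures."""
    return sum(points for points, passed in zip(PERF_TEST_POINTS, results) if passed)

# Fails the first perf test but passes the third one: 10 points.
print(current_score([False, False, True]))   # 10
# Fails every perf test: 0 points.
print(current_score([False, False, False]))  # 0
```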

FranckCHAMBON commented 3 years ago

My 2¢ on this: once the correctness tests pass, one should be awarded points proportional to the number of perf tests passed.

Why?

Why not? (hint given by Elfikurr in a PM)

Best regards, and many thanks for taking an interest in such a detail. At first, it was a misunderstanding about the points and "Test skipped".

seirl commented 3 years ago

The idea is to rank people according to how far they are able to go in the test suite, not to give points proportional to the number of tests passed.

If the output is either 0 or 1, and the contestant does a `return random(0, 1)`, we don't want half the tests to be counted as passing. People should demonstrate consistency in the way they pass tests, not just pass a handful of them that happen to be convenient to pass.
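
To make that concrete, a rough sketch (not the site's actual scoring code; `proportional` and `prefix` are hypothetical helpers) comparing the two rules against a submission that guesses the bit at random:

```python
import random

def proportional(results):
    """Every passing test counts, whatever happened before it."""
    return sum(results)

def prefix(results):
    """Only the tests before the first failure count."""
    score = 0
    for passed in results:
        if not passed:
            break
        score += 1
    return score

# A `return random(0, 1)` submission passes each 0/1 test with probability 1/2,
# so model each pass/fail as a fair coin flip over 20 perf tests.
flips = [random.random() < 0.5 for _ in range(20)]
print(proportional(flips))  # about 10 out of 20 on average
print(prefix(flips))        # 0 or 1 most of the time
```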

FranckCHAMBON commented 3 years ago

> If the output is either 0 or 1, and the contestant does a `return random(0, 1)`

I do believe that a problem shouldn't have so few possible outputs. One bit is too few; I think 16 or 32 significant bits should be the minimum for an output. In that case, the order of the test cases can be irrelevant. This could be useful with test files of very different kinds and no obvious order of difficulty.

It is just an idea.
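
For what it's worth, a back-of-the-envelope illustration of why a wider output helps: the probability of passing a single test with a uniformly random guess collapses quickly with the number of output bits.

```python
# Probability that a uniformly random guess matches the expected output,
# as a function of the output width in bits (purely illustrative).
for bits in (1, 16, 32):
    print(f"{bits:>2} bits of output -> pass-by-luck probability {2.0 ** -bits:.2e}")
# 1 bit: 5.00e-01, 16 bits: 1.53e-05, 32 bits: 2.33e-10
```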

seirl commented 3 years ago

> I do believe that a problem shouldn't have so few possible outputs.

Well, there's no good reason for us to just give up on an entire class of problems, so that assumption is wrong.

Elfikurr commented 3 years ago

We could imagine a problem that needs to print 0 or 1 to say whether an action is possible, for example: "can you get out of this maze?". Then the output would always be very small, but the time the algorithm takes would scale up with the input, and one could pass some performance tests that one should not be able to pass!

However, I believe this doesn't change much for contestants: usually the algorithm is either slow (brute-forcing something, for instance) or smart; if the algorithm is correct, then all the performance tests fail for the slow algorithm, and usually all the performance tests pass once a better algorithm is implemented.

I think, however, for clarity's sake, we should stop at the first failed test (the "Test skipped..." label being misleading). It would also be consistent to keep it like this, because the correctness tests already work that way.

Nhqml commented 3 years ago

IMO we should consider two cases for failed performance tests (see the sketch after this list):

  1. Fail caused by a timeout. In this case, points are given for the previous successful performance tests.

  2. Fail caused by a wrong result. In this case, no points are given for the performance tests (even the ones that were successful). I believe that if an algorithm is incorrect, it should not have the "privilege" to earn any performance points.
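
A hypothetical sketch of that two-case rule, with made-up outcome names and point values (not the site's actual model):

```python
TIMEOUT, WRONG_ANSWER, PASSED = "timeout", "wrong_answer", "passed"

def perf_score(outcomes, points_per_test=10):
    """Stop at the first failure; award nothing at all on a wrong answer."""
    score = 0
    for outcome in outcomes:
        if outcome == PASSED:
            score += points_per_test
        elif outcome == TIMEOUT:
            break            # keep the points earned before the timeout
        else:  # WRONG_ANSWER
            return 0         # an incorrect algorithm earns no perf points
    return score

print(perf_score([PASSED, PASSED, TIMEOUT, PASSED]))       # 20
print(perf_score([PASSED, WRONG_ANSWER, PASSED, PASSED]))  # 0
```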

juli0z commented 2 years ago

Migrated to https://gitlab.com/prologin/concours/site/-/issues/304