Advanced cheat busting - GitHub Issues

proegssilb commented 9 months ago

Advanced techniques that have complexity to implement:

Unload the code periodically while benchmarking. This slows the process down, but also forces the code to eat some of the cost of how it computes the solution (vs. re-using solutions cached at runtime, i.e. during warmup).
Always use the same input first, then cycle through the rest later.
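
One possible reading of the input-cycling idea, as a minimal sketch: iteration 0 always gets the same input, and later iterations rotate through the remaining ones. The helper name `pick_input` and the rotation order are assumptions for illustration, not something the bot is known to implement.

```rust
/// Hypothetical helper: choose which input a given benchmark iteration gets.
/// Iteration 0 always sees `inputs[0]`; later iterations rotate through the
/// remaining inputs, so an answer cached for one input is less useful.
fn pick_input<'a>(inputs: &'a [String], iteration: usize) -> &'a str {
    assert!(!inputs.is_empty(), "need at least one input");
    if iteration == 0 || inputs.len() == 1 {
        return &inputs[0];
    }
    // Rotate over inputs[1..], starting with inputs[1] at iteration 1.
    let rest = &inputs[1..];
    &rest[(iteration - 1) % rest.len()]
}
```

Calling `pick_input(&inputs, i)` inside the timing loop would be enough to wire this in; a cache keyed on the input still helps the cheater, just less often.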

Additions:

Exit hook with an uncommon/unique "trigger phrase" to detect early/unusual exits from the benchmarker (mitigates user code trying to exit early to fake text output).
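
A rough sketch of how the trigger-phrase check could look on the side that reads the benchmarker's output. It assumes the benchmarker prints the phrase as its very last line on a normal exit; the function name `strip_trigger` and the idea of passing the phrase in as a parameter are hypothetical.

```rust
/// Hypothetical check: accept captured stdout only if it ends with the
/// trigger phrase on a line of its own, and return the output before it.
/// In practice the phrase would be generated per run, so submitted code
/// cannot know it in advance.
fn strip_trigger<'a>(stdout: &'a str, trigger: &str) -> Option<&'a str> {
    // The run only counts if the trigger phrase is the final line of output.
    let body = stdout.trim_end_matches('\n').strip_suffix(trigger)?;
    // Require the phrase to sit on its own line (or be the only output).
    if body.is_empty() || body.ends_with('\n') {
        Some(body)
    } else {
        None
    }
}
```

If user code exits the process early, the benchmarker never reaches the point where it prints the phrase, so the check returns `None` and the faked output can be rejected.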

proegssilb commented 9 months ago

Got any ideas for cool tricks? Add them here, let's see what's reasonable.

dev-ardi commented 9 months ago

Unload the code periodically while benchmarking. This slows the process down, but also forces the code to eat some of the cost of how it computes the solution.

Can you explain this a bit?

proegssilb commented 9 months ago

Not sure Rust can do this, but C# (and some other languages) can load code into a memory container, and then delete the entire block of memory, unloading code & associated data all in one go. Some approaches to cheating the benchmark involve caching data in memory dynamically. By deleting the cached data every ~1000 runs, the cheat doesn't pay off nearly as well. However, you may need another warm-up period after loading the code again in order to get the hardware back up to speed before measuring another ~1000 runs.
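
To make the batch structure concrete, here is a minimal Rust sketch of "load, warm up, measure ~N runs, unload, repeat". The `load` closure is a hypothetical stand-in for real code loading: dropping a value in Rust does not clear statics, so an actual implementation would presumably need something stronger, such as a fresh process per batch.

```rust
use std::time::{Duration, Instant};

/// Sketch: measure `batches` batches of `runs_per_batch` runs each. A fresh
/// solver is "loaded" per batch and dropped afterwards, so state it built up
/// does not carry over. `load` is a hypothetical stand-in for code loading.
fn bench_with_reloads<S, L>(
    mut load: L,
    input: &str,
    batches: usize,
    warmup_runs: usize,
    runs_per_batch: usize,
) -> Duration
where
    S: FnMut(&str) -> u64,
    L: FnMut() -> S,
{
    let mut total = Duration::ZERO;
    for _ in 0..batches {
        let mut solver = load(); // fresh code and data for this batch
        for _ in 0..warmup_runs {
            // Untimed warm-up after every load.
            std::hint::black_box(solver(input));
        }
        let start = Instant::now();
        for _ in 0..runs_per_batch {
            std::hint::black_box(solver(std::hint::black_box(input)));
        }
        total += start.elapsed();
        drop(solver); // "unload": whatever the solver value cached goes away
    }
    total
}
```

The warm-up runs after each load are the extra cost mentioned above; only the runs between `Instant::now()` and `elapsed()` count toward the result.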

dev-ardi commented 9 months ago

Huh, that's a smart way of cheating though. One could theoretically put all of the cached computations into a global hashmap, making all iterations almost instant. Doing this would produce a pretty big outlier though. I don't think we should try to mitigate these kinds of problems, even if we could.
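
For reference, the global-hashmap cheat described here could look roughly like the sketch below. `solve_honestly` is a made-up stand-in for an expensive solution, and the names are illustrative only.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Global cache that outlives individual benchmark iterations.
static CACHE: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();

/// Stand-in for a real, expensive puzzle solution (assumption for the demo).
fn solve_honestly(input: &str) -> u64 {
    input.lines().filter_map(|l| l.trim().parse::<u64>().ok()).sum()
}

/// The cheat: the first call pays full price, every later call with the
/// same input is only a hash lookup.
fn solve_cached(input: &str) -> u64 {
    let mut cache = CACHE
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .unwrap();
    if let Some(&answer) = cache.get(input) {
        return answer;
    }
    let answer = solve_honestly(input);
    cache.insert(input.to_owned(), answer);
    answer
}
```

With a warm-up phase that feeds the same input, every measured iteration hits the lookup branch, which is why the resulting timings stand out so much.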

proegssilb commented 9 months ago

On one hand, the complexity is precisely why this issue is in the 1.1+ milestone, and it is a good argument for not mitigating the cheat. On the other hand, it had already been done when the Old Bot had been up for less than 24 hours, which makes mitigating it a thing to keep on the to-do list.

I don't have any answers here. Maybe we shouldn't try to mitigate that specific cheat. Maybe we should. But I do know we shouldn't try it right now, and maybe not during 2023.

dev-ardi commented 9 months ago

I think we can go with the old reliable moderation. People will notice pretty fast (8ns/iter is very sus)

proegssilb commented 9 months ago

There's a trick involving delaying computation until Display is evaluated. Not sure there's a good bust for that, other than allowlisting specific primitive types that can be returned, which is a bit much.
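
A small sketch of what the delayed-computation trick might look like, assuming the benchmark only times the call to `solve` and prints the result afterwards. The type name `Lazy` and the toy "sum the lines" workload are made up for illustration.

```rust
use std::fmt;

/// What a cheating `solve` might return: it only remembers the input.
struct Lazy<'a> {
    input: &'a str,
}

impl fmt::Display for Lazy<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The real work happens here, when the answer is printed,
        // which may be after the benchmark timer has stopped.
        let sum: u64 = self
            .input
            .lines()
            .filter_map(|l| l.parse::<u64>().ok())
            .sum();
        write!(f, "{sum}")
    }
}

/// Returns almost instantly because nothing has been computed yet.
fn solve(input: &str) -> impl fmt::Display + '_ {
    Lazy { input }
}
```

The answer is still correct once formatted, so checking the output alone does not reveal that the timed part did no work.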

EDIT: Suggestion from Discord:

impl Display + 'static, I think that solves that, since then you can't store the input in the impl Display, at least not without copying it. Though for particularly computation-heavy days, cloning the input can be negligible.
We could include writing the output to a fixed buffer as part of the benchmark, though. Everyone's times would take a hit, but it closes this hole, and since everyone is affected, nobody is. We could even "write" to a null buffer, though idk where we can black_box these things.
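
As a hedged sketch of the Discord suggestion: the timed region below includes formatting the answer into a fixed buffer, and the return type is bounded by `Display + 'static`. The function name `timed_run`, the 256-byte buffer size, and the placement of `black_box` are assumptions; it is one guess at where `black_box` could go, not a settled answer.

```rust
use std::fmt::Display;
use std::hint::black_box;
use std::io::Write;
use std::time::{Duration, Instant};

/// Sketch of a timed run that formats the answer before stopping the clock.
/// `R: Display + 'static` rejects return values that borrow the input.
fn timed_run<R, F>(solve: F, input: &str) -> (Duration, String)
where
    R: Display + 'static,
    F: Fn(&str) -> R,
{
    let mut buf = [0u8; 256]; // fixed buffer; 256 bytes is an arbitrary size
    let start = Instant::now();
    let answer = solve(black_box(input));
    let mut cursor = &mut buf[..];
    // Formatting is inside the timed region, so lazy Display work is charged.
    write!(cursor, "{answer}").expect("answer did not fit in the buffer");
    let remaining = cursor.len();
    black_box(&buf); // discourage the optimizer from eliding the write
    let elapsed = start.elapsed();
    let written = buf.len() - remaining;
    (elapsed, String::from_utf8_lossy(&buf[..written]).into_owned())
}
```

A `solve` that returns a value borrowing from `input` should fail to compile against this signature, while one that clones the input first would still pass, which matches the caveat about cloning above.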

I'll add that we could conceivably come up with a way to replace stdout/stderr with /dev/null, but if we write the solution to a stream, we probably want to write the returned solution directly to /dev/null before stopping the timer (open /dev/null for write/append, dump to it thousands of times, then close it).
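
The /dev/null variant could be sketched like this: open the file once, write every answer to it inside the timed loop, and close it afterwards. The function name `bench_to_dev_null` and the `Fn(&str) -> String` signature are assumptions, and the hard-coded path makes it Unix-only.

```rust
use std::fs::OpenOptions;
use std::hint::black_box;
use std::io::{self, Write};
use std::time::{Duration, Instant};

/// Sketch: time `runs` calls of `solve`, writing each answer to /dev/null
/// before the timer stops. Unix-only because of the hard-coded path.
fn bench_to_dev_null<F>(solve: F, input: &str, runs: u32) -> io::Result<Duration>
where
    F: Fn(&str) -> String,
{
    // Open once, outside the timed region.
    let mut null = OpenOptions::new().write(true).open("/dev/null")?;
    let start = Instant::now();
    for _ in 0..runs {
        let answer = solve(black_box(input));
        // The write is part of the measurement, so output cost is charged.
        writeln!(null, "{answer}")?;
    }
    let elapsed = start.elapsed();
    // `null` is closed when it goes out of scope, after the timer stopped.
    Ok(elapsed)
}
```

Every submission pays the same per-write cost, so relative rankings should be affected little even though absolute times go up.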

ultrabear commented 8 months ago

We should ban statics entirely: they can trivially be used for caching, and in normal use they effectively only act as global state. Anyone submitting code should be able to refactor to pass that state more explicitly.

proegssilb / ferris-elf

Advanced cheat busting #27