Proper benchmarking for near-vm-runner

We need to know when our changes cause smart contracts to run slower or faster.

We need to switch to Criterion for the following reasons:

Criterion has a warm-up option, which is really important when running with Wasmer, since we want to exclude the time Wasmer takes to warm up;
Criterion uses multiple iterations (it runs the same code many times and then divides the time spent by the number of runs), which is very helpful in discarding any remaining warm-up effects;
It can compare the results of a run with those of the previous run and tell whether the change is statistically significant based on p-values, which is useful for decision making;
It can be used to compare code that runs under different configurations (e.g. gas disabled vs. enabled).
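
To make the first two points concrete, here is a minimal sketch of the warm-up-then-average measurement that Criterion automates. It uses only the standard library and is not Criterion's API or implementation. `run_contract`, its `gas_enabled` flag, and `mean_time_per_iter` are hypothetical names invented for this illustration; the workload is a stand-in and does not call near-vm-runner or Wasmer.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for executing a contract.
/// `gas_enabled` mimics a configuration switch such as gas metering.
fn run_contract(gas_enabled: bool, n: u64) -> u64 {
    let mut acc: u64 = 0;
    let mut gas: u64 = 0;
    for i in 0..n {
        acc = acc.wrapping_mul(31).wrapping_add(i);
        if gas_enabled {
            gas = gas.wrapping_add(1); // pretend gas accounting
        }
    }
    acc ^ gas
}

/// Calls `f` repeatedly for `warm_up` without recording anything,
/// then times `iters` calls and returns the mean time per call.
fn mean_time_per_iter<F: FnMut() -> u64>(mut f: F, warm_up: Duration, iters: u32) -> Duration {
    assert!(iters > 0, "need at least one measured iteration");

    // Warm-up phase: results are discarded and not timed.
    let start = Instant::now();
    while start.elapsed() < warm_up {
        black_box(f());
    }

    // Measurement phase: total time divided by the number of runs.
    let timer = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    timer.elapsed() / iters
}

fn main() {
    let warm_up = Duration::from_millis(50);
    // Measure the same workload under two configurations.
    for gas_enabled in [false, true] {
        let mean = mean_time_per_iter(
            || run_contract(black_box(gas_enabled), black_box(10_000)),
            warm_up,
            200,
        );
        println!("gas_enabled={gas_enabled}: {mean:?} per iteration");
    }
}
```

A hand-rolled loop like this only reports a mean. It does not store previous results or test whether a difference is statistically significant, which is the part Criterion adds on top.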
