tig-foundation / tig-monorepo

TIG is the first coordination protocol designed specifically for algorithmic innovation
https://tig.foundation
15 stars 28 forks source link

Improved Randomness and Seed Generation with Keccak256 Hashing and Xoshiro256PlusPlus #14

Open vstam1 opened 3 weeks ago

vstam1 commented 3 weeks ago

This PR introduces enhancements to the seed and randomness generation and a minor optimization to the knapsack algorithm. The key changes include consolidating seed generation to a single Keccak256 hash, updating the randomness source to use Xoshiro256PlusPlus for better performance, and applying a small sorting optimization in the knapsack logic.

Details:

Seed Generation Update:

Previous Implementation: The system generated 8 u64s from a Keccak512 hash of the BenchmarkSettings object. Then these u64s were all xored with the supplied nonce to generate 8 seeds. The seeds were provided to the RngArray struct that cycles through 8 StdRng instances, each seeded with one of the seeds.

This approach resulted in a general efficiency drop in the random number generation for each challenge. This efficiency drop is mostly noticeable in the vector search challenge.

New Implementation: Seed generation has been streamlined to a single seed generated using a Keccak256 hash. This change reduces complexity and improves performance. The seed is generated based on the Keccak256 hash of a string constructed of the BenchmarkSettings object + the nonce. This string has the following format input:{json(BenchmarkSettings)}nonce:{nonce}. This format has been chosen to delimit the settings string from the nonce. This prevents the creation of two setting + nonce combinations that result in the same string. As Keccak256 is collision and preimage resistant and a single bit change in the string results in a completely different seed, this seed generation approach should be safe.

Randomness Source Change:

Previous Implementation: As described in the previous part, the randomness source is a custom RngArray implementation that is based on 8 StdRng sources with different u64 seeds. StdRng is a cryptographically secure PRNG. This might be overkill as the cryptographic security prevents the predictability of the next values based on previous values. This feature is important for cryptographic protocols and gambling applications, but (to my knowledge) not as important for the random instance generation of tig challenges. As each instance is generated from a different seed, an attacker would not gain much knowledge from previously generated numbers. The most important requirement is good statistical properties of the PRNG.

New Implementation: The RngArray implementation is replaced by a new PRNG, named Xoshiro256PlusPlus (see the CSPRNGs and PRNG offered by rand team). PRNGs compared to CSPRNG have a sizable performance improvement and should be considered for the Tig project.

Performance (Macbook m2 max)

Satisfiability Vehicle Routing Knapsack Vector Search
Diff 12000, 400 100, 300 80, 18 250, 500
Current Impl 8.56ms 200.03µs 21.64µs 548.19ms
Single Seed + StdRng 3.45ms 138.08µs 7.95µs 123.16ms
Single Seed + Xoshiro256PlusPlus 2.35ms 81.18µs 6.51µs 44.67ms
Knapsack unstable_sort 4.94µs
Performance Gain 3.64X 2.46X 4.38X 12.27X

Performance Test Code

let n = 100; 
let settings = BenchmarkSettings{
    player_id: "".to_string(),
    block_id: "".to_string(),
    challenge_id: "".to_string(),
    algorithm_id: "".to_string(),
    difficulty: vec![12000, 400]
};

let now = Instant::now();
for i in 0..n {
    let seed = settings.calc_seed(i);
    let _ = tig_challenges::c001::Challenge::generate_instance_from_vec(seed, &settings.difficulty);
}
let elapsed = now.elapsed();
println!("{:.2?}", elapsed / n as u32);

Conclusion

This pr introduces a new way to generate the seed + random values for the Tig challenge instances. It has significant performance improvements while meeting security requirements. The most important performance increase is the 12.27X gain for Vector Search. The problem generation for Vector Search has a large influence on the total number of problems a benchmarker can evaluate. For the other problems, like the vehicle routing problem, this performance gain will not significantly impact the total problem evaluation as the current algorithms use magnitudes higher runtime than the problem generation.

FiveMovesAhead commented 2 weeks ago

Whilst I agree it was overkill to invoke the PRNG 2 times per generation (which will be reduced to 1 in the next update), I am not convinced of the suggestion to move to Xoshiro256PlusPlus. The choice of StdRng has performance hits, but it was deemed an acceptable comprimise for its security/complexity or rather its stronger resilience to hardware optimisations (such as GPU).

This is resilience is a stop-gap whilst we work towards the major mining pool update.

Taking a step back, in regards to instance generation time vs algorithm solving time, it is better dealt with via our other planned update to runtime_signature which allows us to significantly increase the minimum difficulties. As for the dominance of greedy algorithms with knapsack challenge, that can be dealt with separately.

vstam1 commented 2 weeks ago

Alright, I made my commits so it's easy to cherry-pick the changes you think are useful. The single-seed implementation already has the most performance improvements, so it would already be useful to implement only this change.

The choice of StdRng has performance hits, but it was deemed an acceptable compromise for its security/complexity or rather its stronger resilience to hardware optimizations (such as GPU).

What I don't completely understand are the Tig bounty comments for GPU instance generation. If we want to make GPU optimizations for instance generation, however, random number generation with GPU should remain hard, how should we optimize for GPU?

FiveMovesAhead commented 2 weeks ago

The support for GPU optimisations is to make it as even as possible a playing field whilst we worked towards the mining pool update. The same goes for the comment about bounties.

fatigue24 commented 2 weeks ago

If we're not going to use Xoshiro256(aka SmallRng), then we'd better consider using chacha8 instead of chacha12. It is still crypto-secure and gives a huge performance boost when generating CPU challenge instances

fatigue24 commented 2 weeks ago

The choice of StdRng has performance hits, but it was deemed an acceptable comprimise for its security/complexity or rather its stronger resilience to hardware optimisations (such as GPU).

Removing randomness when selecting rng inside RngArray simplifies parallelization on the GPU, so there is not a huge difference between seeding Xoshiro256 with 256 bits and having one internal state or having a sequence of chacha12 internal states since there is no more divergence inside the GPU warp.

EDIT: As a potential improvement, we can consider adding an offset based on nonce when sampling rng inside RngArray, so that when designing CUDA optimizations, we cannot move internal states into registers due to divergence and have to use global memory to store them.