Integrate Sigmoid-Based Reward Calculation into get_rewards Function

sakrobinson commented 9 months ago

This PR introduces changes to the get_rewards function and its supporting functions in the reward_funcs.py module. By integrating a sigmoid-based reward calculation, we replace the previous linear reward scaling with a nonlinear curve that better reflects the relative performance of miners.

Key changes: The addition of a vectorized sigmoid function, implemented with torch, whose steepness and midpoint are adjustable through the temperature and shift parameters.
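To make the effect of the two parameters concrete, here is a minimal scalar sketch using only the standard library. It mirrors the form of the torch helper in the stand-alone script but is not the PR's code; the input values are chosen purely for illustration.

```python
import math

def sigmoid(x, temperature=1.0, shift=0.0):
    # Scalar analogue of the torch helper (illustrative only):
    # temperature sets the steepness, shift moves the midpoint to x = -shift.
    return 1.0 / (1.0 + math.exp(-temperature * (x + shift)))

inputs = [0.0, 0.5, 1.0]
gentle = [sigmoid(x, temperature=1.0, shift=-0.5) for x in inputs]
steep = [sigmoid(x, temperature=10.0, shift=-0.5) for x in inputs]

for x, g, s in zip(inputs, gentle, steep):
    print(f"x={x:.1f}  temperature=1: {g:.4f}  temperature=10: {s:.4f}")
```

With shift=-0.5 the midpoint sits at x=0.5, so both curves return 0.5 there; raising the temperature only pushes the values on either side closer to 0 and 1.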

Modification of the get_rewards function to compute rewards by applying the sigmoid to the inverted, normalized process times (faster is better). The rewards are then scaled so that the highest value is 1, which gives a clear maximum reward and a consistent reward range across different sets of responses. Alternatively, the rewards can be normalized with torch.nn.functional.normalize (imported as TF).
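The steps described above can be sketched in plain Python as follows. The helper name sketch_rewards and the sample values are hypothetical, and the fallbacks for identical process times and for all-zero rewards are assumptions rather than the PR's behavior.

```python
import math

def sketch_rewards(accuracies, process_times, temperature=10.0, shift=-0.5):
    # Hypothetical scalar helper for illustration; not the PR's get_rewards.
    fastest, slowest = min(process_times), max(process_times)
    span = slowest - fastest
    rewards = []
    for acc, t in zip(accuracies, process_times):
        # Min-max normalize, then invert so the fastest response maps to 1.
        # Assumption: if all times are equal, treat every response as fastest.
        inverted = 1.0 - (t - fastest) / span if span > 0 else 1.0
        speed_score = 1.0 / (1.0 + math.exp(-temperature * (inverted + shift)))
        # Multiplying by accuracy zeroes the reward for an incorrect response.
        rewards.append(acc * speed_score)
    top = max(rewards)
    # Max scaling: the best reward becomes exactly 1 (left unscaled if all are 0).
    return [r / top for r in rewards] if top > 0 else rewards

rewards = sketch_rewards([1.0, 1.0, 0.0, 1.0], [0.010, 0.050, 0.020, 0.100])
print(rewards)
```

In this sample the fastest accurate response receives a reward of 1, the response with zero accuracy receives 0 regardless of its speed, and the slowest response ends up close to 0.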

This update aims to provide a more merit-based reward distribution, taking into account both the accuracy and speed of responses in a non-linear fashion. The sigmoid function's flexibility allows for fine-tuning the reward curve to align with desired incentive structures.

Testing: We may need to adjust the shift parameter based on miner clock times to get the full "S" shape. The current settings assume clock times between 0.001 and 0.1.
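One rough way to check how much of the "S" a given shift covers is to evaluate the sigmoid at the two ends of its input range. The sketch below assumes the input is an inverted, min-max-normalized time in [0, 1], as in the stand-alone script; output_range is a hypothetical helper.

```python
import math

def sigmoid(x, temperature=1.0, shift=0.0):
    # Scalar analogue of the torch helper (illustrative only).
    return 1.0 / (1.0 + math.exp(-temperature * (x + shift)))

def output_range(temperature, shift):
    # Sigmoid output at the slowest (x=0) and fastest (x=1) normalized inputs.
    return sigmoid(0.0, temperature, shift), sigmoid(1.0, temperature, shift)

for shift in (-0.5, 0.0):
    low, high = output_range(temperature=10.0, shift=shift)
    print(f"shift={shift:+.1f}: slowest -> {low:.4f}, fastest -> {high:.4f}")
```

Under these assumptions, shift=-0.5 with temperature=10 spans roughly 0.007 to 0.993, which is close to the full curve, while shift=0.0 starts at 0.5 and so uses only the upper half of the "S".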

sakrobinson commented 9 months ago

Stand-alone script for testing params:

import torch
import torch.nn.functional as TF
import matplotlib.pyplot as plt

def sigmoid(
    x,
    temperature=1.0,
    shift=0.0,
):
    # Parameterized sigmoid: temperature sets the steepness, shift moves the midpoint
    return 1. / (1. + torch.exp(-temperature * (x + shift)))

def sigmoid_rewards(
    result_accuracies,
    process_times,
    temperature=10.,  # Controls the steepness of the sigmoid curve
    shift=-0.5,        # Shifts the curve left or right
    post_norm_or_max="max",
):
    if not isinstance(process_times, torch.Tensor):
        process_times = torch.tensor(process_times, dtype=torch.float32)
    if not isinstance(result_accuracies, torch.Tensor):
        result_accuracies = torch.tensor(result_accuracies, dtype=torch.float32)

    # Min-max normalize process times to [0, 1]; clamp the range to avoid division by zero when all times are equal
    time_range = torch.clamp(torch.max(process_times) - torch.min(process_times), min=1e-12)
    normalized_process_times = (process_times - torch.min(process_times)) / time_range
    inverted_process_times = 1. - normalized_process_times  # Invert so lower (faster) times get higher scores
    #breakpoint()

    # Apply the vectorized sigmoid function to the inverted normalized process times
    sigmoid_process_times = sigmoid(inverted_process_times, temperature, shift)

    # Weight the accuracy and speed, multiplying by result_accuracy to handle 0 accuracy case mathematically
    rewards = result_accuracies * sigmoid_process_times
    if post_norm_or_max == "max":
        rn = rewards / torch.max(rewards)
    else:
        rn = TF.normalize(rewards, dim=0)
    #breakpoint()
    return rn

def main():
    scores = [1.0]   # Assuming a constant accuracy of 1 for all process times
    proc_times = torch.arange(0.001, 0.501, 0.001)
    rewards = sigmoid_rewards(
        scores,
        proc_times,
    )
    #plt.plot(proc_times, rewards.numpy(), marker='o')
    #plt.xlabel('Process Times')
    #plt.ylabel('Normalized Rewards')
    #plt.title('Reward Distribution Across Process Times using Sigmoid')
    #plt.grid(True)
    #plt.show()

if __name__ == "__main__":
    main()
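For comparing the two post-scaling modes in the script, here is a small standard-library sketch. scale_max and scale_l2 are hypothetical names; scale_l2 follows the default behavior of torch.nn.functional.normalize (L2 norm with a small eps floor), and the raw values are made up for illustration.

```python
import math

def scale_max(rewards):
    # "max" mode: divide by the largest reward so the best response gets 1.
    top = max(rewards)
    return [r / top for r in rewards]

def scale_l2(rewards, eps=1e-12):
    # Alternative mode: divide by the L2 norm so the vector has unit length.
    norm = math.sqrt(sum(r * r for r in rewards))
    return [r / max(norm, eps) for r in rewards]

raw = [0.9, 0.6, 0.3]  # illustrative raw rewards
by_max = scale_max(raw)
by_l2 = scale_l2(raw)
print(by_max)
print(by_l2)
```

Both modes preserve the ratios between rewards. Max scaling pins the top reward at 1, whereas L2 normalization makes the top value depend on how many responses there are and how their rewards are spread.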

vn-automata / bt-automata

Integrate Sigmoid-Based Reward Calculation into get_rewards Function #31