Closed sakrobinson closed 9 months ago
Stand-alone script for testing params:
import torch
import torch.nn.functional as TF
import matplotlib.pyplot as plt


def sigmoid(
    x,
    temperature=1.0,
    shift=0.0,
):
    # Parameterized sigmoid: temperature sets the steepness, shift moves the midpoint to x = -shift
    return 1. / (1. + torch.exp(-temperature * (x + shift)))


def sigmoid_rewards(
    result_accuracies,
    process_times,
    temperature=10.,  # Controls the steepness of the sigmoid curve
    shift=-0.5,  # Shifts the curve left or right
    post_norm_or_max="max",
):
    if not isinstance(process_times, torch.Tensor):
        process_times = torch.tensor(process_times, dtype=torch.float32)
    if not isinstance(result_accuracies, torch.Tensor):
        result_accuracies = torch.tensor(result_accuracies, dtype=torch.float32)

    # Min-max normalize process times to [0, 1].
    # Note: this yields NaN when all process times are equal (zero range).
    normalized_process_times = (process_times - torch.min(process_times)) / (torch.max(process_times) - torch.min(process_times))
    inverted_process_times = 1. - normalized_process_times  # Invert so higher times have lower scores

    # Apply the vectorized sigmoid function to the inverted normalized process times
    sigmoid_process_times = sigmoid(inverted_process_times, temperature, shift)

    # Weight the accuracy and speed, multiplying by result_accuracy to handle 0 accuracy case mathematically
    rewards = result_accuracies * sigmoid_process_times

    if post_norm_or_max == "max":
        # Scale so the best reward is exactly 1
        rn = rewards / torch.max(rewards)
    else:
        # Alternative: L2-normalize the reward vector
        rn = TF.normalize(rewards, dim=0)
    return rn


def main():
    scores = [1.0]  # Assuming a constant accuracy of 1 for all process times (broadcasts across them)
    proc_times = torch.arange(0.001, 0.501, 0.001)
    # Call the function defined above (the original called an undefined compute_rewards)
    rewards = sigmoid_rewards(
        scores,
        proc_times,
    )
    #plt.plot(proc_times, rewards.numpy(), marker='o')
    #plt.xlabel('Process Times')
    #plt.ylabel('Normalized Rewards')
    #plt.title('Reward Distribution Across Process Times using Sigmoid')
    #plt.grid(True)
    #plt.show()


if __name__ == "__main__":
    main()
This PR introduces changes to the get_rewards function and its supporting functions in the reward_funcs.py module. It replaces the previous linear reward scaling with a sigmoid-based reward calculation that better reflects the relative performance of miners.
Key changes: Added a vectorized sigmoid function, implemented with torch, whose steepness and midpoint are adjustable through the temperature and shift parameters.
Modified the get_rewards function to apply the sigmoid to the inverted, normalized process times (faster is better). The rewards are then scaled so that the highest value is 1, which gives a clear maximum reward and a consistent reward range across different sets of responses. Alternative scaling via L2 normalization (torch.nn.functional.normalize, imported as TF) is also available.
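To make the effect of the two parameters concrete, here is a minimal scalar sketch of the same formula used in the script. The name scalar_sigmoid is hypothetical and uses math instead of torch so the numbers are easy to check by hand; it is not the function in reward_funcs.py.

```python
import math


def scalar_sigmoid(x, temperature=1.0, shift=0.0):
    # Same formula as the torch version in the script, applied to one float
    return 1.0 / (1.0 + math.exp(-temperature * (x + shift)))


# Midpoint: the output is 0.5 where x + shift == 0, i.e. at x == -shift
mid = scalar_sigmoid(0.5, temperature=10.0, shift=-0.5)

# Steepness: the slope at the midpoint is temperature / 4, so with
# temperature=10 the ends of the [0, 1] input range sit close to 0 and 1
low = scalar_sigmoid(0.0, temperature=10.0, shift=-0.5)
high = scalar_sigmoid(1.0, temperature=10.0, shift=-0.5)

print(mid, low, high)
```

With the script's defaults (temperature=10, shift=-0.5), the curve is centered on the middle of the normalized range and is symmetric around it.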
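The difference between the two scaling options can be sketched in plain Python. The helper names scale_by_max and scale_by_l2 are hypothetical; they mirror what the script does with rewards / torch.max(rewards) and TF.normalize(rewards, dim=0) respectively, assuming the default L2 norm.

```python
import math


def scale_by_max(rewards):
    # Divide by the largest reward so the best response gets exactly 1
    top = max(rewards)
    return [r / top for r in rewards]


def scale_by_l2(rewards):
    # Divide by the Euclidean length so the reward vector has unit L2 norm
    norm = math.sqrt(sum(r * r for r in rewards))
    return [r / norm for r in rewards]


raw = [0.9, 0.6, 0.3, 0.0]  # illustrative raw rewards, not real miner data
by_max = scale_by_max(raw)
by_l2 = scale_by_l2(raw)

print(by_max)
print(by_l2)
```

Both options preserve the ranking and the ratios between rewards. With max scaling the top reward is always 1; with L2 scaling the top reward depends on how many other responses carry non-zero reward, so it is generally below 1.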
This update aims to provide a more merit-based reward distribution, taking into account both the accuracy and speed of responses in a non-linear fashion. The sigmoid function's flexibility allows for fine-tuning the reward curve to align with desired incentive structures.
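As a rough illustration of how accuracy and speed combine, the sketch below re-implements the script's steps with plain floats. The function name combined_rewards and the sample values are made up for illustration; the real code operates on torch tensors.

```python
import math


def combined_rewards(accuracies, process_times, temperature=10.0, shift=-0.5):
    fastest, slowest = min(process_times), max(process_times)
    rewards = []
    for acc, t in zip(accuracies, process_times):
        # Min-max normalize the time, then invert so faster is higher
        inverted = 1.0 - (t - fastest) / (slowest - fastest)
        speed = 1.0 / (1.0 + math.exp(-temperature * (inverted + shift)))
        # Multiplying by accuracy zeroes the reward when accuracy is 0
        rewards.append(acc * speed)
    top = max(rewards)
    return [r / top for r in rewards]


# Hypothetical batch: the fastest miner is wrong, two miners tie on time
accuracies = [0.0, 1.0, 1.0, 0.5]
times = [0.01, 0.02, 0.05, 0.02]
rewards = combined_rewards(accuracies, times)

print(rewards)
```

In this batch the fastest miner earns nothing because its accuracy is 0, the slowest accurate miner earns under 1% of the top reward, and the half-accurate miner earns half of what the fully accurate miner with the same time does.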
Testing: We may need to adjust the shift parameter based on miner clock times to get the desired full "S" shape. The current defaults assume clock times between 0.001 and 0.1.
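One way to explore the shift parameter before touching the validator is a small float-only sketch like the one below. The function speed_scores and the sample times are hypothetical. Like the stand-alone script, the sketch min-max normalizes the times per batch, so the sigmoid midpoint lands wherever the inverted, normalized time equals -shift.

```python
import math


def speed_scores(process_times, temperature=10.0, shift=-0.5):
    fastest, slowest = min(process_times), max(process_times)
    span = slowest - fastest
    scores = []
    for t in process_times:
        # 1.0 for the fastest response, 0.0 for the slowest
        inverted = 1.0 - (t - fastest) / span
        scores.append(1.0 / (1.0 + math.exp(-temperature * (inverted + shift))))
    return scores


# Evenly spaced sample times across the assumed 0.001 to 0.1 range
times = [0.001, 0.02575, 0.0505, 0.07525, 0.1]

centered = speed_scores(times, shift=-0.5)  # midpoint at the middle of the batch
strict = speed_scores(times, shift=-0.8)    # midpoint moved toward the fastest times

for t, c, s in zip(times, centered, strict):
    print(f"{t:.5f}  centered={c:.4f}  strict={s:.4f}")
```

With shift=-0.5 the middle time scores about 0.5 and both tails of the "S" are visible. A more negative shift pushes the midpoint toward the fastest responses, so only those keep a high score.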