smritae01 / CS640-Originality-Score-Project


tune hyperparameters for RLHF model #4

Open GrantorShadow opened 1 year ago

GrantorShadow commented 1 year ago

Increase the training iterations: Train the PPO model for more iterations, as the model might not have converged yet.
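A minimal sketch of this, assuming the project uses Stable Baselines3 (the PyTorch version) and using a hypothetical `make_rlhf_env()` factory as a stand-in for the project's actual environment:

```python
from stable_baselines3 import PPO

# Hypothetical placeholder for the project's RLHF environment.
env = make_rlhf_env()

model = PPO("MlpPolicy", env, verbose=1)

# total_timesteps controls how long PPO keeps collecting rollouts and
# updating the policy; raising it gives the model more chances to converge.
model.learn(total_timesteps=500_000)
model.save("ppo_rlhf_longer_run")
```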

Adjust the PPO hyperparameters: Experiment with different hyperparameters such as the learning rate, batch size, and discount factor. Refer to the Stable Baselines documentation for more details.
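For example (the values below are illustrative assumptions, not tuned settings; the keyword arguments are the standard Stable Baselines3 PPO constructor arguments):

```python
from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    env,                  # same hypothetical environment as above
    learning_rate=1e-4,   # default 3e-4; lowering it can stabilize training
    batch_size=128,       # minibatch size per gradient update (default 64)
    gamma=0.995,          # discount factor (default 0.99)
    n_steps=2048,         # rollout length collected before each update
    verbose=1,
)
model.learn(total_timesteps=500_000)
```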

Modify the MLP architecture: Adjust the number of layers and neurons in the MLP, as well as the activation functions, to improve the model's capacity to learn complex patterns.
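A sketch of how this could look with Stable Baselines3's `policy_kwargs` (the layer sizes and activation below are assumptions for illustration; recent SB3 releases accept `net_arch` as a dict, while older ones expect it wrapped in a list):

```python
import torch.nn as nn
from stable_baselines3 import PPO

# Wider/deeper MLP and a different activation; sizes are illustrative only.
policy_kwargs = dict(
    net_arch=dict(pi=[256, 256], vf=[256, 256]),  # separate policy/value nets
    activation_fn=nn.ReLU,                        # SB3 default is nn.Tanh
)

model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=500_000)
```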

Experiment with other algorithms: The Stable Baselines library offers several other reinforcement learning algorithms, such as DDPG, SAC, and A2C; experimenting with these may help (see the sketch below).
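Swapping the algorithm is a small change because the SB3 classes share one interface. A sketch with A2C (note that DDPG and SAC in SB3 only support continuous `Box` action spaces, so they apply only if the environment's actions are continuous):

```python
from stable_baselines3 import A2C

# Same hypothetical environment as above; only the algorithm class changes.
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
model.save("a2c_rlhf_baseline")
```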

smritae01 commented 1 year ago

Obtained 59.3% accuracy with 3 iterations

GrantorShadow commented 1 year ago

Can we try other algorithms too, like DDPG, SAC, and A2C? @smritae01