patrickloeber / snake-ai-pytorch

MIT License

training is slow, why? #10

Open dracularking opened 1 year ago

dracularking commented 1 year ago

Game 484 Score 11 Record: 69

After so many failures, I only got a record of 69. Is there any way to improve it, or is it possible to achieve better?

Charwisc-py commented 1 year ago

Increase the tick speed so the games go faster.

sanjaybora04 commented 1 year ago

@dracularking include the position of the snake in the inputs and use an LSTM
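
For anyone curious, here is a rough sketch of what that could look like. The class name LSTM_QNet and the sequence handling are assumptions, not code from this repo, and the trainer would also need to be adapted to feed state sequences instead of single state vectors:

import torch
import torch.nn as nn

class LSTM_QNet(nn.Module):
    # Hypothetical recurrent Q-network: the state vector (which could be
    # extended with the snake's head and body coordinates) is fed through an
    # LSTM so the agent can use a short history of states.
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden=None):
        # x has shape (batch, seq_len, input_size)
        out, hidden = self.lstm(x, hidden)
        q_values = self.head(out[:, -1, :])  # Q-values from the last time step
        return q_values, hidden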

ethicalhacker7192 commented 7 months ago

Or set the tick speed to 0. Sure, it will be hard on your CPU, but it is worth it for results in under 10 minutes. Also, tick speed 0 is for some reason faster than an enormous value like 10000000000000000000000000; that's just how the frame limiting works.
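
For context, assuming the game loop caps its frame rate with pygame's clock.tick(SPEED), passing 0 simply removes the cap instead of trying to enforce an enormous frame rate. A minimal sketch (the SPEED constant and play_step function are stand-ins for whatever the game loop actually uses):

import pygame

SPEED = 0  # 0 means "no frame-rate cap" for pygame's Clock.tick()

pygame.init()
clock = pygame.time.Clock()

def play_step():
    # ... game logic and rendering would go here ...
    # With SPEED = 0, tick() returns immediately instead of sleeping, so the
    # loop runs as fast as the CPU/GPU allows.
    clock.tick(SPEED)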

ethicalhacker7192 commented 7 months ago

I've also improved it by adding an additional layer to the neural network; the improved code is here:

self.linear1 = nn.Linear(input_size, hidden_size)
self.linear2 = nn.Linear(hidden_size, hidden_size)
self.linear3 = nn.Linear(hidden_size, output_size)  # NOTE: these lines go in model.py, inside __init__ after super().__init__()

For the forward() function, yes, you do have to wire everything up manually; here is the code:

def forward(self, x):
    x = F.relu(self.linear1(x))
    x = F.relu(self.linear2(x))
    x = self.linear3(x)  # output layer: raw Q-values, no ReLU here
    return x

These are the parts of the file you need to edit to get a better model; do not add too many layers, or overfitting may occur.

ethicalhacker7192 commented 7 months ago

The entire file would look like this:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import os

class Linear_QNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, hidden_size)
        self.linear3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x)) # ReLU activations introduce non-linearity.
        x = self.linear3(x)
        return x

    def save(self, file_name='model.pth'):
        model_folder_path = './model'
        if not os.path.exists(model_folder_path):
            os.makedirs(model_folder_path)

        file_name = os.path.join(model_folder_path, file_name)
        torch.save(self.state_dict(), file_name)

class QTrainer:
    def __init__(self, model, lr, gamma):
        self.lr = lr
        self.gamma = gamma
        self.model = model
        self.optimizer = optim.Adam(model.parameters(), lr=self.lr)
        self.criterion = nn.MSELoss()

    def train_step(self, state, action, reward, next_state, done):
        state = torch.tensor(state, dtype=torch.float)
        next_state = torch.tensor(next_state, dtype=torch.float)
        action = torch.tensor(action, dtype=torch.long)
        reward = torch.tensor(reward, dtype=torch.float)
        # inputs may arrive as a batch of shape (n, x) or as a single sample of shape (x,)

        if len(state.shape) == 1:
            # add a batch dimension: (x,) -> (1, x)
            state = torch.unsqueeze(state, 0)
            next_state = torch.unsqueeze(next_state, 0)
            action = torch.unsqueeze(action, 0)
            reward = torch.unsqueeze(reward, 0)
            done = (done, )

        # 1: predicted Q values with the current state
        pred = self.model(state)

        # 2: Q_new = r + gamma * max(next predicted Q value) -> only if not done,
        #    written into the slot of the action that was actually taken
        target = pred.clone()
        for idx in range(len(done)):
            Q_new = reward[idx]
            if not done[idx]:
                Q_new = reward[idx] + self.gamma * torch.max(self.model(next_state[idx]))

            target[idx][torch.argmax(action[idx]).item()] = Q_new

        # 3: optimize the MSE loss between the Bellman targets and the predictions
        self.optimizer.zero_grad()
        loss = self.criterion(target, pred)
        loss.backward()

        self.optimizer.step()

You can copy and paste this code into model.py.
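
Nothing else has to change, since the extra hidden layer is internal to the class. A hedged usage sketch (the sizes 11, 256, 3 and the hyperparameters lr=0.001, gamma=0.9 are assumptions based on what the tutorial uses):

# Hypothetical usage in agent.py
model = Linear_QNet(11, 256, 3)
trainer = QTrainer(model, lr=0.001, gamma=0.9)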

ethicalhacker7192 commented 6 months ago

The best way to improve the Snake AI, however, is the following code:

model.to('cuda:0')

Add this line right after the model is defined (where `model` is constructed).

NOTE: You must have a CUDA-compatible GPU to use this method.
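
Note that moving the model alone is usually not enough: the tensors built in train_step have to be on the same device, or PyTorch will raise a device-mismatch error. A minimal sketch, assuming the tutorial's layer sizes; the to_device_tensors helper is my own naming and is not part of the repo:

import torch

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = Linear_QNet(11, 256, 3).to(device)  # sizes assumed from the tutorial

def to_device_tensors(state, action, reward, next_state):
    # Hypothetical helper mirroring the conversions in QTrainer.train_step,
    # but placing every tensor on the same device as the model.
    return (torch.tensor(state, dtype=torch.float, device=device),
            torch.tensor(action, dtype=torch.long, device=device),
            torch.tensor(reward, dtype=torch.float, device=device),
            torch.tensor(next_state, dtype=torch.float, device=device))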

JGsouzaa commented 4 months ago

You probably need to change the logic and the weights a little to make it work better in the long run; increasing the layers or neurons in the model probably isn't the best choice. I implemented the code and noticed that even though the AI does well around games 60-200, it keeps making the same mistakes it made in games 1-10, such as looping in a corner.

I didn't improve the NN. I rebalanced the reward weights and the randomness (exploration) parameter to make sure the AI keeps improving, and changed the logic so that these kinds of early issues no longer hurt it in the long run.
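
For example, one way to "rebalance the randomness" is to stop the exploration rate from decaying all the way to zero, assuming the agent uses a linear schedule like the tutorial's epsilon = 80 - n_games. The function name and the floor value below are my own, just to illustrate the idea:

import random

def should_explore(n_games, floor=5, start=80, scale=200):
    # Keep a small exploration floor so the agent can still escape habits
    # like looping in a corner late in training, instead of letting the
    # random-move probability reach zero after `start` games.
    epsilon = max(floor, start - n_games)
    return random.randint(0, scale) < epsilon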

The purpose of these exercises is to introduce these complex algorithms and technologies so we can go on to study and practice them ourselves. So I suggest you revise the basics of ML and NNs so you can better understand the reinforcement learning concepts and then improve this code yourselves; it's very simple and wasn't made to give us the fish, but to encourage us to learn how to fish.

ethicalhacker7192 commented 4 months ago

Thanks. Looking back at my code, it seems I should have focused more on the reward system. I am seeing if I can implement what is called a genetic algorithm: one that adjusts the weights subtly, perhaps changes the rewards very gradually, and uses natural selection, making sure at least 5 of the original Worm workers are saved and referred to in the future.
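
For what it's worth, here is a minimal sketch of that idea. The function names, mutation scale, and population handling are all assumptions, not code from this repo:

import copy
import torch

def mutate(model, sigma=0.02):
    # Return a copy of `model` with small Gaussian noise added to every
    # weight: the "adjusts weights subtly" step.
    child = copy.deepcopy(model)
    with torch.no_grad():
        for param in child.parameters():
            param.add_(torch.randn_like(param) * sigma)
    return child

def next_generation(population, fitnesses, n_elite=5):
    # Keep the n_elite best models unchanged (the "original workers" kept
    # for future reference) and fill the rest with mutated copies of them.
    ranked = [m for _, m in sorted(zip(fitnesses, population),
                                   key=lambda pair: pair[0], reverse=True)]
    elite = ranked[:n_elite]
    children = [mutate(elite[i % n_elite]) for i in range(len(population) - n_elite)]
    return elite + children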