nnaisense / evotorch

Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.
https://evotorch.ai
Apache License 2.0

GNN-based Meta-Learning for Sparse Portfolio Optimization #52

Open kayuksel opened 1 year ago

kayuksel commented 1 year ago

Hello,

Let me start by saying that I am a fan of your work here. I have recently open-sourced my GNN-based meta-learning method for optimization. I have applied it to a real-world sparse index-tracking problem (after an initial benchmark on the Schwefel function), and it appears to outperform Fast CMA-ES significantly, both in terms of producing robust solutions on the blind test set and in terms of time (total duration and number of iterations) and space complexity. I am including the link to my repository here, in case you would consider adding the method or the benchmark problem to your repository. Note: the GNN, which learns how to generate populations of solutions at each iteration, is trained using gradients of the loss function, as opposed to the black-box algorithms here.
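
To give an idea of the benchmark, here is a minimal sketch of what such a sparse index-tracking objective can look like; the top-k cardinality rule, the value of k, and the tensor names are illustrative assumptions, not necessarily the exact formulation in the repository. Here asset_returns is a (time, assets) matrix of asset returns, index_returns is the corresponding index return series, and the generator would emit weight_logits for a population of candidate portfolios.

import torch

def tracking_error(weight_logits, asset_returns, index_returns, k = 30):
    # Keep only the k largest logits per candidate (the sparsity constraint),
    # and renormalize them into portfolio weights that sum to one.
    topk = weight_logits.topk(k, dim = 1)
    weights = torch.zeros_like(weight_logits)
    weights.scatter_(1, topk.indices, torch.softmax(topk.values, dim = 1))
    # Mean squared deviation of each candidate portfolio from the index returns.
    portfolio_returns = weights @ asset_returns.T   # (population, time)
    return ((portfolio_returns - index_returns) ** 2).mean(dim = 1)

Each candidate's tracking error can then be minimized exactly like the Schwefel objective in the snippet further below.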


Sincerely, K

NaturalGradient commented 1 year ago

Hi @kayuksel! I saw your post on Reddit, I think; very cool stuff :) We'd always love to integrate new techniques into EvoTorch, so if you have time, maybe you'd consider reformatting your code to work with the EvoTorch SearchAlgorithm API and opening a pull request? Otherwise, I'll see if I can get to doing that myself when I find the time.
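
For reference, a rough and untested skeleton of what such a wrapper might look like in the SearchAlgorithm API is sketched below; the class name GenMetaSearch, the objective_fn argument, and the generator's noise_dim attribute are placeholder assumptions, not names from EvoTorch or the linked repository.

import torch
from evotorch import Problem, SolutionBatch
from evotorch.algorithms.searchalgorithm import SearchAlgorithm, SinglePopulationAlgorithmMixin

class GenMetaSearch(SearchAlgorithm, SinglePopulationAlgorithmMixin):
    def __init__(self, problem: Problem, generator: torch.nn.Module,
                 objective_fn, popsize: int, lr: float = 1e-3):
        SearchAlgorithm.__init__(self, problem)
        self._generator = generator
        self._objective_fn = objective_fn   # differentiable batch objective (e.g. schwefel)
        self._popsize = popsize
        self._optimizer = torch.optim.AdamW(generator.parameters(), lr = lr)
        self._population = problem.generate_batch(popsize)
        SinglePopulationAlgorithmMixin.__init__(self)

    @property
    def population(self) -> SolutionBatch:
        return self._population

    def _step(self):
        # Ask the generator for a new population (differentiable w.r.t. its weights).
        # `noise_dim` is an assumed attribute exposing the generator's input size.
        z = torch.randn(self._popsize, self._generator.noise_dim)
        candidates = self._generator(z)

        # Store and evaluate the candidates so that EvoTorch's loggers see them
        # (assumes the generator output matches the problem's dtype and device).
        self._population.access_values()[:] = candidates.detach()
        self.problem.evaluate(self._population)

        # Unlike the black-box algorithms, update the generator with gradients
        # of the differentiable objective itself.
        loss = self._objective_fn(candidates).mean()
        self._optimizer.zero_grad()
        loss.backward()
        self._optimizer.step()

If that skeleton is roughly right, the usual EvoTorch pattern (attaching a StdOutLogger and calling searcher.run(...)) should then work unchanged.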

kayuksel commented 1 year ago

Thank you very much, Timothy. Would it help you if I provided a highly simplified version (~75 lines) like the one below? FYI, I have also created a game-theoretic (adversarial) version of the generative model for global optimization, which uses a positive-surprise mechanism obtained through a surrogate model (critic) trained simultaneously with Adaptive Gradient Clipping; a rough sketch of that variant follows after the snippet below. It is quite competitive against the best optimizers in Nevergrad on large-scale non-convex optimization problems, especially with noisy rewards. Note: the only bottleneck seems to be random-seed sensitivity, for which GradInit (arxiv.org/abs/2102.08098) seems to be a solution.

import torch
import torch.nn as nn
from types import SimpleNamespace

# `args` is not defined in this shortened snippet; the values below are illustrative
# placeholders only (the full script reads them from the command line).
args = SimpleNamespace(
    noise = 16,    # latent noise dimension fed to the generator
    batch = 64,    # number of candidate solutions generated per iteration
    cnndim = 32,   # latent dimension of the convolutional feature extractor
    funcd = 30,    # dimensionality of the benchmark function
    iter = 1000)   # number of training iterations

# Schwefel benchmark: inputs in [-1, 1] are rescaled to the usual [-500, 500] domain.
def schwefel(x):
    x = x * 500
    return 418.9829 * x.shape[1] - (x * x.abs().sqrt().sin()).sum(dim=1)

# Xavier (gain 5/3) initialization for linear layers, unit weights for batch norm, zeroed biases.
def init_weights(model):
    for m in model.modules():
        if isinstance(m, nn.BatchNorm1d):
            m.weight.data.fill_(1)
        elif isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight, gain = 5/3)
        if hasattr(m, 'bias') and m.bias is not None: m.bias.data.zero_()

# Two-layer LSTM whose hidden state is carried over across forward calls (iterations).
class LSTMModule(nn.Module):
    def __init__(self, input_size = 1, hidden_size = 1, num_layers = 2):
        super(LSTMModule, self).__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.h = torch.zeros(num_layers, 1, hidden_size, requires_grad=True).cuda()
        self.c = torch.zeros(num_layers, 1, hidden_size, requires_grad=True).cuda()
    def forward(self, x):
        self.rnn.flatten_parameters()
        out, (h_end, c_end) = self.rnn(x, (self.h, self.c))
        self.h.data = h_end.data
        self.c.data = c_end.data
        return out[:,-1, :].flatten()

# Summarizes the current noise batch into a context vector via Conv1d + pooling + LSTM,
# and appends that context to every noise sample.
class Extractor(nn.Module):
    def __init__(self, latent_dim, ks = 5):
        super(Extractor, self).__init__()
        self.conv = nn.Conv1d(args.noise, latent_dim,
            bias = False, kernel_size = ks, padding = (ks // 2) + 1)
        self.conv.weight.data.normal_(0, 0.01)
        self.activation = nn.Sequential(nn.BatchNorm1d(
            latent_dim, track_running_stats = False), nn.Mish())
        self.gap = nn.AvgPool1d(kernel_size = args.batch, padding = 1)
        self.rnn = LSTMModule(hidden_size = latent_dim)
    def forward(self, x):
        y = x.unsqueeze(0).permute(0, 2, 1)
        y = self.rnn(self.gap(self.activation(self.conv(y))))
        return torch.cat([x, y.repeat(args.batch, 1)], dim = 1)

# Maps noise plus extracted context to a candidate solution, with a learned
# per-dimension noise scale (std_weight) for exploration.
class Generator(nn.Module):
    def __init__(self, noise_dim = 0):
        super(Generator, self).__init__()
        def block(in_feat, out_feat):
            return [nn.Linear(in_feat, out_feat), nn.Tanh()]
        self.model = nn.Sequential(
            *block(noise_dim+args.cnndim, 480), *block(480, 1103), nn.Linear(1103, args.funcd))
        init_weights(self)
        self.extract = Extractor(args.cnndim)
        self.std_weight = nn.Parameter(torch.zeros(args.funcd).cuda())
    def forward(self, x):
        mu = self.model(self.extract(x))
        return mu + (self.std_weight * torch.randn_like(mu))

# Training loop: the generator is trained by backpropagating the mean objective of
# each generated population (assumes a CUDA device is available).
actor = Generator(args.noise).cuda()
opt = torch.optim.AdamW(filter(lambda p: p.requires_grad, actor.parameters()), lr=1e-3)
best_reward = None
for epoch in range(args.iter):
    torch.cuda.empty_cache()
    opt.zero_grad()
    z = torch.randn((args.batch, args.noise)).cuda().requires_grad_()
    rewards = schwefel(actor(z).tanh())
    min_index = rewards.argmin()
    if best_reward is None: best_reward = rewards[min_index].detach()  # detach to avoid retaining the graph
    actor_loss = rewards.mean()
    actor_loss.backward()
    nn.utils.clip_grad_norm_(actor.parameters(), 1.0)
    opt.step()
    with torch.no_grad():
        if rewards[min_index] > best_reward: continue
        best_reward = rewards[min_index]
        print('epoch: %i loss: %f' % (epoch, best_reward.item()))
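
And here is a very rough sketch of the adversarial, positive-surprise variant mentioned above, reusing actor, opt, args and schwefel from the snippet; the critic architecture, the surprise bonus, and the simplified per-tensor AGC are illustrative assumptions rather than the exact implementation.

critic = nn.Sequential(nn.Linear(args.funcd, 256), nn.Mish(), nn.Linear(256, 1)).cuda()
critic_opt = torch.optim.AdamW(critic.parameters(), lr = 1e-3)

def adaptive_clip_(parameters, clip = 0.01, eps = 1e-3):
    # Simplified Adaptive Gradient Clipping: rescale any gradient whose norm
    # exceeds clip times the corresponding parameter norm.
    for p in parameters:
        if p.grad is None: continue
        max_norm = clip * p.data.norm().clamp_min(eps)
        grad_norm = p.grad.norm()
        if grad_norm > max_norm: p.grad.mul_(max_norm / (grad_norm + 1e-12))

for epoch in range(args.iter):
    z = torch.randn((args.batch, args.noise)).cuda()
    solutions = actor(z).tanh()
    rewards = schwefel(solutions)

    # The critic (surrogate model) learns to predict each candidate's reward.
    predicted = critic(solutions.detach()).flatten()
    critic_loss = (predicted - rewards.detach()).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward()
    adaptive_clip_(critic.parameters()); critic_opt.step()

    # Positive surprise: candidates that turn out better (lower) than the critic
    # predicted receive an extra bonus when training the generator.
    surprise = (predicted.detach() - rewards).clamp_min(0)
    actor_loss = (rewards - surprise).mean()
    opt.zero_grad(); actor_loss.backward()
    nn.utils.clip_grad_norm_(actor.parameters(), 1.0); opt.step()
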
kayuksel commented 1 year ago

It is also well suited to combinatorial optimization, simply by sampling with the sigmoid of the logits produced by the generator.
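
A minimal sketch of that sampling step (the straight-through trick used here to keep it differentiable, and the objective name, are illustrative assumptions):

logits = actor(z)                            # one logit per binary decision variable
probs = torch.sigmoid(logits)
hard = torch.bernoulli(probs)                # sampled 0/1 candidate solutions
solutions = hard + probs - probs.detach()    # forward pass uses the hard samples,
                                             # backward pass uses the gradient of probs
rewards = objective(solutions)               # `objective`: any cost defined on 0/1 vectors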

kayuksel commented 1 year ago

Hello again,

I've made a quick comparison on the 30-dim Schwefel function (as one of the hardest mathematical benchmark functions) against Nevergrad here: https://github.com/kayuksel/genmeta-vs-nevergrad. In my experiments, it easily scales to 100K+ dimensions on most other optimization benchmark functions.
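
For anyone who wants to reproduce a Nevergrad baseline on the same objective, one possible setup is sketched below (the choice of NGOpt and the budget are illustrative, not the exact settings used in the linked benchmark); schwefel is the batched function from the snippet above.

import torch
import nevergrad as ng

def schwefel_single(x):
    # Wrap the batched torch implementation for Nevergrad's per-candidate interface.
    return float(schwefel(torch.as_tensor(x, dtype = torch.float32).unsqueeze(0)))

param = ng.p.Array(shape = (30,)).set_bounds(-1.0, 1.0)   # same [-1, 1]^30 search space
optimizer = ng.optimizers.NGOpt(parametrization = param, budget = 10000)
recommendation = optimizer.minimize(schwefel_single)
print(schwefel_single(recommendation.value))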

Sincerely, K

kayuksel commented 1 year ago

FYI, I have also applied it to the MovieLens 1M matrix-factorization problem (with 500K parameters); the code is in the repository.
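
As a sketch of what that objective can look like when flattened into a single parameter vector per candidate (the rank and the user/item counts are only approximate MovieLens-1M sizes; the exact setup is in the repository):

n_users, n_items, rank = 6040, 3706, 50      # roughly (6040 + 3706) * 50 ≈ 487K parameters

def mf_loss(params, user_idx, item_idx, ratings):
    # params: (population, n_users * rank + n_items * rank), one flat vector per candidate.
    U = params[:, :n_users * rank].view(-1, n_users, rank)
    V = params[:, n_users * rank:].view(-1, n_items, rank)
    pred = (U[:, user_idx] * V[:, item_idx]).sum(dim = -1)   # predicted rating per observed pair
    return ((pred - ratings) ** 2).mean(dim = -1)            # one MSE per candidate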