Input Convex Neural Network复现及仿真验证 | Oliver xu's Blog

olixu commented 3 years ago

https://blog.oliverxu.cn/2021/03/02/Input_convex_neural_network%E5%A4%8D%E7%8E%B0%E5%8F%8A%E5%AF%B9%E6%AF%94%E9%AA%8C%E8%AF%81/

《Input Convex Neural Networks》《Optimal Control Via Neural Networks: A Convex Approach》对于复杂系统的控制往往分为两步，对系统的辨识和控制器的设计。深度神经网络被证明在辨识任务中取得了重大成功，但是，由于这些辨识出来的系统往往是非线性和非凸的，其控制器很难设计，所以，实际系统往往还是用线性模型去逼近，尽管这些

mengyaShen commented 3 years ago

结论那里所有的gi应该是凸的非减函数吧？

olixu commented 3 years ago

结论那里所有的gi应该是凸的非减函数吧？

你说的应该是对的，“凸函数和凸的非减的函数还是凸函数”。

梦雅，现在还在搞相关科研吗？我之前问过ICNN的原作者了，他已经好几年没有继续研究这个了。可以加个微信交流交流。

mengyaShen commented 3 years ago

@olixu

结论那里所有的gi应该是凸的非减函数吧？

你说的应该是对的，“凸函数和凸的非减的函数还是凸函数”。

梦雅，现在还在搞相关科研吗？我之前问过ICNN的原作者了，他已经好几年没有继续研究这个了。可以加个微信交流交流。

噢噢好的，谢谢你～不好意思才看到邮件。我现在正在研究这个相关的课题，可以方便加一下微信嘛？我的微信是smy00022

RaoXuan-1998 commented 3 years ago

感谢博主的分享。国内有关 Input Convex Neural Network 的讨论很少，我也是在引用这篇文章的时候看到了您的分享。然后就顺手浏览了您往期的博客，感受到了博主宽泛的学术视野。有关Input Convex Neural Network，我也是在我本人是做深度神经架构搜索的，主要精力放在了深度学习上，但是我的课题组本身做的绝大部分都是Adapive Dynamic Programming，包括连续时间系统和离散时间系统，导师也恰好是您在《使用ADP设计线性系统最优控制器》列举的论文一作之一。实际上，我感觉基于迭代的Adapive Dynamic Programming更像是对动态规划的数值分析方法，跟很多model-free的强化学习有根本区别（RL通过试错学习)，其证明考虑的是状态空间的任意状态点，其收敛性证明可以从压缩映射定理出发，但也因为状态空间的连续性，实际仿真的时候只能在预定义区间内均匀采样有限个样本点，通过神经网络近似连续空间的值函数，这也如博主所说。我觉得控制确实需要向model-free或者model partial-unkown发展，因为实际的系统模型往往获取不了。但是完全modle-free的control算法难适用于safety-critical系统，如何在保证系统稳定的同时，让控制器与系统交互学习更优的控制策略是个难点。同时，Online-ADP算法如何在safety-critical系统发挥作用，也是个大难题，假如我的控制器初始化于一个容许控制律，但是怎么通过online采集到的数据点提升策略，如何保证提升后的策略是容许控制，也是问题。局部Local状态空间的最优值函数和全局Global状态空间的最优值函数之间是否有映射关系。控制领域确实需要一场大的革新，否则理论真的只会停留在理论阶段，相信博主的safe reinforcement learning 一定会在未来用场。

evanspuck commented 3 years ago

感谢博主分享，但有个小问题，论文中对input layer的参数范围好像没有进行限制，但您的文章中好像对input layer的参数也进行了非负的的限制，这部分有些不太明白

olixu commented 2 years ago

@evanspuck 感谢博主分享，但有个小问题，论文中对input layer的参数范围好像没有进行限制，但您的文章中好像对input layer的参数也进行了非负的的限制，这部分有些不太明白

您说的参数范围应该是ICNN代码中17行和51行附近的代码，其实代码写的有点脏，因为我用了两种方式去对网络的权重进行限制，第一种是直接在网络中加入激活函数，第二种是使用强制将权重改为正数的方式，但是我由于偷懒，直接将参数限制到（0-100）了，其实选一种就可以了。

至于输入层的参数进行了限制是必要的，ICNN整个前向通道都需要设置非负的限制。

DrHoisu commented 2 years ago

非常感谢博主的分享，请问第二个实验的内外圆二分类代码可以分享一下吗？萌新拜托您了

olixu commented 2 years ago

@DrHoisu 非常感谢博主的分享，请问第二个实验的内外圆二分类代码可以分享一下吗？萌新拜托您了

# Author: Oliver Xu
# Date: 2021.03.01 22:12:00
# 对<Input Convex Neural network>文章中的synthetic例子进行仿真

import sys
import random
import numpy as np 
from sklearn.datasets import make_moons, make_circles, make_classification

import torch
import torch.nn as nn 
import torch.nn.functional as F
import torch.utils.data as torch_data
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

class Data(torch_data.Dataset):
    def __init__(self, x, y):
        super(Data, self).__init__()
        self.x = torch.tensor(x, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        return self.x.shape[0]

    def __getitem__(self, idx):
        return  self.x[idx], self.y[idx]

class NonCvxModel(nn.Module):
    '''
        输入凸神经网络结构定义
    '''
    def __init__(self, input_dim=2, output_dim=1, n_feature=2, n_hidden=200, n_output=1, constraints=None, hidden_layer_sizes=[60, 45], dropout=0.3, activation='celu'):
        super(NonCvxModel, self).__init__()
        self.hidden_layer_sizes = hidden_layer_sizes
        self.droput = dropout
        self.activation = activation

        self.input_layer = nn.Linear(2, 200)
        self.hidden_layer1 = nn.Linear(200, 200)
        self.hidden_layer2 = nn.Linear(200, 200)
        self.output_layer = nn.Linear(200, 1)

    def forward(self, input):
        x = self.input_layer(input)
        x = F.relu(x)
        x = self.hidden_layer1(x)
        x = F.relu(x)
        x = self.hidden_layer2(x)
        x = F.relu(x)
        x = self.output_layer(x)
        return x

class CvxModel(nn.Module):
    '''
        输入凸神经网络结构定义
    '''
    def __init__(self, input_dim=2, output_dim=1, n_feature=2, n_hidden=200, n_output=1, constraints=None, hidden_layer_sizes=[60, 45], dropout=0.3, activation='celu'):
        super(CvxModel, self).__init__()
        self.hidden_layer_sizes = hidden_layer_sizes
        self.droput = dropout
        self.activation = activation

        self.quadratic_layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(input_dim, output_features, bias=True),
                nn.Dropout(dropout))
            for output_features in hidden_layer_sizes])

        sizes = zip(hidden_layer_sizes[:-1], hidden_layer_sizes[1:])
        self.convex_layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(input_features, output_features, bias=False),
                nn.Dropout(dropout))
            for (input_features, output_features) in sizes])

        self.final_layer = nn.Linear(hidden_layer_sizes[-1], output_dim, bias=False)

    def forward(self, input):
        output = self.quadratic_layers[0](input)
        for quadratic_layer, convex_layer in zip(self.quadratic_layers[1:], self.convex_layers):
            output = convex_layer(output) + quadratic_layer(input)
            if self.activation == 'celu':
                output = torch.relu(output)
        return self.final_layer(output)

    def convexify(self):
        for layer in self.convex_layers:
            for sublayer in layer:
                if (isinstance(sublayer, nn.Linear)):
                    sublayer.weight.data.clamp_(0)
        '''
        self.constraints = constraints
        self.input_layer = nn.Linear(n_feature, n_hidden, bias=False)
        self.hidden_layer = nn.Linear(n_hidden, n_hidden, bias=False)
        self.output_layer = nn.Linear(n_hidden, n_output, bias=False)
        self.passthrough_layer = nn.Linear(n_feature, n_hidden)
        self.passthrough_output_layer = nn.Linear(n_feature, n_output)

    def forward(self, x):
        # zx1输入是input维，输出是hidden维度
        # pass1输入是input维度，输出是hidden维度
        #print(x.dtype)
        pass1 = self.passthrough_layer(x)
        # pass2输入是input维度，输出是hidden维度
        pass2 = self.passthrough_layer(x) 
        # pass3输入是input维度，输出是output维度
        pass3 = self.passthrough_layer(x) 
        pass4 = self.passthrough_layer(x) 
        pass5 = self.passthrough_layer(x) 
        pass6 = self.passthrough_output_layer(x)
        zx1 = F.celu(self.input_layer(x))
        zx2 = F.celu(self.hidden_layer(zx1) + pass1)
        zx3 = F.celu(self.hidden_layer(zx2) + pass2)
        zx4 = F.celu(self.hidden_layer(zx3) + pass3)
        zx5 = F.celu(self.hidden_layer(zx4) + pass4)
        zx6 = F.celu(self.hidden_layer(zx5) + pass5)
        zx7 = self.output_layer(zx6) + pass6
        return zx7
        '''

        '''
        while(True):
            y_pred = self.forward(self.dataX)
            #loss = self.criterion(y_pred, self.dataY)/self.dataY.size()[0]
            loss = self.criterion(y_pred, self.dataY)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            #print(self._modules)
            #sys.exit()
            self._modules['input_layer'].apply(self.constraints)
            self._modules['hidden_layer'].apply(self.constraints)
            self._modules['output_layer'].apply(self.constraints)
            self.i += 1
            if self.i % 500 == 0:
                print("当前正在训练第{}次， 损失是{}".format(self.i, loss.item()))
                print("预测值：", self.forward(self.dataX[10]))
                print("真实值：", self.dataY[10])
                #weight_print(self)
                print(50*'#')
                #test_model(model)
                #weight_print(model) # 输出神经网络的权重
            if loss.item()<terminate_threshold:
                print("训练满足精度要求，训练了{}次，损失是{}".format(self.i, loss.item()))
                break
        '''

def train(model, criterion, optimizer, terminate_threshold, dataX, dataY, data_loader, scheduler=None):
    for epoch in range(400):
        model.train()
        for x, y in data_loader:
            y_pred = model(x)
            loss = criterion(torch.sigmoid(y_pred), y.unsqueeze(1))
            #loss = criterion(y_pred, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            #model.convexify()
            #self._modules['input_layer'].apply(self.constraints)
            #self._modules['hidden_layer'].apply(self.constraints)
            #self._modules['output_layer'].apply(self.constraints)
        model.eval()
        scheduler.step()
        if epoch%20 == 0:
            print("当前正在训练第{}个epoch， 损失是{}".format(epoch, loss.item()))
            #print("预测值：", torch.sigmoid(model(x[10])))
            print("预测值：", model(x[10]))
            print("真实值：", y[10])
            #weight_print(self)
            print(50*'#')
            #test_model(model)
            #weight_print(model) # 输出神经网络的权重
'''
    for epoch in range(1, epochs+1):
        f1 = []
        ICNN.train()
        for X, y in data_loader:

            y_out = ICNN(X)
            loss = criterion(y.unsqueeze(1), torch.sigmoid(y_out))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            ICNN.convexify()

        ICNN.eval()
        for X, y in data_loader:

            y_out = ICNN(X)
            test_loss = criterion(y.unsqueeze(1), torch.sigmoid(y_out))
            predY_bin = (torch.sigmoid(y_out).detach().numpy() >= 0.55).astype(np.int)
            trueY_bin = np.expand_dims(y.detach().numpy(), axis=1)
            f1.append(f1_score(trueY_bin.T, predY_bin.T, average='micro', pos_label=None))

        if scheduler is not None:
            scheduler.step()
        freq = max(epochs//20,50)
        if verbose and epoch%freq==0:
            print('Epoch {}/{} || Loss:  Train {:.4f} | Validation {:.4f}'.format(epoch, epochs, loss.item(), test_loss.item()))
            print('F1 score:', mean(f1))
'''

"""
# 输出神经网络每一层的参数
def weight_print(net):
    # net.modules 返回网络中所有module的一个迭代器

    print("神经网络每一层的结构：")
    for idx, m in enumerate(net.modules()):
        print(idx, '->', m)
    print(50*'#')
    print("神经网络所有nn.Linear层的参数：")
    for op in net.modules():
        if isinstance(op, nn.Linear):
            print(op)
            #print(dir(op))
            print(op.weight)
            print(op.bias)
            print(50*'*')

    print("隐藏层的权重为：", net._modules['passthrough_layer'].bias)

class weightConstraint():
    '''
        对一个特定的层的weight进行参数限制
    '''
    def __init__(self):
        pass

    def __call__(self, module):
        if hasattr(module, 'weight'):
            w = module.weight.data
            w = w.clamp(0.0, 1.0)
            module.weight.data = w

def setup_seed(seed):
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
"""

def setup_seed(seed):
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)

def get_grid(data):
    x_min, x_max = data[:, 0].min() - 1, data[:, 0].max() + 1
    y_min, y_max = data[:, 1].min() - 1, data[:, 1].max() + 1
    print(x_min, x_max)
    print(y_min, y_max)
    return np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))

def main():
    dataX, dataY = make_circles(n_samples=600, noise=0.2, factor=0.5, random_state=42)
    dataY = 1. - dataY  # 原始数据靠近原点为1，而笛卡尔坐标系中，靠近原点应该为0
    batch_size_train = dataX.shape[0]//10
    dataset = Data(dataX, dataY)
    data_loader = torch_data.DataLoader(dataset, batch_size=batch_size_train, shuffle=True) 

    setup_seed(40)
    dataX = torch.tensor(dataX, dtype=torch.float32)
    dataY = torch.tensor(dataY, dtype=torch.float32).unsqueeze(1)

    #constraints = weightConstraint()
    #input_dim=2, output_dim=1, n_feature=2, n_hidden=200, n_output=1, constraints=None, hidden_layer_sizes=[60, 45], dropout=0.3, activation='celu'
    #model = CvxModel()
    model = NonCvxModel()
    #print(model)
    #criterion = torch.nn.MSELoss(reduction='sum')
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.7)
    #optimizer = torch.optim.SGD(model.parameters(), lr=0.0001, weight_decay=1e-6, momentum=0.9, nesterov=True)
    terminate_threshold = 1e-5
    train(model, criterion, optimizer, terminate_threshold, dataX=dataX, dataY=dataY, data_loader=data_loader, scheduler=scheduler)
    # 画图
    xx, yy = get_grid(dataX)
    #print("xx:", xx)
    #print("xx.ravel():", xx.ravel())
    #print("yy.ravel():", yy.ravel())
    #print("np.c_[xx.ravel(), yy.ravel()]:", np.c_[xx.ravel(), yy.ravel()])
    #print()
    pred = model(Data(np.c_[xx.ravel(), yy.ravel()], np.c_[xx.ravel(), yy.ravel()])[:][0])
    zz = torch.sigmoid(pred).detach().numpy()
    print("zz: ", zz[0:5])
    #torch.set_printoptions(profile='full')
    #np.set_printoptions(threshold=np.inf)
    import scipy.io as io
    zz1 = np.array(zz)
    #np.savetxt('zz.txt', zz1)
    #print(zz)

    #fig, ax = plt.subplots(1, 1, figsize=(15,15))
    fig, ax = plt.subplots(1, 1, figsize=(10,10))
    #contour = ax.contourf(xx, yy, zz.reshape(xx.shape), alpha=0.5, cmap='autumn')
    #ax.contourf(xx, yy, zz.reshape(xx.shape), [0.2, 0.4, 0.6, 0.8])
    #ax.scatter(dataX[:,0], dataX[:,1], c=dataY, cmap = 'autumn_r',edgecolors='black', linewidth=1.5, s=50)
    #ax.scatter(dataX[:,0], dataX[:,1], c=dataY, cmap = 'autumn_r',edgecolors='black', linewidth=1.5, s=50)

    contour = plt.contourf(xx, yy, zz.reshape(xx.shape), levels=[0.150, 0.300, 0.450, 0.600, 0.750], alpha=0.5, cmap='autumn')
    #contour = plt.contourf(xx, yy, zz.reshape(xx.shape), 6, cmap=plt.cm.hot)
    #等高线上标明z（即高度）的值，字体大小是10，颜色分别是黑色和红色
    #ax.clabel(contour,fontsize=10,colors=('k','r','b', 'y'))
    #ax.clabel(contour,fontsize=10,colors=('r'))
    #ax.clabel(contour,fontsize=2,colors=('b'))

    ax.scatter(dataX[:,0], dataX[:,1], c=dataY, cmap = 'autumn_r',edgecolors='black', linewidth=1.5, s=50)

    print(dataX[:,0].min().item())
    print(type(dataX[:,0].min().item()))
    ax.set_xlim(round(dataX[:,0].min().item(), 1) - 0.5, round(dataX[:,0].max().item(), 1) + 0.5)
    ax.set_ylim(round(dataX[:,1].min().item(), 1) - 0.5, round(dataX[:,1].max().item(), 1) + 0.5)
    #ax.axis('off')
    ax.set_title('make_moons dataset', fontsize=25)
    fig.savefig('./icnn.pdf')
    '''
    # np.c_：按列相加，主要实现的功能就是将两个矩阵进行相加
    # ravel()：实现了矩阵的扁平化操作
    xx, yy = get_grid(x)
    pred = ICNN(Data(np.c_[xx.ravel(), yy.ravel()], np.c_[xx.ravel(), yy.ravel()])[:][0])
    print("xx[10]是：", xx[10])
    print("yy[10]是：", yy[10])
    print("pred[10]是：", pred[10])
    print("pred: ", pred[0:5])
    zz = 1 - torch.sigmoid(pred).detach().numpy()
    print("zz: ", zz[0:5])

    ax[i].contourf(xx, yy, zz.reshape(xx.shape), alpha=0.5, cmap='autumn')
    ax[i].scatter(x[:,0], x[:,1], c=y, cmap = 'autumn_r',edgecolors='black', linewidth=1.5, s=50)
    ax[i].set_xlim(round(x[:,0].min(), 1) - 0.5, round(x[:,0].max(), 1) + 0.5)
    ax[i].set_ylim(round(x[:,1].min(), 1) - 0.5, round(x[:,1].max(), 1) + 0.5)
    ax[i].axis('off')
    ax[i].set_title(names[i] + ' dataset', fontsize=25)
    '''

if __name__=='__main__':
    main()

fine4303 commented 1 year ago

博主你好，神经网络的权重在训练过程中应该根据梯度方向进行更新，如果每次训练完之后，都将权重裁剪为非负值，这样训练网络的效果不是很小吗，有点困惑了，谢谢答疑

olixu commented 1 year ago

博主你好，神经网络的权重在训练过程中应该根据梯度方向进行更新，如果每次训练完之后，都将权重裁剪为非负值，这样训练网络的效果不是很小吗，有点困惑了，谢谢答疑

是这样的，你说得非常对，训练完应该是一种欠拟合的状态。你可以读一下这个论文作者的博士论文，他的几篇文章合成了一篇博士论文，效果只能说差强人意，实际用起来不好用。

Shawnwn9 commented 1 year ago

博主你好，我最近也在看凸神经网络模型，看到你写的代码对您的代码能力很是崇拜。我看了看原文的凸神经网络模型代码，原文模型是在对神经网络的输出进行梯度下降后与真实值进行loss，我看您的代码好像是直接将神经网络的输出和真实值进行loss，这是不是会有点影响？还是我看漏了您的代码。另外，原文对神经网络输出进行梯度下降后计算loss，感觉可解释性好像更强烈一些，因为原文神经网络直接输出的是模型的y值，而不是y*。后来想了想，博主应该是只做了模型的拟合，推理部分还没有去做，是我把这俩理解到一块去了。不过有机会还是想和博主再交流交流。

1ancsy commented 2 months ago

博主你好，第二篇论文提到如果目标函数（损失函数）与约束函数都是凸函数时，可以用梯度下降法求得最优解（第7页第三段）。请问具体是怎么做的？

olixu / blog-comment

Input Convex Neural Network复现及仿真验证 | Oliver xu's Blog #75